• ES Pipeline Aggregations


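    Pipeline aggregations work on the output of other aggregations rather than on documents. They fall into two families:

    • parent: computes on the output of its parent aggregation and adds new values (or new buckets) to that aggregation's buckets.
    • sibling: computes on the output of a sibling aggregation and produces a new aggregation at the same level as that sibling.

    The sections below introduce several of the pipeline aggregations that ES defines.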

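    Avg Bucket Aggregation

    A sibling pipeline aggregation that computes the average of a specified metric across the buckets of a sibling aggregation. The sibling aggregation must be a multi-bucket aggregation, and buckets_path must point at a numeric metric aggregation inside it. The syntax is: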

    {
        "avg_bucket": {
            "buckets_path": "the_sum"  // @1
        }
    }
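    • buckets_path: the path to the metric to average. Nested aggregations are addressed by joining their names with ">" (for example "sales_per_month>sales").

    Other parameters:
    • gap_policy: the policy applied when a bucket has no value for the metric (similar in spirit to the missing option of aggregations such as terms). Possible values are skip and insert_zeros; the default is skip.
      • skip: treats the missing data as if the bucket did not exist. The bucket is skipped and the calculation continues with the next available value.
      • insert_zeros: replaces the missing value with 0 and continues the calculation as normal.
    • format: the format applied to the output value of this aggregation (not to the bucket key).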
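    The difference between the two gap policies is easiest to see on a small average. The sketch below is plain Java with no Elasticsearch dependency; the class GapPolicySketch, its average method, and the sample values are hypothetical and only mimic the behaviour described above, with null standing for a bucket that has no value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Elasticsearch client: it only mimics
// how skip and insert_zeros change the input of an average.
class GapPolicySketch {

    /** Averages bucket values; a null entry stands for a bucket with no value. */
    static double average(List<Double> bucketValues, String gapPolicy) {
        double sum = 0;
        int count = 0;
        for (Double value : bucketValues) {
            if (value == null) {
                if ("skip".equals(gapPolicy)) {
                    continue;      // skip: behave as if the bucket did not exist
                }
                value = 0.0;       // insert_zeros: count the bucket with value 0
            }
            sum += value;
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        List<Double> monthlySales = Arrays.asList(550.0, null, 60.0);
        System.out.println(average(monthlySales, "skip"));          // (550 + 60) / 2
        System.out.println(average(monthlySales, "insert_zeros"));  // (550 + 0 + 60) / 3
    }
}
```

    With skip the empty month does not count toward the denominator, while insert_zeros pulls the average down, so the choice matters whenever empty buckets are possible.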

    A complete example:

    POST /_search
    {
        "size": 0,
        "aggs": {
            "sales_per_month": {                 // @1
                "date_histogram": {
                    "field": "date",
                    "interval": "month"
                },
                "aggs": {                        // @2
                    "sales": {
                        "sum": {
                            "field": "price"
                        }
                    }
                }
            },
            "avg_monthly_sales": {               // @3
                "avg_bucket": {
                    "buckets_path": "sales_per_month>sales"
                }
            }
        }
    }

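    Code @1: defines the first-level aggregation, a date histogram with one bucket per month.

    Code @2: defines the second-level aggregation, which sums the price field over the documents in each monthly bucket.

    Code @3: averages the monthly sums produced above.

    The response looks like this: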

    {
        ... // omitted
        "aggregations": {
            "sales_per_month": {
                "buckets": [
                    {
                        "key_as_string": "2015/01/01 00:00:00",
                        "key": 1420070400000,
                        "doc_count": 3,
                        "sales": {
                            "value": 550.0
                        }
                    },
                    {
                        "key_as_string": "2015/02/01 00:00:00",
                        "key": 1422748800000,
                        "doc_count": 2,
                        "sales": {
                            "value": 60.0
                        }
                    },
                    {
                        "key_as_string": "2015/03/01 00:00:00",
                        "key": 1425168000000,
                        "doc_count": 2,
                        "sales": {
                            "value": 375.0
                        }
                    }
                ]
            },
            "avg_monthly_sales": {   // average of the monthly sums: (550 + 60 + 375) / 3
                "value": 328.33333333333333
            }
        }
    }
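
    To make the role of buckets_path concrete, the following sketch resolves a two-part path such as "sales_per_month>sales" against an in-memory structure and averages the metric over the buckets. It is plain Java, not the Elasticsearch implementation: the class BucketsPathSketch, its methods, and the three monthly sums (550, 60, 375) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: aggregation name -> list of buckets,
// each bucket being a map from metric name to metric value.
class BucketsPathSketch {

    /** Builds one bucket that carries a single named metric value. */
    static Map<String, Double> bucket(String metricName, double value) {
        Map<String, Double> b = new LinkedHashMap<>();
        b.put(metricName, value);
        return b;
    }

    /** Sample data: three monthly buckets, each with a "sales" sum. */
    static Map<String, List<Map<String, Double>>> sampleAggs() {
        List<Map<String, Double>> months = new ArrayList<>();
        months.add(bucket("sales", 550.0));
        months.add(bucket("sales", 60.0));
        months.add(bucket("sales", 375.0));
        Map<String, List<Map<String, Double>>> aggs = new LinkedHashMap<>();
        aggs.put("sales_per_month", months);
        return aggs;
    }

    /** Resolves "multiBucketAgg>metric" and averages the metric over all buckets. */
    static double avgBucket(Map<String, List<Map<String, Double>>> aggs, String bucketsPath) {
        String[] parts = bucketsPath.split(">");
        List<Map<String, Double>> buckets = aggs.get(parts[0]);
        double sum = 0;
        for (Map<String, Double> b : buckets) {
            sum += b.get(parts[1]);
        }
        return sum / buckets.size();
    }

    public static void main(String[] args) {
        // (550 + 60 + 375) / 3, roughly 328.33
        System.out.println(avgBucket(sampleAggs(), "sales_per_month>sales"));
    }
}
```

    The first path element selects the multi-bucket aggregation and the second selects the metric inside each of its buckets, which is why the pipeline aggregation sits next to sales_per_month rather than inside it.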

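    A Java example of the same pattern (terms buckets per seller, a sum of num per bucket, then avg_bucket over those sums):

    // Imports assume the 7.x high-level REST client; in 6.x the pipeline builder
    // lives in a subpackage of org.elasticsearch.search.aggregations.pipeline.
    // EsClient is a helper class that is not shown in this article.
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.search.aggregations.AggregationBuilder;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.pipeline.AvgBucketPipelineAggregationBuilder;
    import org.elasticsearch.search.builder.SearchSourceBuilder;

    public static void test_pipeline_avg_bucket_aggregation() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("aggregations_index02");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

            // Multi-bucket aggregation: one bucket per seller, with the sum of num in each bucket
            AggregationBuilder aggregationBuild = AggregationBuilders.terms("seller_agg")
                    .field("sellerId")
                    .subAggregation(AggregationBuilders.sum("seller_num_agg").field("num"));
            sourceBuilder.aggregation(aggregationBuild);

            // Add the avg_bucket pipeline aggregation as a sibling of seller_agg
            sourceBuilder.aggregation(
                    new AvgBucketPipelineAggregationBuilder("seller_num_agg_av", "seller_agg>seller_num_agg"));
            sourceBuilder.size(0);

            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }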

    Percentiles Bucket Aggregation

    A sibling pipeline aggregation that computes percentiles over the values of a specified metric across all buckets of a sibling aggregation. A Java example:

    // EsClient is a helper class that is not shown in this article.
    public static void test_percentiles_bucket_aggregation() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("aggregations_index02");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

            // Multi-bucket aggregation: one bucket per seller, with the sum of num in each bucket
            AggregationBuilder aggregationBuild = AggregationBuilders.terms("seller_agg")
                    .field("sellerId")
                    .subAggregation(AggregationBuilders.sum("seller_num_agg").field("num"));
            sourceBuilder.aggregation(aggregationBuild);

            // Add the percentiles_bucket pipeline aggregation as a sibling of seller_agg
            sourceBuilder.aggregation(
                    new PercentilesBucketPipelineAggregationBuilder("seller_num_agg_av", "seller_agg>seller_num_agg"));
            sourceBuilder.size(0);

            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

    It returns:

    {
        ...  // other properties omitted
        "aggregations": {
            "lterms#seller_agg": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 12,
                "buckets": [
                    {
                        "key": 45,
                        "doc_count": 567,
                        "sum#seller_num_agg": {
                            "value": 911
                        }
                    },
                    {
                        "key": 31,
                        "doc_count": 324,
                        "sum#seller_num_agg": {
                            "value": 353
                        }
                    } // other buckets omitted
                ]
            },
            "percentiles_bucket#seller_num_agg_av": {
                "values": {
                    "1.0": 5,
                    "5.0": 5,
                    "25.0": 10,
                    "50.0": 20,
                    "75.0": 290,
                    "95.0": 911,
                    "99.0": 911
                }
            }
        }
    }
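
    Every percentile in a result like this is one of the bucket values themselves, because the aggregation picks an existing data point instead of interpolating between two. The sketch below illustrates that idea with the textbook nearest-rank rule. It is an assumption for illustration only: the class PercentilesBucketSketch and the five sample values are hypothetical, and the exact index rule Elasticsearch applies may differ from nearest-rank.

```java
import java.util.Arrays;

// Hypothetical nearest-rank percentile over per-bucket metric values.
// Illustrates "pick an existing value, no interpolation"; the exact
// rank rule used by Elasticsearch may differ.
class PercentilesBucketSketch {

    /** Returns the nearest-rank percentile of a non-empty array (0 < percent <= 100). */
    static double percentile(double[] bucketValues, double percent) {
        double[] sorted = bucketValues.clone();
        Arrays.sort(sorted);
        // 1-based nearest rank: the smallest rank covering at least percent% of the values
        int rank = (int) Math.ceil(percent / 100.0 * sorted.length);
        int index = Math.max(rank, 1) - 1;
        return sorted[index];
    }

    public static void main(String[] args) {
        double[] sumsPerSeller = {911, 5, 290, 10, 20};   // assumed sample bucket values
        for (double p : new double[] {1, 25, 50, 75, 99}) {
            System.out.println(p + " -> " + percentile(sumsPerSeller, p));
        }
    }
}
```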

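    Cumulative Sum Aggregation

    A parent pipeline aggregation that accumulates a specified metric across the buckets of its parent histogram (or date_histogram) aggregation: the result for each bucket is the running total of the metric up to and including that bucket.

    Its REST syntax is:

    {
        "cumulative_sum": {
            "buckets_path": "the_sum"
        }
    }

    Supported parameters:

    • buckets_path: the path to the metric to accumulate; this is the input of the pipeline aggregation.
    • format: the format applied to the output value of this aggregation.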

    A usage example:

    POST /sales/_search
    {
        "size": 0,
        "aggs" : {
            "sales_per_month" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "month"
                },
                "aggs": {
                    "sales": {
                        "sum": {
                            "field": "price"
                        }
                    },
                    "cumulative_sales": {
                        "cumulative_sum": {
                            "buckets_path": "sales"
                        }
                    }
                }
            }
        }
    }

    The response:

    {
        "took": 11,
        "timed_out": false,
        "_shards": ...,
        "hits": ...,
        "aggregations": {
            "sales_per_month": {
                "buckets": [
                    {
                        "key_as_string": "2015/01/01 00:00:00",
                        "key": 1420070400000,
                        "doc_count": 3,
                        "sales": {
                            "value": 550.0
                        },
                        "cumulative_sales": {
                            "value": 550.0
                        }
                    },
                    {
                        "key_as_string": "2015/02/01 00:00:00",
                        "key": 1422748800000,
                        "doc_count": 2,
                        "sales": {
                            "value": 60.0
                        },
                        "cumulative_sales": {
                            "value": 610.0
                        }
                    },
                    {
                        "key_as_string": "2015/03/01 00:00:00",
                        "key": 1425168000000,
                        "doc_count": 2,
                        "sales": {
                            "value": 375.0
                        },
                        "cumulative_sales": {
                            "value": 985.0
                        }
                    }
                ]
            }
        }
    }

    As the result shows, each cumulative_sales value equals the previous bucket's cumulative_sales plus the current bucket's sales sum.
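
    The same running total can be reproduced in a few lines. This is a minimal sketch in plain Java, not the Elasticsearch implementation; the class CumulativeSumSketch is hypothetical and the input is the three monthly sales sums from the response above.

```java
import java.util.Arrays;

// Hypothetical helper that mirrors what cumulative_sum does to the
// per-bucket values of the metric named in buckets_path.
class CumulativeSumSketch {

    /** Returns, for each bucket, the sum of its value and all earlier values. */
    static double[] cumulativeSum(double[] bucketSums) {
        double[] result = new double[bucketSums.length];
        double running = 0;
        for (int i = 0; i < bucketSums.length; i++) {
            running += bucketSums[i];   // previous cumulative value + current bucket
            result[i] = running;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] monthlySales = {550.0, 60.0, 375.0};
        // prints [550.0, 610.0, 985.0]
        System.out.println(Arrays.toString(cumulativeSum(monthlySales)));
    }
}
```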

    In Java, the aggregation is added as a CumulativeSumPipelineAggregationBuilder sub-aggregation of the date histogram (the Java code in the Bucket Sort section below includes one). A result from such a request looks like this:

    {
        "aggregations": {
            "date_histogram#createTime_histogram": {
                "buckets": {
                    "2015-12-01 00:00:00": {
                        "key_as_string": "2015-12-01 00:00:00",
                        "key": 1448928000000,
                        "doc_count": 6,
                        "sum#seller_num_agg": {
                            "value": 16
                        },
                        "simple_value#Cumulative_Seller_num_agg": {
                            "value": 16
                        }
                    },
                    "2016-03-01 00:00:00": {
                        "key_as_string": "2016-03-01 00:00:00",
                        "key": 1456790400000,
                        "doc_count": 10,
                        "sum#seller_num_agg": {
                            "value": 11
                        },
                        "simple_value#Cumulative_Seller_num_agg": {
                            "value": 31
                        }
                    }
                    // ... other buckets omitted
                }
            }
        }
    }

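    Bucket Sort Aggregation

    A parent pipeline aggregation that sorts the buckets of its parent multi-bucket aggregation. Zero or more sort fields may be specified, each with its own sort order. Each bucket can be sorted by its _key, its _count, or its sub-aggregations. In addition, the from and size parameters can be set to truncate the result buckets.

    The syntax is:

    {
        "bucket_sort": {
            "sort": [
                {"sort_field_1": {"order": "asc"}},
                {"sort_field_2": {"order": "desc"}},
                "sort_field_3"
            ],
            "from": 1,
            "size": 3
        }
    }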

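    Supported parameters:

    • sort: the list of fields to sort on (optional).
    • from: buckets at positions before this value, counted in sorted order, are truncated from the result. Defaults to 0.
    • size: the number of buckets to return. Defaults to all buckets of the parent aggregation.
    • gap_policy: the policy applied when a bucket has no value for a sort field (similar in spirit to the missing option of aggregations such as terms). Possible values are skip and insert_zeros; the default is skip.
      • skip: treats the missing data as if the bucket did not exist. The bucket is skipped and processing continues with the next available value.
      • insert_zeros: replaces the missing value with 0.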

    The official example, which returns the buckets of the three months with the highest total sales in descending order:

    POST /sales/_search
    {
        "size": 0,
        "aggs" : {
            "sales_per_month" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "month"
                },
                "aggs": {
                    "total_sales": {
                        "sum": {
                            "field": "price"
                        }
                    },
                    "sales_bucket_sort": {
                        "bucket_sort": {
                            "sort": [
                                {"total_sales": {"order": "desc"}}
                            ],
                            "size": 3
                        }
                    }
                }
            }
        }
    }
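
    The interplay of sort, from and size can be sketched without a cluster: sort all buckets first, then drop the first from positions and keep at most size of the rest. The code below is plain Java and only a model of that behaviour; the class BucketSortSketch, its methods, and the sample month keys and values are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical model of bucket_sort: each bucket is a (key, metric value) pair.
class BucketSortSketch {

    /** Sample buckets: month key -> total_sales. */
    static List<Map.Entry<String, Double>> sampleBuckets() {
        List<Map.Entry<String, Double>> buckets = new ArrayList<>();
        buckets.add(new SimpleEntry<>("2015-01", 550.0));
        buckets.add(new SimpleEntry<>("2015-02", 60.0));
        buckets.add(new SimpleEntry<>("2015-03", 375.0));
        return buckets;
    }

    /** Sorts by metric value, then truncates: skip the first 'from', keep at most 'size'. */
    static List<Map.Entry<String, Double>> bucketSort(
            List<Map.Entry<String, Double>> buckets, boolean descending, int from, int size) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(buckets);
        Comparator<Map.Entry<String, Double>> byValue = Map.Entry.comparingByValue();
        sorted.sort(descending ? byValue.reversed() : byValue);
        int start = Math.min(from, sorted.size());
        int end = Math.min(start + size, sorted.size());
        return new ArrayList<>(sorted.subList(start, end));
    }

    public static void main(String[] args) {
        // Top two months by sales, highest first
        System.out.println(bucketSort(sampleBuckets(), true, 0, 2));
        // Same ordering, but the best month is truncated by from = 1
        System.out.println(bucketSort(sampleBuckets(), true, 1, 3));
    }
}
```

    Because truncation happens after sorting, from removes the highest-ranked buckets of the sorted list, not the first buckets in their original order.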

    A Java example (it sorts the monthly buckets by seller_num_agg in ascending order, keeps the first three, and also adds a cumulative sum):

    // Requires java.util.Arrays in addition to the Elasticsearch client classes.
    // EsClient is a helper class that is not shown in this article.
    public static void test_bucket_sort_Aggregation() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            // Interval of the date histogram; this example aggregates by month
            DateHistogramInterval interval = new DateHistogramInterval("1M");
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("aggregations_index02");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            AggregationBuilder aggregationBuild = AggregationBuilders.dateHistogram("createTime_histogram")
                    .field("createTime")
                    .dateHistogramInterval(interval)
                    .keyed(true)
                    // Metric to sort on: sum of num per month
                    .subAggregation(AggregationBuilders.sum("seller_num_agg").field("num"))
                    // Constructor: BucketSortPipelineAggregationBuilder(String name, List<FieldSortBuilder> sorts)
                    .subAggregation(new BucketSortPipelineAggregationBuilder("seller_num_agg_sort",
                            Arrays.asList(new FieldSortBuilder("seller_num_agg").order(SortOrder.ASC)))
                            .from(0)
                            .size(3))
                    // Running total of seller_num_agg across the buckets
                    .subAggregation(new CumulativeSumPipelineAggregationBuilder("Cumulative_Seller_num_agg", "seller_num_agg"));
                    // .format("yyyy-MM-dd") // formats the bucket key
            sourceBuilder.aggregation(aggregationBuild);
            sourceBuilder.size(0);
            sourceBuilder.query(QueryBuilders.termQuery("sellerId", 24));
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

    The response:

    {
        "aggregations": {
            "date_histogram#createTime_histogram": {
                "buckets": {
                    "2016-04-01 00:00:00": {
                        "key_as_string": "2016-04-01 00:00:00",
                        "key": 1459468800000,
                        "doc_count": 2,
                        "sum#seller_num_agg": {
                            "value": 2
                        },
                        "simple_value#Cumulative_Seller_num_agg": {
                            "value": 2
                        }
                    },
                    "2017-05-01 00:00:00": {
                        "key_as_string": "2017-05-01 00:00:00",
                        "key": 1493596800000,
                        "doc_count": 3,
                        "sum#seller_num_agg": {
                            "value": 3
                        },
                        "simple_value#Cumulative_Seller_num_agg": {
                            "value": 5
                        }
                    },
                    "2017-02-01 00:00:00": {
                        "key_as_string": "2017-02-01 00:00:00",
                        "key": 1485907200000,
                        "doc_count": 4,
                        "sum#seller_num_agg": {
                            "value": 4
                        },
                        "simple_value#Cumulative_Seller_num_agg": {
                            "value": 9
                        }
                    }
                }
            }
        }
    }

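    Max Bucket Aggregation

    Used the same way as Avg Bucket (type max_bucket); it returns the maximum value of the metric across the sibling buckets.

    Min Bucket Aggregation

    Used the same way as Avg Bucket (type min_bucket); it returns the minimum value of the metric across the sibling buckets.

    Sum Bucket Aggregation

    Used the same way as Avg Bucket (type sum_bucket); it returns the sum of the metric across the sibling buckets.

    Stats Bucket Aggregation

    Used the same way as Avg Bucket (type stats_bucket); it returns count, min, max, avg and sum of the metric across the sibling buckets.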
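
    Since these four aggregations differ from Avg Bucket only in the function applied to the per-bucket values, one small sketch covers them all. It uses the JDK's DoubleSummaryStatistics as a stand-in for the statistics a stats bucket aggregation reports; the class StatsBucketSketch and the sample values are hypothetical, and nothing here calls Elasticsearch.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical stand-in: computes max, min, sum, avg and count over
// per-bucket metric values using only the JDK.
class StatsBucketSketch {

    /** Summarises the metric values collected from the sibling buckets. */
    static DoubleSummaryStatistics statsBucket(double... bucketValues) {
        return DoubleStream.of(bucketValues).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats = statsBucket(550.0, 60.0, 375.0);
        System.out.println("max   = " + stats.getMax());      // what a max bucket would report
        System.out.println("min   = " + stats.getMin());      // what a min bucket would report
        System.out.println("sum   = " + stats.getSum());      // what a sum bucket would report
        System.out.println("avg   = " + stats.getAverage());
        System.out.println("count = " + stats.getCount());
    }
}
```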

    Reposted from: https://cloud.tencent.com/developer/article/1443586

  • Original URL: https://www.cnblogs.com/fat-girl-spring/p/14241442.html