• A Summary of Elasticsearch (ES) Metric Aggregations


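    Metric aggregations operate mainly on numeric fields and are similar to aggregate functions such as sum, avg, max, and min in relational databases.
    I. avg (average)

    Compute the average of the price field. The corresponding Java example: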

        @Resource
        private RestHighLevelClient client ;
    
        @Test
        public void testMatchQuery() {
            try {
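                // Search the "items" index.
                SearchRequest searchRequest = new SearchRequest();
                searchRequest.indices("items");
                SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
                // Average of "price"; documents without the field count as 0.
                AggregationBuilder avg = AggregationBuilders.avg("avg-price").field("price").missing(0);
                sourceBuilder.aggregation(avg);
                // size(0): return only the aggregation result, no hits.
                sourceBuilder.size(0);
                sourceBuilder.query(
                        QueryBuilders.termQuery("category", "一级")
                );
                searchRequest.source(sourceBuilder);
                SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
                // Serialize to JSON, as in the other examples, to get the output shown below.
                System.out.println(JSONObject.toJSONString(result));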
            } catch (Throwable e) {
                e.printStackTrace();
            } finally {
                try {
                    client.close();
                }catch (Exception e){
                    log.error(e.getMessage());
                }
            }
        }
    
        

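    In the code, missing(0) supplies the value to use for documents that lack the averaged field; in this example such documents take part in the calculation as 0.
    The response is as follows: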

    {
        "aggregations": {
            "asMap": {
                "avg-price": {
                    "fragment": true,
                    "name": "avg-price",
                    "type": "avg",
                    "value": 484.9945,
                    "valueAsString": "484.9945"
                }
            },
            "fragment": true
        },
        "clusters": {
            "fragment": true,
            "skipped": 0,
            "successful": 0,
            "total": 0
        },
        "failedShards": 0,
        "fragment": false,
        "hits": {
            "fragment": true,
            "hits": [],
            "maxScore": 0,
            "totalHits": 2
        },
        "numReducePhases": 1,
        "profileResults": {},
        "shardFailures": [],
        "skippedShards": 0,
        "successfulShards": 5,
        "timedOut": false,
        "took": {
            "days": 0,
            "daysFrac": 2.3148148148148148e-8,
            "hours": 0,
            "hoursFrac": 5.555555555555555e-7,
            "micros": 2000,
            "microsFrac": 2000,
            "millis": 2,
            "millisFrac": 2,
            "minutes": 0,
            "minutesFrac": 0.000033333333333333335,
            "nanos": 2000000,
            "seconds": 0,
            "secondsFrac": 0.002,
            "stringRep": "2ms"
        },
        "totalShards": 5
    }
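
    To make the effect of missing concrete, here is a minimal plain-Java sketch (no Elasticsearch client involved) that averages a few hypothetical prices, where null stands for a document without a price field. The class name AvgMissingDemo, the method average, and the sample data are assumptions made for illustration; the code only mimics the behaviour described above, where such documents are skipped by default or take part with the substitute value.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of how a "missing" default changes an average. */
class AvgMissingDemo {

    /**
     * Averages the given prices. A null entry stands for a document that has
     * no "price" field: it is skipped when missing == null, otherwise it is
     * counted with the substitute value.
     */
    static double average(List<Double> prices, Double missing) {
        double sum = 0;
        long count = 0;
        for (Double price : prices) {
            if (price == null && missing == null) {
                continue; // default behaviour: ignore the document
            }
            sum += (price != null) ? price : missing;
            count++;
        }
        // Sketch choice: NaN when no document contributes a value.
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // Hypothetical data: three documents, one without a price.
        List<Double> prices = Arrays.asList(100.0, null, 200.0);
        System.out.println(average(prices, null)); // 150.0, the document is ignored
        System.out.println(average(prices, 0.0));  // 100.0, the document counts as 0
    }
}
```

    With missing(0) the document without a price lowers the average, which is why the choice of the substitute value matters.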

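    II. Weighted Avg Aggregation
    The weighted average is computed as ∑(value * weight) / ∑(weight).
    Parameters supported by the weighted average (weighted_avg) aggregation:

    • value: configuration of the field or script that provides the values, i.e. which field is averaged. It supports the following sub-parameters:
      • field: name of the field whose values are averaged.
      • missing: value to use in the calculation when a matched document lacks the value field.
    • weight: object that defines the weights. Its optional properties are:
      • field: field that supplies the weight.
      • missing: weight to assign to a document that lacks the weight field.
    • format: format applied to the numeric result.
    • value_type: hint for the type of the values, e.g. ValueType.DATE or ValueType.IP.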
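
    The formula and the two missing parameters above can be sketched in plain Java, without any Elasticsearch dependency. The names WeightedAvgDemo, Doc, and weightedAvg, as well as the sample numbers, are hypothetical. The sketch assumes that a document is skipped when its value or weight is absent and no substitute is configured.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of sum(value * weight) / sum(weight). */
class WeightedAvgDemo {

    /** One document; value or weight is null when the field is absent. */
    static final class Doc {
        final Double value;
        final Double weight;

        Doc(Double value, Double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /**
     * Computes the weighted average. valueMissing and weightMissing play the
     * role of the "missing" sub-parameters; pass null to skip documents that
     * lack the corresponding field (assumed default for this sketch).
     */
    static double weightedAvg(List<Doc> docs, Double valueMissing, Double weightMissing) {
        double weightedSum = 0;
        double weightSum = 0;
        for (Doc doc : docs) {
            Double value = doc.value != null ? doc.value : valueMissing;
            Double weight = doc.weight != null ? doc.weight : weightMissing;
            if (value == null || weight == null) {
                continue; // nothing to substitute: skip the document
            }
            weightedSum += value * weight;
            weightSum += weight;
        }
        return weightSum == 0 ? Double.NaN : weightedSum / weightSum;
    }

    public static void main(String[] args) {
        // Hypothetical documents: (value, weight); the last one has no weight.
        List<Doc> docs = Arrays.asList(
                new Doc(10.0, 1.0), new Doc(20.0, 3.0), new Doc(30.0, null));
        System.out.println(weightedAvg(docs, null, null)); // (10*1 + 20*3) / 4 = 17.5
        System.out.println(weightedAvg(docs, null, 1.0));  // (10 + 60 + 30) / 5 = 20.0
    }
}
```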

    The weight is taken from the document field named weight. The Java example:

        @Test
        public void test_weight_avg_aggregation() {
            try {
                SearchRequest searchRequest = new SearchRequest();
                searchRequest.indices("items");
                SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
                WeightedAvgAggregationBuilder avg = AggregationBuilders.weightedAvg("avg-aggregation")
                        .value(
                                (new MultiValuesSourceFieldConfig.Builder())
                                        .setFieldName("price")
                                        .build()
                        )
                        .weight(
                                (new MultiValuesSourceFieldConfig.Builder())
                                        .setFieldName("weight")
                                        .build()
                        );
                sourceBuilder.aggregation(avg);
                sourceBuilder.size(0);
                sourceBuilder.query(
                        QueryBuilders.termQuery("category", "一级")
                );
                searchRequest.source(sourceBuilder);
                SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
                System.out.println(JSONObject.toJSONString(result));
            } catch (Throwable e) {
                e.printStackTrace();
            } finally {
                try {
                    client.close();
                }catch (Exception e){
                    log.error(e.getMessage());
                }
            }
        }

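    III. Cardinality Aggregation
    The cardinality aggregation counts distinct values: it deduplicates first and then counts, similar to count(distinct) in a relational database.
    Example: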

        @Test
        public void test_Cardinality_Aggregation() {
            try {
                SearchRequest searchRequest = new SearchRequest();
                searchRequest.indices("poems");
                SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
                AggregationBuilder aggregationBuild = AggregationBuilders.cardinality("author_count").field("author");
                sourceBuilder.aggregation(aggregationBuild);
                sourceBuilder.size(0);
                sourceBuilder.query(
                        QueryBuilders.termQuery("dynasty", "唐")
                );
                searchRequest.source(sourceBuilder);
                SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
                System.out.println(JSONObject.toJSONString(result));
            } catch (Throwable e) {
                e.printStackTrace();
            } finally {
                try {
                    client.close();
                }catch (Exception e){
                    log.error(e.getMessage());
                }
            }
        }

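    The implementation above has roughly the same effect as the SQL statement: SELECT COUNT(DISTINCT author) FROM poems WHERE dynasty = '唐';
    Its core parameters are:

    • precision_threshold: controls accuracy. Below this count, results are expected to be close to exact; above it, counts may become less accurate. The maximum supported value is 40000, and thresholds above that have the same effect as 40000. The default is 3000.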
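
    The COUNT(DISTINCT ...) semantics can be illustrated with an exact count in plain Java. This is only a sketch of what is being counted: the cardinality aggregation itself relies on the HyperLogLog++ algorithm and may return an approximation above precision_threshold. The class DistinctCountDemo, its method, and the sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Exact in-memory equivalent of: SELECT COUNT(DISTINCT author) ... WHERE dynasty = ? */
class DistinctCountDemo {

    /** Each row is {author, dynasty}; returns the number of distinct authors of one dynasty. */
    static long countDistinctAuthors(List<String[]> poems, String dynasty) {
        Set<String> authors = new HashSet<>();
        for (String[] poem : poems) {
            if (dynasty.equals(poem[1])) { // the term filter on "dynasty"
                authors.add(poem[0]);      // a set keeps each author only once
            }
        }
        return authors.size();
    }

    public static void main(String[] args) {
        // Hypothetical rows standing in for documents of the "poems" index.
        List<String[]> poems = Arrays.asList(
                new String[]{"Li Bai", "Tang"},
                new String[]{"Du Fu", "Tang"},
                new String[]{"Li Bai", "Tang"},
                new String[]{"Su Shi", "Song"});
        System.out.println(countDistinctAuthors(poems, "Tang")); // 2
    }
}
```

    An exact set grows with the number of distinct values, which is the memory cost that the approximate algorithm and precision_threshold are meant to bound.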

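    The count returned by the example above is exact, because the number of distinct values is below precision_threshold; if the threshold is set lower than the actual cardinality, the result may become inaccurate. A sample response: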

    {
        "aggregations": {
            "asMap": {
                "author_count": {
                    "fragment": true,
                    "name": "author_count",
                    "type": "cardinality",
                    "value": 6,
                    "valueAsString": "6.0"
                }
            },
            "fragment": true
        },
        "clusters": {
            "fragment": true,
            "skipped": 0,
            "successful": 0,
            "total": 0
        },
        "failedShards": 0,
        "fragment": false,
        "hits": {
            "fragment": true,
            "hits": [],
            "maxScore": 0,
            "totalHits": 15
        },
        "numReducePhases": 1,
        "profileResults": {},
        "shardFailures": [],
        "skippedShards": 0,
        "successfulShards": 5,
        "timedOut": false,
        "took": {
            "days": 0,
            "daysFrac": 4.2824074074074075e-7,
            "hours": 0,
            "hoursFrac": 0.000010277777777777777,
            "micros": 37000,
            "microsFrac": 37000,
            "millis": 37,
            "millisFrac": 37,
            "minutes": 0,
            "minutesFrac": 0.0006166666666666666,
            "nanos": 37000000,
            "seconds": 0,
            "secondsFrac": 0.037,
            "stringRep": "37ms"
        },
        "totalShards": 5
    }

    Another sample response:

    {
        "aggregations": {
            "asMap": {
                "author_count": {
                    "fragment": true,
                    "name": "author_count",
                    "type": "cardinality",
                    "value": 12,
                    "valueAsString": "12.0"
                }
            },
            "fragment": true
        },
        "clusters": {
            "fragment": true,
            "skipped": 0,
            "successful": 0,
            "total": 0
        },
        "failedShards": 0,
        "fragment": false,
        "hits": {
            "fragment": true,
            "hits": [],
            "maxScore": 0,
            "totalHits": 22
        },
        "numReducePhases": 1,
        "profileResults": {},
        "shardFailures": [],
        "skippedShards": 0,
        "successfulShards": 5,
        "timedOut": false,
        "took": {
            "days": 0,
            "daysFrac": 2.5462962962962963e-7,
            "hours": 0,
            "hoursFrac": 0.000006111111111111111,
            "micros": 22000,
            "microsFrac": 22000,
            "millis": 22,
            "millisFrac": 22,
            "minutes": 0,
            "minutesFrac": 0.00036666666666666667,
            "nanos": 22000000,
            "seconds": 0,
            "secondsFrac": 0.022,
            "stringRep": "22ms"
        },
        "totalShards": 5
    }
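    • Pre-computed hashes: when running a cardinality aggregation on a string field, a good practice is to index a hash of the string in advance and aggregate on the hash values, which improves efficiency.
    • Missing value: the missing parameter defines how documents without a value are handled. By default they are ignored, but they can also be treated as if they had the value given by missing.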
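
    The pre-computed hash idea can be sketched as follows: hash each string once (conceptually at index time) and count distinct hash values instead of distinct strings. The sketch uses 64-bit FNV-1a only because it is short to write down; it is an assumption for illustration and not the hash function Elasticsearch uses. PrecomputedHashDemo and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Counting distinct strings through hashes computed ahead of time. */
class PrecomputedHashDemo {

    /** 64-bit FNV-1a hash of the UTF-8 bytes (illustrative choice of hash). */
    static long fnv1a64(String s) {
        long hash = 0xcbf29ce484222325L;   // FNV offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;        // FNV prime
        }
        return hash;
    }

    /** Distinct count over the hashes; fixed-size longs are cheaper to compare than strings. */
    static int distinctByHash(List<String> values) {
        Set<Long> hashes = new HashSet<>();
        for (String value : values) {
            hashes.add(fnv1a64(value)); // in practice this hash would be stored with the document
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        // Hypothetical author names.
        List<String> authors = Arrays.asList("Li Bai", "Du Fu", "Li Bai");
        System.out.println(distinctByHash(authors));
    }
}
```

    Two different strings can in principle share a hash, so counting by hash trades a very small chance of undercounting for cheaper comparisons.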

    IV. Extended Stats Aggregation
    An extended version of the stats aggregation. Example:

        @Test
        public void test_Extended_Stats_Aggregation() {
            try {
                SearchRequest searchRequest = new SearchRequest();
                searchRequest.indices("items");
                SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
                AggregationBuilder aggregationBuild = AggregationBuilders.extendedStats("extended_stats").field("price");
                sourceBuilder.aggregation(aggregationBuild);
                sourceBuilder.size(0);
    //            sourceBuilder.query(
    //                    QueryBuilders.termQuery("sellerId", 24)
    //            );
                searchRequest.source(sourceBuilder);
                SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
                System.out.println(JSONObject.toJSONString(result));
            } catch (Throwable e) {
                e.printStackTrace();
            } finally {
                try {
                    client.close();
                }catch (Exception e){
                    log.error(e.getMessage());
                }
            }
        }

    The response is as follows:

    {
        "aggregations": {
            "asMap": {
                "extended_stats": {
                    "avg": 281.94725,
                    "avgAsString": "281.94725",
                    "count": 4,
                    "fragment": true,
                    "max": 880.999,
                    "maxAsString": "880.999",
                    "min": 10.9,
                    "minAsString": "10.9",
                    "name": "extended_stats",
                    "stdDeviation": 349.2133556190077,
                    "stdDeviationAsString": "349.2133556190077",
                    "sum": 1127.789,
                    "sumAsString": "1127.789",
                    "sumOfSquares": 805776.8781010001,
                    "sumOfSquaresAsString": "805776.8781010001",
                    "type": "extended_stats",
                    "variance": 121949.96774268753,
                    "varianceAsString": "121949.96774268753"
                }
            },
            "fragment": true
        },
        "clusters": {
            "fragment": true,
            "skipped": 0,
            "successful": 0,
            "total": 0
        },
        "failedShards": 0,
        "fragment": false,
        "hits": {
            "fragment": true,
            "hits": [],
            "maxScore": 0,
            "totalHits": 4
        },
        "numReducePhases": 1,
        "profileResults": {},
        "shardFailures": [],
        "skippedShards": 0,
        "successfulShards": 5,
        "timedOut": false,
        "took": {
            "days": 0,
            "daysFrac": 3.8194444444444445e-7,
            "hours": 0,
            "hoursFrac": 0.000009166666666666666,
            "micros": 33000,
            "microsFrac": 33000,
            "millis": 33,
            "millisFrac": 33,
            "minutes": 0,
            "minutesFrac": 0.00055,
            "nanos": 33000000,
            "seconds": 0,
            "secondsFrac": 0.033,
            "stringRep": "33ms"
        },
        "totalShards": 5
    }
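
    The extended fields in this response are related to each other in a simple way. The sketch below recomputes avg, variance, and stdDeviation from the count, sum, and sumOfSquares shown above, assuming the population formula variance = sumOfSquares / count - avg², which the figures in the response are consistent with. ExtendedStatsDemo and its methods are hypothetical names, not part of the Elasticsearch client.

```java
/** Recomputes the derived extended_stats fields from count, sum and sumOfSquares. */
class ExtendedStatsDemo {

    static double avg(long count, double sum) {
        return sum / count;
    }

    /** Population variance: mean of the squares minus the square of the mean. */
    static double variance(long count, double sum, double sumOfSquares) {
        double mean = avg(count, sum);
        return sumOfSquares / count - mean * mean;
    }

    static double stdDeviation(long count, double sum, double sumOfSquares) {
        return Math.sqrt(variance(count, sum, sumOfSquares));
    }

    public static void main(String[] args) {
        // Values copied from the response above.
        long count = 4;
        double sum = 1127.789;
        double sumOfSquares = 805776.8781010001;

        System.out.println("avg          = " + avg(count, sum));
        System.out.println("variance     = " + variance(count, sum, sumOfSquares));
        System.out.println("stdDeviation = " + stdDeviation(count, sum, sumOfSquares));
    }
}
```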

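    V. max Aggregation
    Returns the maximum value. Usage is similar to the avg aggregation and is not repeated here.
    VI. min Aggregation
    Returns the minimum value. Usage is similar to the avg aggregation and is not repeated here.
    VII. Percentiles Aggregation
    Percentile calculation is another approximate metric provided by ES. It shows the value observed at a given percentage of the data; for example, the 95th percentile is the value that is greater than 95% of the observed values. Percentile aggregations are often used to find outliers and suit cases where the data is examined in terms of its statistical distribution, such as a normal distribution.
    Official documentation: https://www.elastic.co/guide/cn/elasticsearch/guide/current/percentiles.html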
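
    As an illustration of what a percentile means, the following plain-Java sketch computes an exact percentile with the nearest-rank method on a small in-memory array. This is a swapped-in, simplified technique: Elasticsearch computes approximate percentiles (t-digest by default) and may interpolate, so its results on real data can differ. PercentileDemo and percentile are hypothetical names.

```java
import java.util.Arrays;

/** Exact nearest-rank percentile on an in-memory array. */
class PercentileDemo {

    /**
     * Returns the smallest value such that at least p percent of the
     * observations are less than or equal to it (nearest-rank definition).
     */
    static double percentile(double[] values, double p) {
        if (values.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one value and 0 < p <= 100");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Rank is 1-based: ceil(p * n / 100).
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies 1..100 ms.
        double[] latencies = new double[100];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = i + 1;
        }
        System.out.println(percentile(latencies, 50)); // 50.0
        System.out.println(percentile(latencies, 95)); // 95.0
    }
}
```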

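    VIII. HDR Histogram
    HDR Histogram (High Dynamic Range Histogram) is an alternative implementation that is useful when calculating percentiles for latency measurements, because it can be faster than the t-digest implementation at the cost of a larger memory footprint. This implementation maintains a fixed worst-case percentage error, specified as a number of significant digits. For example, if data is recorded with values from 1 microsecond to 1 hour (3,600,000,000 microseconds) in a histogram set to 3 significant digits, it keeps a value resolution of 1 microsecond for values up to 1 millisecond and of 3.6 seconds (or better) for the maximum tracked value (1 hour).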
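
    The arithmetic behind the resolution figures can be checked with a few lines of Java. The sketch treats the worst-case resolution at a value as value / 10^digits, which is a simplified reading of "N significant digits"; the actual histogram guarantees this resolution or better. HdrResolutionDemo and worstCaseResolution are hypothetical names.

```java
/** Back-of-the-envelope resolution for a histogram with N significant digits. */
class HdrResolutionDemo {

    /** Simplified worst-case resolution at a given value: value / 10^digits. */
    static double worstCaseResolution(double value, int significantDigits) {
        return value / Math.pow(10, significantDigits);
    }

    public static void main(String[] args) {
        double oneMillisecondInMicros = 1_000.0;
        double oneHourInMicros = 3_600_000_000.0;

        // 1 ms tracked with 3 significant digits: about 1 microsecond.
        System.out.println(worstCaseResolution(oneMillisecondInMicros, 3) + " us");
        // 1 hour tracked with 3 significant digits: 3,600,000 us, i.e. 3.6 seconds.
        System.out.println(worstCaseResolution(oneHourInMicros, 3) / 1_000_000.0 + " s");
    }
}
```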

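    1. hdr: the hdr property holds the histogram-related parameters.
    2. number_of_significant_value_digits: the resolution of histogram values, expressed as a number of significant digits.

    Note: the HDR histogram only supports positive values and fails if a negative value is passed. It is also not a good idea to use it when the range of values is unknown, because that can lead to high memory usage.
    Missing value

    • The missing parameter defines how documents without a value are handled. By default they are ignored, but they can also be treated as if they had a value.

    A Java example of a percentiles aggregation that uses the HDR method:
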
        @Test
        public void test_Percentiles_Aggregation() {
            try {
                SearchRequest searchRequest = new SearchRequest();
                searchRequest.indices("items");
                SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
                AggregationBuilder aggregationBuild = AggregationBuilders.percentiles("percentiles")
                        .field("price")
                        .percentiles(75,90,99.9)
                        .compression(100)
                        .method(PercentilesMethod.HDR)
                        .numberOfSignificantValueDigits(3)
                        ;
                sourceBuilder.aggregation(aggregationBuild);
                sourceBuilder.size(0);
    //            sourceBuilder.query(
    //                    QueryBuilders.termQuery("sellerId", 24)
    //            );
                searchRequest.source(sourceBuilder);
                SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
                System.out.println(JSONObject.toJSONString(result));
            } catch (Throwable e) {
                e.printStackTrace();
            } finally {
                try {
                    client.close();
                }catch (Exception e){
                    log.error(e.getMessage());
                }
            }
        }

    Reference blog: https://blog.csdn.net/prestigeding/article/details/88373092

    郭慕荣 (Guo Murong), 博客园 (cnblogs)
  • Original article: https://www.cnblogs.com/jelly12345/p/15176200.html