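Thanks to the original blogger: https://juejin.im/post/6844904032398475278#heading-1
Aggregation basics:
https://juejin.im/post/6844904032398475278#heading-1
Further reading on aggregations:
Elasticsearch: An introduction to aggregation
Elasticsearch: An introduction to pipeline aggregation
Elasticsearch: A thorough understanding of bucket aggregation in Elasticsearch
Finding the different age ranges: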
GET twitter/_search
{ "size": 0, "aggs": { "age": { "range": { "field": "age", "ranges": [{ "from": 20, "to": 30 }, { "from": 30, "to": 40 }, { "from": 40, "to": 50 } ] } } } }
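This uses an aggregation of type range.
The query above defines three age ranges and returns one bucket per range. The result is shown below. Because size is 0, hits.hits is empty; the per-range document counts are in aggregations.age.buckets: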
{ "took": 4, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 5, "relation": "eq" }, "max_score": null, "hits": [] }, "aggregations": { "age": { "buckets": [{ "key": "20.0-30.0", "from": 20.0, "to": 30.0, "doc_count": 0 }, { "key": "30.0-40.0", "from": 30.0, "to": 40.0, "doc_count": 3 }, { "key": "40.0-50.0", "from": 40.0, "to": 50.0, "doc_count": 0 } ] } } }
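The bucketing rule can be sketched in plain Python, without a cluster. This is a minimal illustration, not Elasticsearch's implementation: the ages are hypothetical values chosen to show that "from" is inclusive and "to" is exclusive, and range_buckets is a made-up helper name.

```python
def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = float(r["from"]), float(r["to"])
        count = sum(1 for v in values if lo <= v < hi)
        buckets.append({"key": f"{lo}-{hi}", "from": lo, "to": hi, "doc_count": count})
    return buckets


# Hypothetical ages (assumption): 30 falls in 30-40, while 18 and 50 match no range.
ages = [18, 30, 31, 39, 50]
ranges = [{"from": 20, "to": 30}, {"from": 30, "to": 40}, {"from": 40, "to": 50}]

buckets = range_buckets(ages, ranges)
for b in buckets:
    print(b["key"], b["doc_count"])
```

With these sample ages, five documents produce a single non-empty bucket, because values outside every range are simply not counted.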
Counting how often each keyword occurs:
Built-in keywords: aggs, terms, field, keyword
curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"number_of_cities":{"terms":{"field":"city.keyword"}}}, "size":0}'
{ "aggs": { "number_of_cities": { "terms": { "field": "city.keyword" } } }, "size": 0 }
Result:
{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 71150, "max_score": 0.0, "hits": [] }, "aggregations": { "number_of_cities": { "doc_count_error_upper_bound": 116, "sum_other_doc_count": 16983, "buckets": [{ "key": "合肥", "doc_count": 30017 }, { "key": "", "doc_count": 16761 }, { "key": "columbia", "doc_count": 1546 } ] } } }
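What a terms aggregation computes can be approximated with a frequency count. The sketch below is a single-process illustration with hypothetical city values; terms_buckets is a made-up helper. A real cluster merges per-shard counts, which is why the response also carries doc_count_error_upper_bound; this sketch is exact and has no such field.

```python
from collections import Counter


def terms_buckets(values, size=10):
    """Return the `size` most frequent values plus the count of everything left out."""
    counts = Counter(values)
    top = counts.most_common(size)
    buckets = [{"key": key, "doc_count": n} for key, n in top]
    sum_other = sum(counts.values()) - sum(n for _, n in top)
    return {"sum_other_doc_count": sum_other, "buckets": buckets}


# Hypothetical city values (assumption); the empty string is a value like any other.
cities = ["合肥", "合肥", "合肥", "", "", "columbia", "beijing"]

result = terms_buckets(cities, size=2)
print(result)
```

Here sum_other_doc_count is the number of documents whose value did not make it into the returned buckets.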
Counting the number of distinct cities:
How many different cities are there in total? The built-in keyword is cardinality.
GET _search
{ "size": 0, "aggs": { "number_of_cities": { "cardinality": { "field": "city.keyword" } } } }
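Conceptually, cardinality is a distinct count. The sketch below computes it exactly with a set over hypothetical values; Elasticsearch itself uses the approximate HyperLogLog++ algorithm, so on large data its result can differ slightly from an exact count like this one.

```python
def cardinality(values):
    """Exact distinct count; documents with a missing value (None) are ignored."""
    return len({v for v in values if v is not None})


# Hypothetical city values (assumption), including an empty string and a missing value.
cities = ["合肥", "合肥", "columbia", "", None, "columbia"]

distinct = cardinality(cities)
print(distinct)  # 3
```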
Computing the average user age:
Built-in function: avg
GET twitter/_search
{ "size": 0, "aggs": { "average_age": { "avg": { "field": "age" } } } }
For scores: avg gives the average, max the highest, min the lowest, and sum the total.
curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"average_score":{"avg":{"field":"os_score"}}}, "size":0}'
{ "aggs": { "average_score": { "avg": { "field": "os_score" } } }, "size": 0 }
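The four metrics differ only in the keyword used in the request. As a rough illustration of what each one returns, here is a sketch over a hypothetical list of os_score values; metrics is a made-up helper, not an Elasticsearch API.

```python
def metrics(values):
    """Compute the four single-value metrics over a list of numbers."""
    return {
        "avg": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
        "sum": sum(values),
    }


# Hypothetical os_score values (assumption).
os_scores = [90.0, 92.0, 100.0, 100.0]

result = metrics(os_scores)
print(result)
```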
Recomputing aggregation results with a script:
Here the maximum score is multiplied by 0.8 using *. In the same way, use / to divide by 2, + to add a number, and - to subtract a number.
curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"average_score":{"max":{"field":"os_score", "script":{"source":"_value * params.correction", "params":{"correction": 0.8}}}}}, "size":0}'
{ "size": 0, "aggs": { "average_score": { "max": { "field": "os_score", "script": { "source": "_value * params.correction", "params": { "correction": 0.8 } } } } } }
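The order of operations matters here: the script runs on each value first, and the aggregation is applied to the scripted values afterwards. A minimal sketch of that order, using hypothetical scores and a made-up helper name, could look like this:

```python
import math


def aggregate_with_script(values, agg, script):
    """Apply `script` to every value first, then aggregate the results."""
    return agg(script(v) for v in values)


# Hypothetical os_score values (assumption) and the same params as the query.
scores = [90.0, 92.0, 100.0]
params = {"correction": 0.8}

scaled_max = aggregate_with_script(
    scores, max, lambda _value: _value * params["correction"]
)
print(scaled_max)

# For a positive factor, scaling every value and then taking max
# gives the same result as scaling the plain maximum.
assert math.isclose(scaled_max, max(scores) * params["correction"])
```

Replacing the lambda body with `_value / 2`, `_value + 1`, or `_value - 1` mirrors the other operators mentioned above.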
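Aggregating with a script directly, without field:
This is meant to have the same effect as the approach above, but my attempt did not succeed. Note that the other os_score queries in these notes target the apollo index, while this one targets twitter.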
GET twitter/_search
{ "size": 0, "aggs": { "average_2_times_os_score": { "avg": { "script": { "source": "doc['os_score'].value * params.times", "params": { "times": 2.0 } } } } } }
Percentile aggregation
The percentiles aggregation. The query below can help spot outliers in os_score; it returns the scores at the 25th, 50th, 75th, and 100th percentiles.
{ "size": 0, "aggs": { "os_score_quartiles": { "percentiles": { "field": "os_score", "percents": [ 25, 50, 75, 100 ] } } } }
The result is shown below. It shows that:
25% of the scores are 90 or below
50% of the scores are 92 or below
75% of the scores are 100 or below
The highest score is 100
{ "took": 8, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 71150, "max_score": 0.0, "hits": [] }, "aggregations": { "os_score_quartiles": { "values": { "25.0": 90.0, "50.0": 92.0, "75.0": 100.0, "100.0": 100.0 } } } }
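To make the meaning of the "values" map concrete, here is a small exact percentile calculation. It is a sketch under stated assumptions: the scores are hypothetical, the percentile helper is made up, and it uses linear interpolation between closest ranks. Elasticsearch computes percentiles approximately (TDigest by default), so its numbers on real data may differ from an exact calculation.

```python
def percentile(sorted_values, p):
    """Percentile p (0 to 100) by linear interpolation between closest ranks."""
    if not sorted_values:
        raise ValueError("no values")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


# Hypothetical os_score values (assumption).
scores = sorted([85.0, 90.0, 92.0, 100.0, 100.0])

values = {str(float(p)): percentile(scores, p) for p in [25, 50, 75, 100]}
print(values)
```

The 100th percentile is always the maximum, which is why it is a convenient way to see the top score next to the quartiles.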
analyzer
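One reason search takes only seconds: documents are indexed at the time they are stored. An analyzer splits the text into the tokens that go into that index.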
curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_analyze?pretty' -d '{"text":["我是一个兵"], "analyzer":"standard"}'
{ "text": ["我是一个兵"], "analyzer": "standard" }
The result is five tokens, one per character:
{ "tokens": [{ "token": "我", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 }, { "token": "是", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 }, { "token": "一", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 }, { "token": "个", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 }, { "token": "兵", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 } ] }
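The link between tokens and fast search can be sketched as a tiny inverted index. This is a deliberate simplification: ideograph_tokens only handles text made up entirely of CJK ideographs (the real standard analyzer follows Unicode text segmentation and also handles words, numbers, and lowercasing), and the second document is a hypothetical addition.

```python
def ideograph_tokens(text):
    """One token per character, mimicking the response shape shown above."""
    return [
        {
            "token": ch,
            "start_offset": i,
            "end_offset": i + 1,
            "type": "<IDEOGRAPHIC>",
            "position": i,
        }
        for i, ch in enumerate(text)
    ]


def build_inverted_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for tok in ideograph_tokens(text):
            index.setdefault(tok["token"], set()).add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


tokens = ideograph_tokens("我是一个兵")

# Document 2 is hypothetical (assumption), added only so a token is shared.
docs = {1: "我是一个兵", 2: "一个人"}
index = build_inverted_index(docs)
print(index["一"])  # [1, 2]
```

A lookup such as index["兵"] is a single dictionary access, which is the intuition behind why indexing at write time makes search fast.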