最近在业务中需要使用ES来进行数据查询,在某些场景下需要对数据进行去重,以及去重后的统计分组等操作。现从MySQL角度,来理解ES查询语句。
1、distinct(去重)
1 SELECT DISTINCT(user_id) FROM table WHERE user_id_type = 3;
ES查询:
{ "query": { "term": { "user_id_type": 3 } }, "collapse": { "field": "user_id" } } 结果: { ... "hits": { "hits": [ { "_index": "es_qd_mkt_visitor_packet_dev_v1_20180621", "_type": "ad_crowd", "_source": { "user_id": "wx2af8414b502d4ca2_oHtrD0Vxv-_8c678figJNHmtaVQQ", "user_id_type": 3 }, "fields": { "user_id": [ "wx2af8414b502d4ca2_oHtrD0Vxv-_8c678figJNHmtaVQQ" ] } } ] } }
总结:使用collapse字段后,查询结果中[hits]中会出现[fields]字段,其中包含了去重后的user_id
2、count + distinct
1 SELECT COUNT(DISTINCT(user_id)) FROM table WHERE user_id_type = 3;
ES查询:
1 { 2 "query": { 3 "term": { 4 "user_id_type": 3 5 } 6 }, 7 "aggs": { 8 "count": { 9 "cardinality": { 10 "field": "user_id" 11 } 12 } 13 } 14 } 15 16 结果 17 18 { 19 ... 20 "hits": { 21 ... 22 }, 23 "aggregations": { 24 "count": { 25 "value": 121 26 } 27 } 28 }
总结:aggs中cardinality的字段代表需要distinct的字段
3、count + group by
1 SELECT COUNT(user_id) FROM table GROUP BY user_id_type;
ES查询:
1 { 2 "aggs": { 3 "user_type": { 4 "terms": { 5 "field": "user_id_type" 6 } 7 } 8 } 9 } 10 11 结果: 12 13 { 14 ... 15 "hits": { 16 ... 17 }, 18 "aggregations": { 19 "user_type": { 20 ... 21 "buckets": [ 22 { 23 "key": 4, 24 "doc_count": 1220 25 }, 26 { 27 "key": 3, 28 "doc_count": 488 29 } 30 ] 31 } 32 } 33 }
总结:aggs中terms的字段代表需要gruop by的字段
4、count + distinct + group by
1 SELECT COUNT(DISTINCT(user_id)) FROM table GROUP BY user_id_type;
ES查询:
1 { 2 "aggs": { 3 "user_type": { 4 "terms": { 5 "field": "user_id_type" 6 }, 7 "aggs": { 8 "count": { 9 "cardinality": { 10 "field": "user_id" 11 } 12 } 13 } 14 } 15 } 16 } 17 18 结果 19 20 { 21 ... 22 "hits": { 23 ... 24 }, 25 "aggregations": { 26 "user_type": { 27 ... 28 "buckets": [ 29 { 30 "key": 4, 31 "doc_count": 1220, //去重前数据1220条 32 "count": { 33 "value": 276 //去重后数据276条 34 } 35 }, 36 { 37 "key": 3, 38 "doc_count": 488, //去重前数据488条 39 "count": { 40 "value": 121 //去重后数据121条 41 } 42 } 43 ] 44 } 45 } 46 }
总结:对于既有group by又有distinct的查询要求,需要在aggs中嵌套子aggs
最后,需要注意的是:collapse关键字,折叠功能ES5.3版本之后才发布的。聚合&折叠只能针对keyword类型有效