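Bulk importing data
ES provides an API called bulk for performing batch operations.
Create a new file in the ES installation directory. The file name is arbitrary; here it is player.
The file contents are as follows: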
{"index":{"_index":"nba","_type":"_doc","_id":"1"}}
{"countryEn":"United States","teamName":"老鹰","birthDay":831182400000,"country":"美国","teamCityEn":"Atlanta","code":"jaylen_adams","displayAffiliation":"United States","displayName":"杰伦 亚当斯","schoolType":"College","teamConference":"东部","teamConferenceEn":"Eastern","weight":"86.2 公斤","teamCity":"亚特兰大","playYear":1,"jerseyNo":"10","teamNameEn":"Hawks","draft":2018,"displayNameEn":"Jaylen Adams","heightValue":1.88,"birthDayStr":"1996-05-04","position":"后卫","age":23,"playerId":"1629121"}
{"index":{"_index":"nba","_type":"_doc","_id":"2"}}
{"countryEn":"New Zealand","teamName":"雷霆","birthDay":743140800000,"country":"新西兰","teamCityEn":"Oklahoma City","code":"steven_adams","displayAffiliation":"Pittsburgh/New Zealand","displayName":"斯蒂文 亚当斯","schoolType":"College","teamConference":"西部","teamConferenceEn":"Western","weight":"120.2 公斤","teamCity":"俄克拉荷马城","playYear":6,"jerseyNo":"12","teamNameEn":"Thunder","draft":2013,"displayNameEn":"Steven Adams","heightValue":2.13,"birthDayStr":"1993-07-20","position":"中锋","age":26,"playerId":"203500"}
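Note: each action line and each document must sit on its own line, and the file must end with a newline, so leave an empty line after the last document.
Run the following command to bulk import the data in the file:
curl -X POST "localhost:9200/_bulk" -H "Content-Type: application/json" --data-binary @player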
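The bulk file layout can also be generated rather than typed by hand. The following is a minimal sketch, not part of the original notes: build_bulk_payload is a hypothetical helper, and the two documents are trimmed to a few fields for brevity. It mirrors the action line used in the player file and always ends the body with a newline.

```python
import json

def build_bulk_payload(index, docs):
    """Return a bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        # Action line, mirroring the one used in the player file.
        action = {"index": {"_index": index, "_type": "_doc", "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

# Trimmed-down versions of the two sample players.
players = [
    (1, {"displayNameEn": "Jaylen Adams", "teamNameEn": "Hawks", "age": 23}),
    (2, {"displayNameEn": "Steven Adams", "teamNameEn": "Thunder", "age": 26}),
]
payload = build_bulk_payload("nba", players)
print(payload, end="")
```

Writing payload to a file gives something that can be passed to the curl command with --data-binary.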
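Term-level queries in ES
Term-level queries: these queries are typically used on structured data such as number, date, and keyword fields rather than on text fields. A full-text query analyzes (tokenizes) the query text before searching, whereas a term-level query looks up the exact term directly in the field's inverted index. Term-level queries are therefore generally used on numeric, date, and keyword fields.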
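The difference between an analyzed text field and an exact keyword field can be illustrated with a toy inverted index. This is only a sketch under simplifying assumptions: analyze is a stand-in that lowercases and splits on whitespace, which is far simpler than a real ES analyzer, and all function names are made up for illustration.

```python
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def build_index(docs, field, analyzed):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # A text field stores analyzed tokens; a keyword field stores the raw value.
        terms = analyze(doc[field]) if analyzed else [doc[field]]
        for term in terms:
            index[term].add(doc_id)
    return index

def term_lookup(index, term):
    """Term-level lookup: the search term is used as-is, with no analysis."""
    return sorted(index.get(term, set()))

docs = {
    "1": {"displayNameEn": "Jaylen Adams"},
    "2": {"displayNameEn": "Steven Adams"},
}
text_index = build_index(docs, "displayNameEn", analyzed=True)
keyword_index = build_index(docs, "displayNameEn", analyzed=False)

print(term_lookup(text_index, "adams"))            # both documents
print(term_lookup(text_index, "Jaylen Adams"))     # empty: only "jaylen" and "adams" were indexed
print(term_lookup(keyword_index, "Jaylen Adams"))  # document 1
```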
Preparation
- Delete the nba index
- Create the nba index
- Define the field mapping:
POST:localhost:9200/nba/_mapping
{
  "properties": {
    "birthDay": { "type": "date" },
    "birthDayStr": { "type": "keyword" },
    "age": { "type": "integer" },
    "code": { "type": "text" },
    "country": { "type": "text" },
    "countryEn": { "type": "text" },
    "displayAffiliation": { "type": "text" },
    "displayName": { "type": "text" },
    "displayNameEn": { "type": "text" },
    "draft": { "type": "long" },
    "heightValue": { "type": "float" },
    "jerseyNo": { "type": "text" },
    "playYear": { "type": "long" },
    "playerId": { "type": "keyword" },
    "position": { "type": "text" },
    "schoolType": { "type": "text" },
    "teamCity": { "type": "text" },
    "teamCityEn": { "type": "text" },
    "teamConference": { "type": "keyword" },
    "teamConferenceEn": { "type": "keyword" },
    "teamName": { "type": "keyword" },
    "teamNameEn": { "type": "keyword" },
    "weight": { "type": "text" }
  }
}
- Bulk import the data (the player file)
Term query
Exact-match query (find players whose jersey number is 23)
POST:localhost:9200/nba/_search
{ "query": { "term": { "jerseyNo": "23" } } }
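The requests in this section can be sent from any HTTP client. As a rough sketch using only the Python standard library, the hypothetical helper build_search_request below prepares the POST request for the term query. The address http://localhost:9200 is assumed from the requests shown here, and the call that actually contacts the node is left commented out.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address, taken from the requests in this section

def build_search_request(index, body):
    """Build (but do not send) a POST request for the _search endpoint."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

term_query = {"query": {"term": {"jerseyNo": "23"}}}
request = build_search_request("nba", term_query)
print(request.get_method(), request.full_url)

# To run the search against a live node:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

The same helper works for every _search body that follows; only the dictionary changes.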
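Exists Query
Finds documents that have a non-null value in the given field (find players whose team name is not empty)
POST:localhost:9200/nba/_search
{ "query": { "exists": { "field": "teamNameEn" } } }
Prefix Query
Finds documents containing a term with the given prefix (find players whose team name starts with Rock)
POST:localhost:9200/nba/_search
{ "query": { "prefix": { "teamNameEn": "Rock" } } }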
Wildcard Query
Supports wildcard patterns: * matches zero or more characters and ? matches exactly one character (find Rockets players)
POST:localhost:9200/nba/_search
{ "query": { "wildcard": { "teamNameEn": "Ro*s" } } }
Regexp Query
Regular expression query (find Rockets players)
POST:localhost:9200/nba/_search
{ "query": { "regexp": { "teamNameEn": "Ro.*s" } } }
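How the prefix, wildcard, and regexp patterns differ can be shown by matching them against a few team names locally. This is an illustrative sketch only: the function names are made up, the team list is a small sample, and Python's re syntax is not identical to the Lucene regular expression syntax ES uses, although simple patterns such as Ro.*s behave the same. All three functions match against the whole term.

```python
import re

def prefix_match(term, prefix):
    """True if the term starts with the given prefix."""
    return term.startswith(prefix)

def wildcard_match(term, pattern):
    """Translate * and ? into a regular expression and match the whole term."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), term) is not None

def regexp_match(term, pattern):
    """The pattern has to match the entire term, not just a part of it."""
    return re.fullmatch(pattern, term) is not None

teams = ["Rockets", "Raptors", "Hawks"]
print([t for t in teams if prefix_match(t, "Rock")])
print([t for t in teams if wildcard_match(t, "Ro*s")])
print([t for t in teams if wildcard_match(t, "Ro?s")])
print([t for t in teams if regexp_match(t, "Ro.*s")])
```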
Ids Query
Query by document id (find the players with ids 1 and 2)
POST:localhost:9200/nba/_search
{ "query": { "ids": { "values": [1,2] } } }
Range queries in ES
Finds documents whose value in the given field (a date, number, or string) falls within the specified range.
Find players who have played in the NBA for 2 to 10 years
POST:localhost:9200/nba/_search
{ "query": { "range": { "playYear": { "gte": 2, "lte": 10 } } } }
Find players born from 1980 through 1999
POST:localhost:9200/nba/_search
{ "query": { "range": { "birthDay": { "gte": "01/01/1980", "lt": "2000", "format": "dd/MM/yyyy||yyyy" } } } }
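The two range conditions can be checked locally against the sample players from the bulk file, whose birthDay values are epoch milliseconds. The sketch below is an assumption-laden illustration: in_range and year_start_millis are hypothetical helpers, and "born from 1980 through 1999" is expressed as a half-open interval from the start of 1980 up to, but not including, the start of 2000 in UTC.

```python
from datetime import datetime, timezone

def year_start_millis(year):
    """Epoch milliseconds for 1 January of the given year, in UTC."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)

def in_range(value, gte=None, lte=None, lt=None):
    """Check a value against optional inclusive and exclusive bounds."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True

players = [
    {"code": "jaylen_adams", "playYear": 1, "birthDay": 831182400000},
    {"code": "steven_adams", "playYear": 6, "birthDay": 743140800000},
]

by_years = [p["code"] for p in players if in_range(p["playYear"], gte=2, lte=10)]
by_birth = [
    p["code"]
    for p in players
    if in_range(p["birthDay"], gte=year_start_millis(1980), lt=year_start_millis(2000))
]
print(by_years)
print(by_birth)
```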
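Boolean queries in ES
- must: the clause must appear in matching documents
- filter: the clause must appear in matching documents, but it does not contribute to the score
- must_not: the clause must not appear in matching documents
- should: the clause should appear in matching documents
must
Find players named James
POST:localhost:9200/nba/_search
{ "query": { "bool": { "must": [ { "match": { "displayNameEn": "james" } } ] } } }
filter
Same effect as must, but without scoring (find players named James)
must_not
Find Western Conference players named James
POST:localhost:9200/nba/_search
{
  "query": {
    "bool": {
      "must": [ { "match": { "displayNameEn": "james" } } ],
      "must_not": [ { "term": { "teamConferenceEn": { "value": "Eastern" } } } ]
    }
  }
}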
should
Documents that do not match the should clause are still returned; only their score differs.
Find Western Conference players named James who should have played for 11 to 20 years
POST:localhost:9200/nba/_search
{
  "query": {
    "bool": {
      "must": [ { "match": { "displayNameEn": "james" } } ],
      "must_not": [ { "term": { "teamConferenceEn": { "value": "Eastern" } } } ],
      "should": [ { "range": { "playYear": { "gte": 11, "lte": 20 } } } ]
    }
  }
}
With minimum_should_match set to 1, the should clause becomes mandatory: the query returns only Western Conference players named James who have played for 11 to 20 years
POST:localhost:9200/nba/_search
{
  "query": {
    "bool": {
      "must": [ { "match": { "displayNameEn": "james" } } ],
      "must_not": [ { "term": { "teamConferenceEn": { "value": "Eastern" } } } ],
      "should": [ { "range": { "playYear": { "gte": 11, "lte": 20 } } } ],
      "minimum_should_match": 1
    }
  }
}
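The way the four clause types combine, and what minimum_should_match changes, can be modelled with plain predicates. This is a simplified sketch that only decides whether a document matches and ignores scoring. bool_match and the three sample players are made up for illustration. The default for minimum_should_match follows the documented rule: 1 when there are should clauses but no must or filter clauses, otherwise 0.

```python
def bool_match(doc, must=(), filter_=(), must_not=(), should=(), minimum_should_match=None):
    """Return True if the document satisfies the bool query (matching only, no scoring)."""
    if minimum_should_match is None:
        # Default: should clauses are required only when no must or filter clause exists.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in (*must, *filter_)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    should_hits = sum(1 for clause in should if clause(doc))
    return should_hits >= minimum_should_match

def named_james(doc):
    return "james" in doc["displayNameEn"].lower().split()

def is_eastern(doc):
    return doc["teamConferenceEn"] == "Eastern"

def played_11_to_20(doc):
    return 11 <= doc["playYear"] <= 20

# Made-up players, used only to exercise the clauses.
players = [
    {"displayNameEn": "James Veteran", "teamConferenceEn": "Western", "playYear": 12},
    {"displayNameEn": "James Rookie", "teamConferenceEn": "Western", "playYear": 1},
    {"displayNameEn": "James East", "teamConferenceEn": "Eastern", "playYear": 15},
]

clauses = dict(must=[named_james], must_not=[is_eastern], should=[played_11_to_20])
optional = [p["displayNameEn"] for p in players if bool_match(p, **clauses)]
required = [p["displayNameEn"] for p in players if bool_match(p, minimum_should_match=1, **clauses)]
print(optional)
print(required)
```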
Sorting in ES
Rockets players sorted by years played, from most to fewest
POST:localhost:9200/nba/_search
{
  "query": { "match": { "teamNameEn": "Rockets" } },
  "sort": [ { "playYear": { "order": "desc" } } ]
}
Rockets players sorted by years played, from most to fewest; when years played are equal, sorted by height from tallest to shortest
POST:localhost:9200/nba/_search
{
  "query": { "match": { "teamNameEn": "Rockets" } },
  "sort": [
    { "playYear": { "order": "desc" } },
    { "heightValue": { "order": "desc" } }
  ]
}
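A two-level sort uses the second key only to break ties in the first. A small local sketch of the same ordering follows; the three players are invented for illustration and are not from the data set.

```python
# Invented sample rows: two players tie on playYear, so height decides their order.
players = [
    {"name": "A", "playYear": 10, "heightValue": 1.96},
    {"name": "B", "playYear": 10, "heightValue": 2.08},
    {"name": "C", "playYear": 3, "heightValue": 2.11},
]

# Negating both keys gives descending order on playYear, then descending on heightValue.
ranked = sorted(players, key=lambda p: (-p["playYear"], -p["heightValue"]))
order = [p["name"] for p in ranked]
print(order)
```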
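ES aggregations: metric aggregations
What is aggregation analysis in ES
Aggregation analysis is an important database feature: it computes aggregate values over the data set returned by a query, such as the maximum or minimum of a field (or of a computed expression), a sum, or an average. As both a search engine and a database, ES also provides powerful aggregation capabilities.
Computing metrics such as max, min, sum, and average over a data set is called a metric aggregation in ES. Relational databases, besides offering aggregate functions, can also group query results with group by and then compute metrics per group; in ES this grouping is called a bucket aggregation.
max/min/sum/avg
Find the average age of Rockets players
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "avgAge": { "avg": { "field": "age" } } },
  "size": 0
}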
value_count
Counts the documents that have a value in the given field
Count the Rockets players whose years played is not empty
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "countPlayerYear": { "value_count": { "field": "playYear" } } },
  "size": 0
}
Find how many players the Rockets have (read the total from hits.total in the response)
POST:localhost:9200/nba/_search
{ "query": { "term": { "teamNameEn": { "value": "Rockets" } } } }
Cardinality
Counts distinct values
Find the number of distinct ages among Rockets players
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "countAge": { "cardinality": { "field": "age" } } },
  "size": 0
}
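The difference between counting values and counting distinct values is easy to see on a small list. The sketch below is an exact, local computation with made-up documents and hypothetical function names. The ES cardinality aggregation is approximate (it is based on HyperLogLog++), so on large data sets its result can deviate slightly from an exact count like this one.

```python
def value_count(docs, field):
    """Count documents that have a non-null value in the field (single-valued fields assumed)."""
    return sum(1 for doc in docs if doc.get(field) is not None)

def cardinality(docs, field):
    """Count distinct non-null values of the field, exactly."""
    return len({doc[field] for doc in docs if doc.get(field) is not None})

# Made-up documents: one has no age at all.
docs = [
    {"name": "A", "age": 25},
    {"name": "B", "age": 30},
    {"name": "C", "age": 25},
    {"name": "D"},
]
print(value_count(docs, "age"))
print(cardinality(docs, "age"))
```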
stats
Returns five values: count, max, min, avg, and sum
Get the age stats of Rockets players
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "statsAge": { "stats": { "field": "age" } } },
  "size": 0
}
Extended stats
Adds four results to stats: sum of squares, variance, standard deviation, and the interval of the mean plus or minus two standard deviations
Get the extended age stats of Rockets players
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "extendStatsAge": { "extended_stats": { "field": "age" } } },
  "size": 0
}
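What the extra extended_stats values mean can be worked out by hand on a small sample. The following sketch computes them locally for five invented ages. The function name is hypothetical, the result keys follow the field names of the ES response, and the variance is the population variance with bounds at two standard deviations, which is assumed here to match the aggregation's default behaviour.

```python
import math

def extended_stats(values, sigma=2.0):
    """Compute stats plus sum of squares, variance, std deviation, and avg +/- sigma * std."""
    count = len(values)
    total = sum(values)
    avg = total / count
    # Population variance: mean of squared deviations from the average.
    variance = sum((v - avg) ** 2 for v in values) / count
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }

ages = [20, 22, 24, 26, 28]  # invented sample
result = extended_stats(ages)
print(result)
```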
Percentiles
Returns the values found at given percentiles; by default the percentiles [ 1, 5, 25, 50, 75, 95, 99 ] are returned
Get the age percentiles of Rockets players
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "percentAge": { "percentiles": { "field": "age" } } },
  "size": 0
}
Get the age percentiles of Rockets players (with explicit percentiles)
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "percentAge": { "percentiles": { "field": "age", "percents": [ 20, 50, 75 ] } } },
  "size": 0
}
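A percentile is the value below which a given share of the data falls. The sketch below computes it exactly, using linear interpolation between the two closest ranks, on five invented ages. The percentile function is a hypothetical helper; ES estimates percentiles with an approximation algorithm (t-digest by default), so its figures may differ somewhat from this exact calculation.

```python
def percentile(values, percent):
    """Exact percentile using linear interpolation between the closest ranks."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return float(ordered[0])
    # Fractional position of the requested percentile in the sorted list.
    rank = (percent / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

ages = [20, 22, 24, 26, 28]  # invented sample
for p in (20, 50, 75):
    print(p, percentile(ages, p))
```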
ES aggregations: bucket aggregations
Terms Aggregation
Groups documents by the terms of a field
Group Rockets players by age
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "aggsAge": { "terms": { "field": "age", "size": 10 } } },
  "size": 0
}
order
Sorting the buckets of a terms aggregation
Group Rockets players by age, with buckets sorted by age from oldest to youngest (ordering by bucket key)
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "aggsAge": { "terms": { "field": "age", "size": 10, "order": { "_key": "desc" } } } },
  "size": 0
}
Group Rockets players by age, with buckets sorted by document count from largest to smallest (ordering by document count)
POST:localhost:9200/nba/_search
{
  "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
  "aggs": { "aggsAge": { "terms": { "field": "age", "size": 10, "order": { "_count": "desc" } } } },
  "size": 0
}
Group players by team, with buckets sorted by each team's average player age (ordering by a sub-aggregation metric)
POST:localhost:9200/nba/_search
{
  "aggs": {
    "aggsTeamName": {
      "terms": { "field": "teamNameEn", "size": 30, "order": { "avgAge": "desc" } },
      "aggs": { "avgAge": { "avg": { "field": "age" } } }
    }
  },
  "size": 0
}
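Ordering buckets by a metric computed inside each bucket corresponds to a group by followed by an order by on the aggregate. A local sketch of that idea follows, with invented players and a hypothetical terms_with_avg helper; the bucket keys key and doc_count follow the names used in ES responses, and avgAge is the sub-aggregation name from the request above.

```python
from collections import defaultdict

def terms_with_avg(docs, group_field, metric_field, size=10):
    """Group docs by a field, compute the average of a metric per group, sort by it descending."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avgAge": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda bucket: bucket["avgAge"], reverse=True)
    return buckets[:size]

# Invented players, used only to show the grouping.
docs = [
    {"teamNameEn": "Rockets", "age": 30},
    {"teamNameEn": "Rockets", "age": 26},
    {"teamNameEn": "Lakers", "age": 34},
    {"teamNameEn": "Hawks", "age": 23},
    {"teamNameEn": "Hawks", "age": 21},
]
buckets = terms_with_avg(docs, "teamNameEn", "age", size=30)
for bucket in buckets:
    print(bucket)
```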
Filtering the buckets of a terms aggregation
Group the Lakers and Rockets by team, sorted by average age (explicit list of values)
POST:localhost:9200/nba/_search
{
  "aggs": {
    "aggsTeamName": {
      "terms": {
        "field": "teamNameEn",
        "include": [ "Lakers", "Rockets", "Warriors" ],
        "exclude": [ "Warriors" ],
        "size": 30,
        "order": { "avgAge": "desc" }
      },
      "aggs": { "avgAge": { "avg": { "field": "age" } } }
    }
  },
  "size": 0
}
Group the Lakers and Rockets by team, sorted by average age (values matched by regular expression)
POST:localhost:9200/nba/_search
{
  "aggs": {
    "aggsTeamName": {
      "terms": {
        "field": "teamNameEn",
        "include": "Lakers|Ro.*|Warriors.*",
        "exclude": "Warriors",
        "size": 30,
        "order": { "avgAge": "desc" }
      },
      "aggs": { "avgAge": { "avg": { "field": "age" } } }
    }
  },
  "size": 0
}
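Both requests end up with only the Lakers and Rockets buckets because exclude takes precedence over include. The rule can be sketched as below; keep_term is a hypothetical helper, the team list is a small sample, and Python's re module stands in for the Lucene regular expression syntax, which behaves the same for these simple patterns.

```python
import re

def keep_term(term, include=None, exclude=None):
    """Decide whether a term gets a bucket; rules are a list of exact values or a regex string."""
    def matches(rule):
        if isinstance(rule, str):
            # A regex rule has to match the whole term.
            return re.fullmatch(rule, term) is not None
        return term in rule

    if include is not None and not matches(include):
        return False
    # exclude wins when a term matches both rules.
    if exclude is not None and matches(exclude):
        return False
    return True

teams = ["Lakers", "Rockets", "Warriors", "Hawks"]
by_list = [t for t in teams if keep_term(t, ["Lakers", "Rockets", "Warriors"], ["Warriors"])]
by_regex = [t for t in teams if keep_term(t, "Lakers|Ro.*|Warriors.*", "Warriors")]
print(by_list)
print(by_regex)
```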
Range Aggregation
Groups documents into ranges
Group NBA players by age into under 20, 20 to 35, and 35 and over
POST:localhost:9200/nba/_search
{
  "aggs": {
    "ageRange": {
      "range": {
        "field": "age",
        "ranges": [ { "to": 20 }, { "from": 20, "to": 35 }, { "from": 35 } ]
      }
    }
  },
  "size": 0
}
Group NBA players by age into under 20, 20 to 35, and 35 and over (with a custom key for each bucket)
POST:localhost:9200/nba/_search
{
  "aggs": {
    "ageRange": {
      "range": {
        "field": "age",
        "ranges": [
          { "to": 20, "key": "A" },
          { "from": 20, "to": 35, "key": "B" },
          { "from": 35, "key": "C" }
        ]
      }
    }
  },
  "size": 0
}
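In a range aggregation each bucket includes its from value and excludes its to value, which is why the same number can appear as the upper bound of one range and the lower bound of the next without overlap. A small sketch of that bucketing rule, using the keyed ranges from the request above and a hypothetical range_bucket helper:

```python
def range_bucket(value, ranges):
    """Return the key of the first range containing value: from is inclusive, to is exclusive."""
    for bucket in ranges:
        lower = bucket.get("from")
        upper = bucket.get("to")
        if (lower is None or value >= lower) and (upper is None or value < upper):
            return bucket["key"]
    return None

ranges = [
    {"to": 20, "key": "A"},
    {"from": 20, "to": 35, "key": "B"},
    {"from": 35, "key": "C"},
]
for age in (19, 20, 34, 35):
    print(age, range_bucket(age, ranges))
```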
Date Range Aggregation
Groups documents into date ranges
Group NBA players by birth month and year
POST:localhost:9200/nba/_search
{
  "aggs": {
    "birthDayRange": {
      "date_range": {
        "field": "birthDay",
        "format": "MM-yyyy",
        "ranges": [
          { "to": "01-1989" },
          { "from": "01-1989", "to": "01-1999" },
          { "from": "01-1999", "to": "01-2009" },
          { "from": "01-2009" }
        ]
      }
    }
  },
  "size": 0
}
Date Histogram Aggregation
Date histogram aggregation: aggregates by day, month, year, and so on. Supported intervals are year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), and second (1s)
Group NBA players by birth year
POST:localhost:9200/nba/_search
{
  "aggs": {
    "birthday_aggs": {
      "date_histogram": { "field": "birthDay", "format": "yyyy", "interval": "year" }
    }
  },
  "size": 0
}
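Grouping by birth year amounts to truncating each timestamp to its year and counting. The sketch below does this locally for the two sample players from the bulk file, whose birthDay values are epoch milliseconds. birth_year_histogram is a hypothetical helper, and the years are computed in UTC, which is assumed to match the default time zone used for the buckets.

```python
from collections import Counter
from datetime import datetime, timezone

def birth_year_histogram(birthdays_ms):
    """Count birthdays per calendar year (UTC), returned in ascending year order."""
    years = Counter(
        datetime.fromtimestamp(ms / 1000, tz=timezone.utc).year for ms in birthdays_ms
    )
    return dict(sorted(years.items()))

# birthDay values of the two sample players in the player file.
birthdays = [831182400000, 743140800000]
histogram = birth_year_histogram(birthdays)
print(histogram)
```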
The query_string query in ES
With the query_string query, anyone familiar with the Lucene query syntax can write a query string directly in that syntax. When ES receives the request, its query parser parses the string and builds the corresponding query.
Querying a single field
POST:localhost:9200/nba/_search
{
  "query": { "query_string": { "default_field": "displayNameEn", "query": "james OR curry" } },
  "size": 100
}
POST:localhost:9200/nba/_search
{
  "query": { "query_string": { "default_field": "displayNameEn", "query": "james AND harden" } },
  "size": 100
}
Querying multiple fields
POST:localhost:9200/nba/_search
{
  "query": { "query_string": { "fields": [ "displayNameEn", "teamNameEn" ], "query": "James AND Rockets" } },
  "size": 100
}