1.searchAPI
ES支持两种基本方式检索
一个是通过REST request URL发送搜索参数(url+检索参数)
另一个是通过使用REST requestbody来发送他们(url+请求体)
1.1url+检索参数
GET bank/_search?q=*&sort=account_number:asc
q=*:查询所有
sort=account_number:asc:按照账号进行升序排列
查询结果--默认只返回10条(默认的分页查询)
{
"took": 57,
"timed_out": false, //是否超时
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": { //命中的记录
"total": { //总记录
"value": 548,
"relation": "eq"
},
"max_score": null,//最大得分
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "1",
"_score": null,
"_source": { //数据的实际信息
"account_number": 1,
"balance": 39225,
"firstname": "Amber",
"lastname": "Duke",
"age": 32,
"gender": "M",
"address": "880 Holmes Lane",
"employer": "Pyrami",
"email": "amberduke@pyrami.com",
"city": "Brogan",
"state": "IL"
},
"sort": [
1
]
},
1.2url+请求体
- query:查询条件
- search:排序条件
GET bank/_search
GET bank/_search
{
"query": {"match_all": {}}, //匹配所有
"sort": [
{
"account_number": "desc"//按照账号升序
},
{
"balance": "desc" //按照越余额降序
}
]
}
- 注意:
HTTP客户端工具(POSTMAN),get请求不能携带请求体,我们变为post也是一样的我们POST一个JSON风格的查询请求到_search API。
需要了解,一旦搜索结果被返回,ElasticSearch就完成了这次请求,并且不会维护任何服务端的资源或者结果的cursor(游标)
2.query DSL语法基本使用
2.1match_all查询所有
GET bank/_search
{
"query":{
"match_all":{}
}
}
语法结构
{
QUERY_NAME:{
FIELD NAME:{
ARGUMENI:VALUE,
ARGUMENT:VALUE...
}
}
}
例:按照 balance 降序查询:
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"balance": {
"order": "desc"
}
}
]
}
简单表达形式
"balance": {
"order": "desc"
}
可以简写为:
"balance": "desc"
2.2from,size分页查询
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"balance": {
"order": "desc"
}
}
],
"from": 0,
"size": 5,
"_source": ["balance", "account_number"]
}
2.3_source只返回部分字段
2.4match全文检索
GET bank/_search
//查询 account_number 是 20 的所有结果:
{
"query": {
"match": {
"account_number": 20
}
}
}
- 进行模糊查询(全文检索)
按照评分进行排序,会对检索条件进行分词匹配
GET bank/_search
//查询所有 address 中包含 Kings 的数据
{
"query": {
"match": {
"address": "Kings"
}
}
}
//最终查询出address中包含mill或者road或者mill road的所有记录,并给出相关性评分
2.5match_phrase短语匹配
将要匹配的值当成一个整体单词(不分词)进行索引
GET bank/_search
//查出address中包含millroad的所有记录,并给出相关性评分
GET bank/_search
{
"query": {
"match_phrase": {
"address": "Mill Lane"
}
}
}
2.6multi_match多字段匹配
进行了分词
查询出指定字段包含mill的
GET bank/_search
//state字段或者address字段包含mill的情况
{
"query": {
"multi_match": {
"query": "mill",
"fields": ["address", "email"]
}
}
}
2.7bool复合查询
bool用来做复合查询
复合查询可以合并任何其他查询语句,包括复合语句,了解这一点是很重要的,这意味着。复合语句之间可以相互嵌套,可以表达非常复杂的逻辑
- must:必须有
- must_not:除了
- should:可有可无
GET bank/_search
{
"query": {
"bool": {
"must": [ //必须有
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [ //除了
{
"match": {
"age": "28"
}
}
],
"should": [ //可有可无
{
"match": {
"lastname": "Hines"
}
}
]
}
}
}
2.8filter结果过滤
布尔查询中的每个must、should和must not元素都称为查询子句。文档满足每个 must 或 should 子句中的标准的程度有助于文档的相关性得分。分数越高,文档就越符合您的搜索条件。默认情况下,Elasticsearch返回按这些相关性得分排序的文档。
must_not 子句中的条件被视为 filter。它影响文档是否包含在结果中,filter、must_not 都不影响文档的得分。
还可以显式指定任意过滤器,以包含或排除基于结构化数据的文档。
- 例如,查找年龄在 10 - 30 的数据
GET /bank/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gte": 10,
"lte": 30
}
}
}
]
}
}
}
返回结果:
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 498,
"relation" : "eq"
},
"max_score" : 1.0, //注意这里,使用must贡献了相关性得分
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "13",
"_score" : 1.0, //注意这里,使用must贡献了相关性得分
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
- 我们也可以使用Filter:
GET /bank/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"age": {
"gte": 10,
"lte": 30
}
}
}
]
}
}
}
返回的结果是:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 498,
"relation" : "eq"
},
"max_score" : 0.0, //注意这里没有贡献相关性得分
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "13",
"_score" : 0.0, //注意这里没有贡献相关性得分
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
"address" : "789 Madison Street",
"employer" : "Quility",
"email" : "nanettebates@quility.com",
"city" : "Nogal",
"state" : "VA"
2.9term查询
规定全文检索用match
非全文检索用term
用于找精确字段
返回在提供的字段中包含确切信息的文档内容。
您可以使用精确的值(例如价格,产品ID或用户名)利用 Term 查询查找文档。
GET /bank/_search
{
"query": {
"term": {
"age": 33
}
}
}
注意:
避免term对text字段使用查询。
因为es在保存text字段的时候存在数据分析的问题
默认情况下,Elasticsearch更改text字段的值作为analysis的一部分。这会使查找text字段值的精确匹配变得困难。
要搜索text字段值,请改用match查询。
{
"query":{
"match":{
"address":"789 Madison Street"
}
}
}
如何文本精确查询?
- 查询地址值必须是 435 Furman Street 的(精确匹配 keyword):
GET /bank/_search
{
"query": {
"match": {
"address.keyword": "435 Furman Street" //这个的搜索结果在改为435 Furman时不会展示
}
}
}
使用match-parse
GET /bank/_search
{
"query": {
"match_phrase": {
"address": "435 Furman Street" //这个的搜索结果在改为435 Furman时依旧会展示
}
}
}
- match_parse和keyword的区别
match_parse:只要包含"address.keyword": "435 Furman Street"即可
keyword:address要完全等于"435 Furman Street"
2.10aggregations(执行聚合)
聚合提供了从数据中分组和提取数据的能力。最简单的聚合方法大致等于 SQL GROUP BY 和 SQL 聚合函数。在 Elasticsearch 中,您有执行索返回 hits(命中结果),并且同时返回聚合结果,把一个响应中的所有hits(命中结果)隔开的能力。这是非常强大且有效的,您可以执行查询和多个聚合,并且在一次使用得到各自的(任何一个的)返回结果,使用一次简洁和简化的AP来避免网络往返。
搜索 address中包含mill的所有人的年龄分布以及平均年龄,但不显示这些人的详情。
GET /bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": { //获取聚合
"ageAgg": { //自定义的聚合名
"terms": { //获取结果的不同数据个数
"field": "age", //获取字段是age
"size": 10 //可能有很多很多可能,只获取前10种
}
},
"ageAvg":{ //自定义的聚合名
"avg": { //求平均值
"field": "age" //获取字段是age
}
}
}
}
结果:
{
"took" : 27,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 5.4032025,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 5.4032025,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 5.4032025,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "345",
"_score" : 5.4032025,
"_source" : {
"account_number" : 345,
"balance" : 9812,
"firstname" : "Parker",
"lastname" : "Hines",
"age" : 38,
"gender" : "M",
"address" : "715 Mill Avenue",
"employer" : "Baluba",
"email" : "parkerhines@baluba.com",
"city" : "Blackgum",
"state" : "KY"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "472",
"_score" : 5.4032025,
"_source" : {
"account_number" : 472,
"balance" : 25571,
"firstname" : "Lee",
"lastname" : "Long",
"age" : 32,
"gender" : "F",
"address" : "288 Mill Street",
"employer" : "Comverges",
"email" : "leelong@comverges.com",
"city" : "Movico",
"state" : "MT"
}
}
]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 38,
"doc_count" : 2
},
{
"key" : 28,
"doc_count" : 1
},
{
"key" : 32,
"doc_count" : 1
}
]
},
"ageAvg" : {
"value" : 34.0
}
}
}
如果我们不希望返回数据,只需要分析结果,可以设置 size 为 0
GET /bank/_search
{
"query": {~},
"aggs": {~},
"size": 0
}
按照年龄聚合,并且请求这些年龄段的这些人的平均薪资
GET /bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": { //子聚合
"ageAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
结果
{
"took" : 16,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"ageAvg" : {
"value" : 28312.918032786885
}
},
{
"key" : 39,
"doc_count" : 60,
"ageAvg" : {
"value" : 25269.583333333332
}
},
{
"key" : 26,
"doc_count" : 59,
"ageAvg" : {
"value" : 23194.813559322032
}
},
...
案例3
查询出所有年龄分布,并且这些 年龄段中 性别为 M 的平均薪资 和 性别为 F 的平均薪资 以及 这个年龄段的总体平均薪资
GET /bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"genderAgg":{
"terms": {
"field": "gender.keyword",
"size": 10
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"ageBlanace":{
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
返回结果:
{
"took" : 16,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"genderAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "M",
"doc_count" : 35,
"balanceAvg" : {
"value" : 29565.628571428573
}
},
{
"key" : "F",
"doc_count" : 26,
"balanceAvg" : {
"value" : 26626.576923076922
}
}
]
},
"ageBlanace" : {
"value" : 28312.918032786885
}
},
{
"key" : 39,
"doc_count" : 60,
"genderAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "F",
"doc_count" : 38,
"balanceAvg" : {
"value" : 26348.684210526317
}
},
{
"key" : "M",
"doc_count" : 22,
"balanceAvg" : {
"value" : 23405.68181818182
}
}
]
},
"ageBlanace" : {
"value" : 25269.583333333332
}
},
{
"key" : 26,
"doc_count" : 59,
"genderAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "M",
"doc_count" : 32,
"balanceAvg" : {
"value" : 25094.78125
}
},
{
"key" : "F",
"doc_count" : 27,
"balanceAvg" : {
"value" : 20943.0
}
}
]
},
"ageBlanace" : {
"value" : 23194.813559322032
}
},
...