• ES 7.x (Part 2): Advanced Search


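    1. Search API

    ES supports two basic ways to run a search:
    one sends the search parameters in the REST request URL (URL + query parameters);
    the other sends them in the REST request body (URL + request body).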

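    1.1 URL + query parameters

    GET bank/_search?q=*&sort=account_number:asc
    q=*: match all documents
    sort=account_number:asc: sort by account number in ascending order

    Query result (by default only 10 hits are returned, i.e. the default page size is 10):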

    {
        "took": 57,
        "timed_out": false, //whether the request timed out
        "_shards": {
            "total": 1,
            "successful": 1,
            "skipped": 0,
            "failed": 0
        },
        "hits": {          //matching documents
            "total": {     //total number of matches
                "value": 548,
                "relation": "eq"
            },
            "max_score": null,//highest score (null here because the hits are sorted by a field, not by score)
            "hits": [
                {
                    "_index": "bank",
                    "_type": "account",
                    "_id": "1",
                    "_score": null,
                    "_source": {  //the actual document data
                        "account_number": 1,
                        "balance": 39225,
                        "firstname": "Amber",
                        "lastname": "Duke",
                        "age": 32,
                        "gender": "M",
                        "address": "880 Holmes Lane",
                        "employer": "Pyrami",
                        "email": "amberduke@pyrami.com",
                        "city": "Brogan",
                        "state": "IL"
                    },
                    "sort": [
                        1
                    ]
                },
                ...
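The URL form simply puts the search parameters in the query string. As a minimal sketch (Python standard library only; the host `http://localhost:9200` and the helper name `build_search_url` are assumptions for illustration, and nothing is sent), this is how such a URL can be built with proper percent-encoding:

```python
from urllib.parse import urlencode

# Assumption: a local, unsecured cluster; adjust the address for your setup.
ES_HOST = "http://localhost:9200"


def build_search_url(index: str, **params: str) -> str:
    """Build a URI-search URL such as /bank/_search?q=*&sort=account_number:asc."""
    # urlencode percent-encodes reserved characters ('*' -> %2A, ':' -> %3A);
    # the server decodes them back to the original values.
    return f"{ES_HOST}/{index}/_search?{urlencode(params)}"


print(build_search_url("bank", q="*", sort="account_number:asc"))
```

Percent-encoding matters once the q parameter contains spaces, quotes, or other reserved characters.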

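    1.2 URL + request body

    • query: the query conditions
    • sort: the sort conditions
        GET bank/_search
        {
          "query": {"match_all": {}},  //match all documents
          "sort": [
            {
              "account_number": "desc"  //sort by account number in descending order
            },
            {
              "balance": "desc"  //then by balance in descending order
            }
          ]
        }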
    
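    • Note
      Some HTTP client tools (e.g. Postman) cannot send a GET request with a body; in that case use POST, which works the same way: we POST a JSON-style query to the _search API.
      Also note that once the search results are returned, Elasticsearch has finished the request and does not keep any server-side resources or result cursors for it.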
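Since some clients refuse to attach a body to GET, a request-body search is commonly sent as POST. A minimal sketch with the Python standard library (the host and the helper name `build_search_request` are assumptions; the request is only built, not sent):

```python
import json
from urllib.request import Request

ES_HOST = "http://localhost:9200"  # assumed cluster address


def build_search_request(index: str, body: dict) -> Request:
    """Build (but do not send) a POST request carrying a JSON query body."""
    data = json.dumps(body).encode("utf-8")
    return Request(
        f"{ES_HOST}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "query": {"match_all": {}},
    "sort": [{"account_number": "desc"}, {"balance": "desc"}],
}
req = build_search_request("bank", body)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req) against a running cluster.
```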

    2. Basic usage of the Query DSL

    2.1 match_all: match all documents

    GET bank/_search

    {
      "query":{
        "match_all":{}
      }
    }
    

    Syntax structure

    {
        QUERY_NAME: {
            FIELD_NAME: {
                ARGUMENT: VALUE,
                ARGUMENT: VALUE...
            }
        }
    }
    

    Example: query all documents sorted by balance in descending order:

        GET bank/_search
        {
          "query": {
            "match_all": {}
          },
          "sort": [
            {
              "balance": {
                "order": "desc"
              }
            }
          ]
        }
    

    Shorthand form

        "balance": {
            "order": "desc"
        }
        can be abbreviated to:
        "balance": "desc"
    
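The two sort forms are interchangeable. The hypothetical helper below (not part of any Elasticsearch client) expands the shorthand into the long form, which makes the equivalence explicit; it only handles entries that map one field to either an order string or an options object:

```python
def expand_sort(sort_clauses: list[dict]) -> list[dict]:
    """Expand shorthand entries like {"balance": "desc"} into the long form
    {"balance": {"order": "desc"}}; entries already in long form are kept as-is."""
    expanded = []
    for clause in sort_clauses:
        # Each entry maps exactly one field name to an order string or an options object.
        field, spec = next(iter(clause.items()))
        if isinstance(spec, str):
            spec = {"order": spec}
        expanded.append({field: spec})
    return expanded


print(expand_sort([{"balance": "desc"}, {"account_number": {"order": "asc"}}]))
```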

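    2.2 from, size: paginated queries

    from is the zero-based offset of the first hit to return, and size is the number of hits per page.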

    GET bank/_search
    {
      "query": {
        "match_all": {}
      },
      "sort": [
        {
          "balance": {
            "order": "desc"
          }
        }
      ],
      "from": 0,
      "size": 5,
      "_source": ["balance", "account_number"]
    }
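Turning a page number into from/size is simple arithmetic: from = (page - 1) * size. A minimal sketch (the helper `page_params` is hypothetical); the 10,000 limit is the default value of the index.max_result_window setting, which caps from + size:

```python
MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_params(page: int, size: int) -> dict:
    """Translate a 1-based page number into Elasticsearch from/size parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    # from + size may not exceed index.max_result_window (10,000 by default);
    # deeper paging needs search_after or the scroll API instead.
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("result window too large for from/size paging")
    return {"from": start, "size": size}


print(page_params(3, 5))  # {'from': 10, 'size': 5}
```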
    

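    2.3 _source: return only some fields

    Pass an array of field names as _source to return only those fields of each hit, as in the previous example: "_source": ["balance", "account_number"].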
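What _source filtering does to each hit can be pictured with a tiny local simulation. This hypothetical helper only handles plain top-level field names; the real parameter also accepts wildcards and nested paths:

```python
def filter_source(source: dict, includes: list[str]) -> dict:
    """Keep only the listed top-level fields, roughly what "_source": [...] does to each hit."""
    return {field: source[field] for field in includes if field in source}


hit_source = {"account_number": 1, "balance": 39225, "firstname": "Amber", "age": 32}
print(filter_source(hit_source, ["balance", "account_number"]))
```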

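    2.4 match: full-text search

    Query all documents whose account_number is 20 (on a non-text field such as a number, match performs an exact-value match):

        GET bank/_search
        {
          "query": {
            "match": {
              "account_number": 20
            }
          }
        }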
    
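    • Full-text search
      Results are sorted by relevance score, and the search text is analyzed (split into terms) before matching.
      GET bank/_search
    //query all documents whose address contains mill or road
    {
      "query": {
        "match": {
          "address": "mill road"
        }
      }
    }
    //this returns every record whose address contains mill, road, or both, together with a relevance score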
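To see why a multi-word match query behaves like an OR over analyzed terms, here is a crude local simulation. Assumptions: `analyze` is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumerics), the "score" is just the number of matched query terms rather than BM25, and the Kings Road document is hypothetical:

```python
import re


def analyze(text: str) -> list[str]:
    """Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(docs: list[dict], field: str, query: str) -> list[tuple[dict, int]]:
    """OR-match: a doc hits if it shares at least one term with the query.
    The 'score' is only the number of matched query terms, not BM25."""
    query_terms = set(analyze(query))
    results = []
    for doc in docs:
        overlap = query_terms & set(analyze(doc[field]))
        if overlap:
            results.append((doc, len(overlap)))
    # Higher toy score first; the sort is stable, so ties keep input order.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


docs = [
    {"address": "990 Mill Road"},
    {"address": "198 Mill Lane"},
    {"address": "880 Holmes Lane"},
    {"address": "123 Kings Road"},  # hypothetical document
]
for doc, score in match(docs, "address", "mill road"):
    print(score, doc["address"])
```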
    

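    2.5 match_phrase: phrase matching

    The value to match is treated as a whole phrase: it is still analyzed, but its terms must appear in the field consecutively and in the same order.

        //find all records whose address contains the phrase "Mill Lane", with relevance scores
        GET bank/_search
        {
          "query": {
            "match_phrase": {
              "address": "Mill Lane"
            }
          }
        }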
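The phrase rule (same terms, consecutive, same order) can be sketched locally. This is a simplified model using the same crude analyzer assumption as above; the real query compares term positions stored in the index and also supports a slop parameter, which this sketch ignores:

```python
import re


def analyze(text: str) -> list[str]:
    """Crude analyzer stand-in: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_phrase(value: str, phrase: str) -> bool:
    """True if the phrase's terms appear consecutively and in order in value."""
    tokens, target = analyze(value), analyze(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


print(match_phrase("198 Mill Lane", "Mill Lane"))  # True
print(match_phrase("990 Mill Road", "Mill Lane"))  # False
print(match_phrase("198 Mill Lane", "Lane Mill"))  # False: wrong order
```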
    

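    2.6 multi_match: multi-field matching

    The query text is analyzed.
    This finds documents in which any of the specified fields contains mill.
    GET bank/_search

    //the address field or the email field contains mill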
    {
      "query": {
        "multi_match": {
          "query": "mill",
          "fields": ["address", "email"]
        }
      }
    }
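A simplified picture of multi_match: the query text is analyzed once and a document hits if any listed field shares a term with it (roughly the default best_fields type, with scoring ignored). The analyzer is the same crude stand-in as before, and the email values are hypothetical:

```python
import re


def analyze(text: str) -> list[str]:
    """Crude analyzer stand-in: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def multi_match(doc: dict, query: str, fields: list[str]) -> bool:
    """True if any listed field shares at least one analyzed term with the query."""
    query_terms = set(analyze(query))
    return any(query_terms & set(analyze(doc.get(field, ""))) for field in fields)


docs = [
    {"address": "990 Mill Road", "email": "forbes@example.com"},
    {"address": "880 Holmes Lane", "email": "mill@example.com"},  # hypothetical
    {"address": "880 Holmes Lane", "email": "amber@example.com"},  # hypothetical
]
print([multi_match(d, "mill", ["address", "email"]) for d in docs])
```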
    

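    2.7 bool: compound queries

    bool is used to build compound queries.
    It is important to understand that a compound query can combine any other query clauses, including other compound clauses. This means compound clauses can be nested inside each other to express very complex logic.

    • must: must match
    • must_not: must not match
    • should: optional (when must is present); a match raises the score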
      GET bank/_search
    {
      "query": {
        "bool": {
          "must": [    //must match
            {
              "match": {
                "gender": "M"
              }
            },
            {
              "match": {
                "address": "mill"
              }
            }
          ],
          "must_not": [    //must not match
            {
              "match": {
                "age": "28"
              }
            }
          ],
          "should": [     //optional; a match raises the score
            {
              "match": {
                "lastname": "Hines"
              }
            }
          ]
        }
      }
    }
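The bool logic above can be modeled as predicates combined with all/any. This is a toy sketch, not how Elasticsearch scores documents: the "score" just counts satisfied must and should clauses. The sample people are the four "mill" accounts that appear in the aggregation example's hits later in this post:

```python
def bool_query(docs, must=(), must_not=(), should=()):
    """Toy model of bool: each clause is a predicate doc -> bool.
    Returns (doc, score) pairs; the score counts satisfied must/should clauses."""
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # every must clause has to match
        if any(clause(doc) for clause in must_not):
            continue  # a single must_not match excludes the document
        should_hits = sum(1 for clause in should if clause(doc))
        if should and not must and should_hits == 0:
            continue  # with no must clause, at least one should clause has to match
        results.append((doc, len(must) + should_hits))
    # Highest toy score first; sorted() is stable, so ties keep input order.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


people = [
    {"lastname": "Wallace", "gender": "M", "age": 28, "address": "990 Mill Road"},
    {"lastname": "Holland", "gender": "M", "age": 38, "address": "198 Mill Lane"},
    {"lastname": "Hines", "gender": "M", "age": 38, "address": "715 Mill Avenue"},
    {"lastname": "Long", "gender": "F", "age": 32, "address": "288 Mill Street"},
]

result = bool_query(
    people,
    must=[lambda d: d["gender"] == "M", lambda d: "mill" in d["address"].lower().split()],
    must_not=[lambda d: d["age"] == 28],
    should=[lambda d: d["lastname"] == "Hines"],
)
for doc, score in result:
    print(score, doc["lastname"])
```

Wallace is dropped by must_not (age 28), Long by must (gender F), and Hines ranks first only because of the should clause.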
    

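    2.8 filter: result filtering

    Each must, should, and must_not element in a bool query is called a query clause. How well a document meets the criteria in each must or should clause contributes to its relevance score: the higher the score, the better the document matches your search criteria. By default, Elasticsearch returns documents sorted by this relevance score.

    Conditions in a must_not clause are treated as a filter: they affect whether a document is included in the results, but not its score. Neither filter nor must_not affects the score.

    You can also explicitly specify arbitrary filters to include or exclude documents based on structured data.

    • For example, find the documents whose age is between 10 and 30: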
    GET /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "age": {
                  "gte": 10,
                  "lte": 30
                }
              }
            }
          ]
        }
      }
    }
    

    Result:

    {
      "took" : 10,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 498,
          "relation" : "eq"
        },
        "max_score" : 1.0,  //note: with must, the clause contributes to the relevance score
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "account",
            "_id" : "13",
            "_score" : 1.0,  //note: with must, the clause contributes to the relevance score
            "_source" : {
              "account_number" : 13,
              "balance" : 32838,
              "firstname" : "Nanette",
              "lastname" : "Bates",
              "age" : 28,
              "gender" : "F",
              ...
    • We can also use filter instead:
    GET /bank/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "age": {
                  "gte": 10,
                  "lte": 30
                }
              }
            }
          ]
        }
      }
    }
    

    The result is:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 498,
          "relation" : "eq"
        },
        "max_score" : 0.0,   //note: filter does not contribute to the relevance score
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "account",
            "_id" : "13",
            "_score" : 0.0,   //note: filter does not contribute to the relevance score
            "_source" : {
              "account_number" : 13,
              "balance" : 32838,
              "firstname" : "Nanette",
              "lastname" : "Bates",
              "age" : 28,
              "gender" : "F",
              "address" : "789 Madison Street",
              "employer" : "Quility",
              "email" : "nanettebates@quility.com",
              "city" : "Nogal",
              "state" : "VA"
              ...
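The difference between the two responses above (scores of 1.0 with must, 0.0 with filter) can be modeled as follows. This is a toy sketch under the assumption that a range clause in query context gives every match the same constant score of 1.0, while a clause in filter context is not scored at all; the sample accounts use ages from this post:

```python
def range_scores(docs, field, gte, lte, context):
    """Toy model of a range clause in query context (must) vs filter context."""
    if context not in ("must", "filter"):
        raise ValueError("context must be 'must' or 'filter'")
    # Query context: constant score 1.0 per match; filter context: not scored (0.0).
    score = 1.0 if context == "must" else 0.0
    return [(doc["account_number"], score) for doc in docs if gte <= doc[field] <= lte]


accounts = [
    {"account_number": 13, "age": 28},
    {"account_number": 970, "age": 28},
    {"account_number": 136, "age": 38},
    {"account_number": 1, "age": 32},
]
print(range_scores(accounts, "age", 10, 30, "must"))    # [(13, 1.0), (970, 1.0)]
print(range_scores(accounts, "age", 10, 30, "filter"))  # [(13, 0.0), (970, 0.0)]
```

The matched set is identical in both contexts; only the score differs.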

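    2.9 term query

    As a rule, use match for full-text search
    and term for non-full-text (exact-value) search.
    term is used to find exact values:
    it returns documents that contain an exact term in the provided field.

    You can use a term query to find documents based on a precise value, such as a price, a product ID, or a username.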

        GET /bank/_search
        {
          "query": {
            "term": {
              "age": 33
            }
          }
        }
    

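    Note:
    Avoid using the term query on text fields,
    because Elasticsearch analyzes text fields when it stores them.
    By default, Elasticsearch changes the values of text fields as part of analysis, which can make finding exact matches for text field values difficult.

    To search text field values, use the match query instead.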

    GET /bank/_search
    {
      "query":{
        "match":{
          "address":"789 Madison Street"
        }
      }
    }
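Why term misses on a text field can be shown with a crude local model: the text field stores analyzed terms, term compares the raw value against them, and match analyzes the value first. The lowercase/split analyzer is a simplifying assumption standing in for the standard analyzer:

```python
import re


def analyze(text: str) -> list[str]:
    """Crude analyzer stand-in: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


# What a text field stores for "789 Madison Street": {"789", "madison", "street"}
indexed_terms = set(analyze("789 Madison Street"))


def term_query(value: str) -> bool:
    # term: the value is NOT analyzed; it must equal one stored term exactly
    return value in indexed_terms


def match_query(value: str) -> bool:
    # match: the value is analyzed the same way, then its terms are compared
    return bool(set(analyze(value)) & indexed_terms)


print(term_query("789 Madison Street"))   # False: no single stored term equals it
print(term_query("Madison"))              # False: the stored term is lowercase
print(term_query("madison"))              # True
print(match_query("789 Madison Street"))  # True
```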
    

    How do you query text for an exact value?

    • Query documents whose address is exactly 435 Furman Street (exact match on the keyword sub-field):
        GET /bank/_search
        {
          "query": {
            "match": {
              "address.keyword": "435 Furman Street"  //if the value is changed to "435 Furman", this document is no longer returned
            }
          }
        }
    

    Using match_phrase

        GET /bank/_search
        {
          "query": {
            "match_phrase": {
              "address": "435 Furman Street"  //if the value is changed to "435 Furman", this document is still returned
            }
          }
        }
    
    • Difference between match_phrase and keyword
      match_phrase: address only needs to contain the phrase "435 Furman Street"
      keyword: address must be exactly equal to "435 Furman Street"
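The contrast above reduces to "whole-string equality" versus "consecutive terms inside the analyzed value". A crude local model (the analyzer is a simplifying assumption, and the helper names are hypothetical):

```python
import re


def analyze(text: str) -> list[str]:
    """Crude analyzer stand-in: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_equals(stored: str, value: str) -> bool:
    # address.keyword keeps the original string unanalyzed, so only an identical value matches
    return stored == value


def phrase_contains(stored: str, phrase: str) -> bool:
    # match_phrase: the phrase's terms must appear consecutively and in order
    tokens, target = analyze(stored), analyze(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


address = "435 Furman Street"
for value in ("435 Furman Street", "435 Furman"):
    print(value, keyword_equals(address, value), phrase_contains(address, value))
```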

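    2.10 aggregations: running aggregations

    Aggregations let you group your data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and SQL aggregate functions. In Elasticsearch, a single search request can return hits and, at the same time, aggregation results that are kept separate from the hits in the same response. This is powerful and efficient: you can run a query together with multiple aggregations and get all of their results at once, using one concise API call and avoiding extra network round trips.

    Find the age distribution and the average age of everyone whose address contains mill. (In this example the matching documents are still returned as hits; setting size to 0, covered below, suppresses them.)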

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": "mill" 
        }
      },
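      "aggs": {    //aggregation definitions
        "ageAgg": {      //custom aggregation name
          "terms": {       //bucket documents by distinct values
            "field": "age",    //aggregate on the age field
            "size": 10      //there may be many distinct values; return only the top 10 buckets
          }
        },
        "ageAvg":{   //custom aggregation name
          "avg": {      //compute the average
            "field": "age"    //aggregate on the age field
          }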
        }
      }
    }
    

    Result:

    {
      "took" : 27,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4,
          "relation" : "eq"
        },
        "max_score" : 5.4032025,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "account",
            "_id" : "970",
            "_score" : 5.4032025,
            "_source" : {
              "account_number" : 970,
              "balance" : 19648,
              "firstname" : "Forbes",
              "lastname" : "Wallace",
              "age" : 28,
              "gender" : "M",
              "address" : "990 Mill Road",
              "employer" : "Pheast",
              "email" : "forbeswallace@pheast.com",
              "city" : "Lopezo",
              "state" : "AK"
            }
          },
          {
            "_index" : "bank",
            "_type" : "account",
            "_id" : "136",
            "_score" : 5.4032025,
            "_source" : {
              "account_number" : 136,
              "balance" : 45801,
              "firstname" : "Winnie",
              "lastname" : "Holland",
              "age" : 38,
              "gender" : "M",
              "address" : "198 Mill Lane",
              "employer" : "Neteria",
              "email" : "winnieholland@neteria.com",
              "city" : "Urie",
              "state" : "IL"
            }
          },
          {
            "_index" : "bank",
            "_type" : "account",
            "_id" : "345",
            "_score" : 5.4032025,
            "_source" : {
              "account_number" : 345,
              "balance" : 9812,
              "firstname" : "Parker",
              "lastname" : "Hines",
              "age" : 38,
              "gender" : "M",
              "address" : "715 Mill Avenue",
              "employer" : "Baluba",
              "email" : "parkerhines@baluba.com",
              "city" : "Blackgum",
              "state" : "KY"
            }
          },
          {
            "_index" : "bank",
            "_type" : "account",
            "_id" : "472",
            "_score" : 5.4032025,
            "_source" : {
              "account_number" : 472,
              "balance" : 25571,
              "firstname" : "Lee",
              "lastname" : "Long",
              "age" : 32,
              "gender" : "F",
              "address" : "288 Mill Street",
              "employer" : "Comverges",
              "email" : "leelong@comverges.com",
              "city" : "Movico",
              "state" : "MT"
            }
          }
        ]
      },
      "aggregations" : {
        "ageAgg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : 38,
              "doc_count" : 2
            },
            {
              "key" : 28,
              "doc_count" : 1
            },
            {
              "key" : 32,
              "doc_count" : 1
            }
          ]
        },
        "ageAvg" : {
          "value" : 34.0
        }
      }
    }
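The aggregation results above can be reproduced by hand from the four hits (ages 28, 38, 38, 32). A minimal local sketch: buckets are ordered by doc_count descending with ties broken by key ascending, which matches the bucket order in the response above:

```python
from collections import Counter

ages = [28, 38, 38, 32]  # ages of the four "mill" hits in the response above


def terms_agg(values, size=10):
    """Bucket distinct values, ordered by doc_count desc, then key asc."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


def avg_agg(values):
    """Arithmetic mean, or None when there is nothing to average."""
    return {"value": sum(values) / len(values) if values else None}


print(terms_agg(ages))  # buckets 38 (2), 28 (1), 32 (1)
print(avg_agg(ages))    # {'value': 34.0}
```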
    

    If we only need the aggregation results and not the documents themselves, we can set size to 0:

        GET /bank/_search
        {
          "query": {~},
          "aggs": {~},
          "size": 0
        }
    

    Aggregate by age, and for each age bucket get the average balance of the people in it:

    GET /bank/_search
    {
      "query": {
        "match_all": {}
      },
      "aggs": {
        "ageAgg": {
          "terms": {
            "field": "age",
            "size": 100
          },
          "aggs": {  //sub-aggregation: runs inside each age bucket
            "ageAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      },
      "size": 0
    }
    

    Result:

    {
      "took" : 16,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1000,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "ageAgg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : 31,
              "doc_count" : 61,
              "ageAvg" : {
                "value" : 28312.918032786885
              }
            },
            {
              "key" : 39,
              "doc_count" : 60,
              "ageAvg" : {
                "value" : 25269.583333333332
              }
            },
            {
              "key" : 26,
              "doc_count" : 59,
              "ageAvg" : {
                "value" : 23194.813559322032
              }
            },
    ...
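A sub-aggregation is "group, then compute inside each group". Since the full 1,000-document data set is not shown here, this sketch uses only the four "mill" accounts (age, balance) from the earlier response, so its numbers differ from the output above:

```python
from collections import defaultdict

# (age, balance) of the four "mill" accounts from the earlier response
rows = [(28, 19648), (38, 45801), (38, 9812), (32, 25571)]


def avg_balance_by_age(rows):
    """Outer terms-style grouping by age, inner avg over balance."""
    groups = defaultdict(list)
    for age, balance in rows:
        groups[age].append(balance)
    return {age: sum(balances) / len(balances) for age, balances in groups.items()}


print(avg_balance_by_age(rows))  # {28: 19648.0, 38: 27806.5, 32: 25571.0}
```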
    

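    Example 3

    Get the full age distribution and, for each age bucket, the average balance of gender M, the average balance of gender F, and the overall average balance of that age bucket: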

    GET /bank/_search
    {
      "query": {
        "match_all": {}
      },
      "aggs": {
        "ageAgg": {
          "terms": {
            "field": "age",
            "size": 100
          },
          "aggs": {
            "genderAgg":{
              "terms": {
                "field": "gender.keyword",
                "size": 10
              },
              "aggs": {
                "balanceAvg": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            },
            "ageBlanace":{
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      },
      "size": 0
    }
    

    Returned result:

    {
      "took" : 16,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1000,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "ageAgg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : 31,
              "doc_count" : 61,
              "genderAgg" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "M",
                    "doc_count" : 35,
                    "balanceAvg" : {
                      "value" : 29565.628571428573
                    }
                  },
                  {
                    "key" : "F",
                    "doc_count" : 26,
                    "balanceAvg" : {
                      "value" : 26626.576923076922
                    }
                  }
                ]
              },
              "ageBlanace" : {
                "value" : 28312.918032786885
              }
            },
            {
              "key" : 39,
              "doc_count" : 60,
              "genderAgg" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "F",
                    "doc_count" : 38,
                    "balanceAvg" : {
                      "value" : 26348.684210526317
                    }
                  },
                  {
                    "key" : "M",
                    "doc_count" : 22,
                    "balanceAvg" : {
                      "value" : 23405.68181818182
                    }
                  }
                ]
              },
              "ageBlanace" : {
                "value" : 25269.583333333332
              }
            },
            {
              "key" : 26,
              "doc_count" : 59,
              "genderAgg" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "M",
                    "doc_count" : 32,
                    "balanceAvg" : {
                      "value" : 25094.78125
                    }
                  },
                  {
                    "key" : "F",
                    "doc_count" : 27,
                    "balanceAvg" : {
                      "value" : 20943.0
                    }
                  }
                ]
              },
              "ageBlanace" : {
                "value" : 23194.813559322032
              }
            },
    ...
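The two-level aggregation (age buckets, gender buckets inside them, plus an average per age bucket) can be sketched the same way. Again only the four "mill" accounts are used, with their genders from the earlier response; the key "ageBalance" corresponds to the request's "ageBlanace" aggregation:

```python
from collections import defaultdict

# (age, gender, balance) of the four "mill" accounts from the earlier response
rows = [(28, "M", 19648), (38, "M", 45801), (38, "M", 9812), (32, "F", 25571)]


def age_gender_balance(rows):
    """Group by age; inside each age, average balance per gender and overall."""
    by_age = defaultdict(list)
    for age, gender, balance in rows:
        by_age[age].append((gender, balance))
    result = {}
    for age, members in by_age.items():
        by_gender = defaultdict(list)
        for gender, balance in members:
            by_gender[gender].append(balance)
        result[age] = {
            "genderAgg": {g: sum(b) / len(b) for g, b in by_gender.items()},
            "ageBalance": sum(b for _, b in members) / len(members),
        }
    return result


for age, stats in age_gender_balance(rows).items():
    print(age, stats)
```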
    
  • Original article: https://www.cnblogs.com/psyduck/p/14471779.html
Copyright © 2020-2023  润新知