• Understanding Elasticsearch query and filter, using the bank account data as an example


    Getting to Know the Elasticsearch Query Language (Query DSL), Part 1


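    I. Basic concepts

    How a query clause behaves depends on whether it is used in

    • query context
    • filter context

    In other words, on whether it is executed as a query or as a filter.

    • query context answers: how well does the document match the query clause?

    • filter context answers: does the document match the query clause?

    One is a question of degree (a relevance score); the other is a yes-or-no question.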
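The difference can be sketched without a running cluster. The snippet below is a toy illustration only: the term-counting "score" is a stand-in chosen for this example and is not how Elasticsearch actually computes relevance, and the two documents and their balances are made up.

```python
# Toy illustration only: real Elasticsearch scoring is far more involved.

def tokenize(text):
    """Lowercase and split on whitespace (a simplification of a real analyzer)."""
    return text.lower().split()

def query_score(doc_text, query_text):
    """Query context: HOW WELL does the document match? Returns a number."""
    doc_terms = tokenize(doc_text)
    return sum(doc_terms.count(term) for term in tokenize(query_text))

def filter_match(value, gte, lte):
    """Filter context: DOES the document match? Returns True or False."""
    return gte <= value <= lte

# Hypothetical documents, invented for this sketch
docs = [
    {"address": "990 Mill Road", "balance": 29950},
    {"address": "198 Mill Lane", "balance": 19648},
]

scores = [query_score(d["address"], "mill lane") for d in docs]
matches = [filter_match(d["balance"], 29900, 30000) for d in docs]
print(scores)   # [1, 2]
print(matches)  # [True, False]
```

The query side produces a ranking (the second address matches both terms, so it scores higher), while the filter side only says yes or no.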

    II. Examples

    1. Download the sample data: bank account data (accounts.json)
    2. Load the data into Elasticsearch
    curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
    curl 'localhost:9200/_cat/indices?v'
    

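    Two things to note here: 1. change the host to match your own environment; 2. with earlier versions, the downloaded file may be named 'accounts.json?raw=true'.
    In that case the command looks roughly like this: curl -XPOST 'wbelk:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json?raw=true"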
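The `_bulk` endpoint expects newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end of the body. accounts.json already has this shape. As a rough sketch of how such a payload could be built from plain Python dicts (the function name `to_bulk_payload` and the two sample documents are hypothetical, not part of the dataset):

```python
import json

def to_bulk_payload(docs, id_field="account_number"):
    """Build a newline-delimited bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # Action line: index this document under the given _id
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline
    return "\n".join(lines) + "\n"

# Hypothetical sample documents, invented for this sketch
sample = [
    {"account_number": 1, "balance": 100},
    {"account_number": 2, "balance": 200},
]
payload = to_bulk_payload(sample)
print(payload, end="")
```

Note that Elasticsearch 6.0 and later also require a Content-Type header on the request, for example `-H 'Content-Type: application/x-ndjson'` with curl.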

    3. Understanding the response fields

    To make things more convenient, you can install Kibana Sense:

    $./bin/kibana plugin --install elastic/sense
    
    $./bin/kibana  
    sudo -i service kibana restart    # alternatively, (re)start Kibana as a service
    
    

    A match_all search simply returns all documents:

    GET /bank/_search
    {
      "query": {
        "match_all": {}
      }
    }
    

    The response looks roughly like this:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1000,
        "max_score": 1,
        "hits": [
          {
            "_index": "bank",
            "_type": "account",
            "_id": "25",
            "_score": 1,
            "_source": {
              "account_number": 25,
              "balance": 40540,
              "firstname": "Virginia",
              "lastname": "Ayala",
              "age": 39,
              "gender": "F",
              "address": "171 Putnam Avenue",
              "employer": "Filodyne",
              "email": "virginiaayala@filodyne.com",
              "city": "Nicholson",
              "state": "PA"
            }
          },
    

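    A rough explanation of the fields:

    • took: time spent executing the search, in milliseconds (1 ms in this example)
    • timed_out: whether the search timed out
    • _shards: how many shards were searched, and how many succeeded or failed
    • hits: the search results
    • hits.total: total number of documents matching the search criteria
    • hits.hits: the actual array of results; the first ten are returned by default
    • hits.max_score: the highest relevance score among the matches
    • hits.hits._score: relevance score of each returned document (the higher the score, the better the match and the higher it ranks)
    • _index, _type, _id: together they narrow down, layer by layer, to one specific document
    • _source: the original document source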
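To make the field list concrete, here is a minimal sketch that pulls these values out of a response that has already been parsed into a Python dict. The helper name `summarize_response` is hypothetical, and the sample response is trimmed from the output shown above. The `isinstance` check is there because Elasticsearch 7 and later report `hits.total` as an object with `value` and `relation` keys rather than a bare number.

```python
def summarize_response(resp):
    """Extract the main fields of a search response into a flat summary."""
    hits = resp["hits"]
    total = hits["total"]
    # Newer versions return {"value": ..., "relation": ...} instead of an integer
    if isinstance(total, dict):
        total = total["value"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "shards_failed": resp["_shards"]["failed"],
        "total": total,
        "max_score": hits["max_score"],
        # (document id, relevance score) for each returned hit
        "returned": [(h["_id"], h["_score"]) for h in hits["hits"]],
    }

# Trimmed copy of the response shown above
response = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": 1,
        "hits": [
            {"_index": "bank", "_type": "account", "_id": "25", "_score": 1,
             "_source": {"account_number": 25, "balance": 40540}},
        ],
    },
}

summary = summarize_response(response)
print(summary)
```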
    4. Query DSL: running queries
    • Return only the account_number and balance fields
    POST /bank/_search
    {
      "query": { "match_all": {} },
      "_source": ["account_number", "balance"]
    }
    
    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1000,
        "max_score": 1,
        "hits": [
          {
            "_index": "bank",
            "_type": "account",
            "_id": "25",
            "_score": 1,
            "_source": {
              "account_number": 25,
              "balance": 40540
            }
          },
          {
            "_index": "bank",
            "_type": "account",
            "_id": "44",
            "_score": 1,
            "_source": {
              "account_number": 44,
              "balance": 34487
            }
          },
          {
            "_index": "bank",
            "_type": "account",
            "_id": "99",
            "_score": 1,
            "_source": {
              "account_number": 99,
              "balance": 47159
            }
          },
    
    • Return the document whose account_number is 20
    POST /bank/_search
    {
      "query": { "match": { "account_number": 20 } }
    }
    
    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 5.6587105,
        "hits": [
          {
            "_index": "bank",
            "_type": "account",
            "_id": "20",
            "_score": 5.6587105,
            "_source": {
              "account_number": 20,
              "balance": 16418,
              "firstname": "Elinor",
              "lastname": "Ratliff",
              "age": 36,
              "gender": "M",
              "address": "282 Kings Place",
              "employer": "Scentric",
              "email": "elinorratliff@scentric.com",
              "city": "Ribera",
              "state": "WA"
            }
          }
        ]
      }
    }
    
    • Return all accounts whose address contains the term 'mill'
    POST /bank/_search
    {
      "query": { "match": { "address": "mill" } }
    }
    
    • Return all accounts whose address contains the term 'mill' or the term 'lane'
    POST /bank/_search
    {
      "query": { "match": { "address": "mill lane" } }
    }
    
    • Match the phrase 'mill lane'
    POST /bank/_search
    {
      "query": { "match_phrase": { "address": "mill lane" } }
    }
    
    • Return all accounts whose address contains both 'mill' and 'lane' (AND)
    POST /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }
    
    • Return all accounts whose address contains 'mill' or 'lane' (OR)
    POST /bank/_search
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }
    
    • Return all accounts whose address contains neither 'mill' nor 'lane' (NOT)
    POST /bank/_search
    {
      "query": {
        "bool": {
          "must_not": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }
    
    • Return all accounts whose age is 40 and whose state is not ID (combined)
    POST /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": "40" } }
          ],
          "must_not": [
            { "match": { "state": "ID" } }
          ]
        }
      }
    }
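The matching rules used in the queries above (match, match_phrase, and bool with must / should / must_not) can be mimicked over a few in-memory documents. This is a simplified sketch, not Elasticsearch's implementation: it only decides whether a document matches (no scoring), it tokenizes on whitespace instead of using a real analyzer, and the four documents and the helper names `matches` and `search` are made up for illustration.

```python
# Toy evaluator: decides only whether a document matches, with no scoring.

def tokenize(value):
    """Lowercase and split on whitespace (a simplification of a real analyzer)."""
    return str(value).lower().split()

def matches(doc, clause):
    """Return True if doc satisfies a single query clause."""
    kind, body = next(iter(clause.items()))
    if kind == "match_all":
        return True
    if kind in ("match", "match_phrase"):
        field, text = next(iter(body.items()))
        doc_terms, query_terms = tokenize(doc.get(field, "")), tokenize(text)
        if kind == "match":
            # match: any one of the query terms is enough (OR)
            return any(term in doc_terms for term in query_terms)
        # match_phrase: all terms, adjacent and in the same order
        n = len(query_terms)
        return any(doc_terms[i:i + n] == query_terms
                   for i in range(len(doc_terms) - n + 1))
    if kind == "bool":
        must = body.get("must", [])
        should = body.get("should", [])
        must_not = body.get("must_not", [])
        if not all(matches(doc, c) for c in must):
            return False
        if any(matches(doc, c) for c in must_not):
            return False
        # Without must clauses, at least one should clause has to match
        if should and not must:
            return any(matches(doc, c) for c in should)
        return True
    raise ValueError(f"unsupported clause: {kind}")

def search(docs, query):
    """Return the ids of all documents matching the query."""
    return [doc["id"] for doc in docs if matches(doc, query)]

# Hypothetical documents, invented for this sketch
docs = [
    {"id": 1, "address": "198 Mill Lane", "age": 40, "state": "ID"},
    {"id": 2, "address": "990 Mill Road", "age": 40, "state": "PA"},
    {"id": 3, "address": "171 Putnam Avenue", "age": 39, "state": "PA"},
    {"id": 4, "address": "12 Lane Mill Court", "age": 25, "state": "WA"},
]

mill = {"match": {"address": "mill"}}
lane = {"match": {"address": "lane"}}

print(search(docs, {"match": {"address": "mill lane"}}))         # [1, 2, 4]
print(search(docs, {"match_phrase": {"address": "mill lane"}}))  # [1]
print(search(docs, {"bool": {"must": [mill, lane]}}))            # [1, 4]
print(search(docs, {"bool": {"must_not": [mill, lane]}}))        # [3]
```

Document 4 shows why match_phrase and bool/must differ: it contains both terms, but not next to each other in that order.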
    
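    5. Query DSL: running filters

    Filters do not compute relevance scores.

    • Find all accounts whose balance is between 29900 and 30000 (inclusive)
      (first match all accounts, then apply the filter)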
    POST /bank/_search
    {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "range": {
              "balance": {
                "gte": 29900,
                "lte": 30000
              }
            }
          }
        }
      }
    }
    
    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 5,
        "max_score": 1,
        "hits": [
          {
            "_index": "bank",
            "_type": "account",
            "_id": "243",
            "_score": 1,
            "_source": {
              "account_number": 243,
              "balance": 29902,
              "firstname": "Evangelina",
              "lastname": "Perez",
              "age": 20,
              "gender": "M",
              "address": "787 Joval Court",
              "employer": "Keengen",
              "email": "evangelinaperez@keengen.com",
              "city": "Mulberry",
              "state": "SD"
            }
          },
          {
            "_index": "bank",
            "_type": "account",
            "_id": "781",
            "_score": 1,
            "_source": {
              "account_number": 781,
              "balance": 29961,
              "firstname": "Sanford",
              "lastname": "Mullen",
              "age": 26,
              "gender": "F",
              "address": "879 Dover Street",
              "employer": "Zanity",
              "email": "sanfordmullen@zanity.com",
              "city": "Martinez",
              "state": "TX"
            }
          },
          ...
    

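    From the response we can see that every document returned through the filter has a _score of 1. There is no notion of degree here: a document either matches or it does not.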
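One caveat for readers on newer versions: the `filtered` query used in this example was deprecated in Elasticsearch 2.0 and removed in 5.0. The usual replacement is a `bool` query that carries the scoring part in `must` and the non-scoring part in `filter`. The helper below is a minimal sketch of that rewrite on the request body; the name `filtered_to_bool` is hypothetical and it only handles the simple shape used in this article.

```python
import json

def filtered_to_bool(body):
    """Rewrite {"query": {"filtered": {...}}} into the equivalent bool query."""
    filtered = body["query"]["filtered"]
    bool_query = {}
    if "query" in filtered:
        # The scoring part goes into must (query context)
        bool_query["must"] = filtered["query"]
    if "filter" in filtered:
        # The yes/no part goes into filter (filter context, no scoring)
        bool_query["filter"] = filtered["filter"]
    rewritten = dict(body)
    rewritten["query"] = {"bool": bool_query}
    return rewritten

# The request body from this section
old_body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"range": {"balance": {"gte": 29900, "lte": 30000}}},
        }
    }
}

new_body = filtered_to_bool(old_body)
print(json.dumps(new_body, indent=2))
```

With `match_all` in `must`, the matching documents still get a score of 1, as in the response above. If the `must` part is dropped and only `filter` remains, the documents are returned with a score of 0, since no scoring query is left.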

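    III. Query vs. filter performance

    Filters are generally considered faster than queries:

    • Filters do not compute relevance scores, so they are cheaper to execute
    • Filter results can be cached in memory and reused by later requests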
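The caching point can be illustrated with a toy model. Because a filter's answer for each document is just yes or no, the set of matching documents can be stored once and reused. The class below is only a sketch of that idea, with a plain dict as the cache; Elasticsearch's real filter cache works differently internally and has its own rules about what gets cached and when it is evicted. The class name `CachedFilter` is hypothetical, and the balances are taken from documents shown earlier in this article.

```python
import json

class CachedFilter:
    """Toy filter cache: remembers which document positions matched a given filter."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}        # filter definition (as a string) -> frozenset of positions
        self.evaluations = 0   # how many times the documents were actually scanned

    def range_filter(self, field, gte, lte):
        # Use the serialized filter definition as the cache key
        key = json.dumps({"range": {field: {"gte": gte, "lte": lte}}}, sort_keys=True)
        if key not in self.cache:
            self.evaluations += 1
            self.cache[key] = frozenset(
                i for i, doc in enumerate(self.docs) if gte <= doc[field] <= lte
            )
        return self.cache[key]

docs = [
    {"account_number": 243, "balance": 29902},
    {"account_number": 781, "balance": 29961},
    {"account_number": 25, "balance": 40540},
    {"account_number": 20, "balance": 16418},
]

cached = CachedFilter(docs)
first = cached.range_filter("balance", 29900, 30000)
second = cached.range_filter("balance", 29900, 30000)  # served from the cache
print(sorted(first), cached.evaluations)  # [0, 1] 1
```

A relevance score, by contrast, depends on the query text and has to be computed for each search, which is part of why the query side is harder to reuse this way.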
  • Original article: https://www.cnblogs.com/yangwenbo214/p/6256568.html