• [ES] The difference between term and match


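    term usage

    First, the definition: a term query is an exact match. The search text is not analyzed (split into tokens) before the lookup; it is compared as-is against the tokens stored in the index.

    An example makes this concrete. First, index two documents: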

    {
        "title": "love China",
        "content": "people very love China",
        "tags": ["China", "love"]
    }
    {
        "title": "love HuBei",
        "content": "people very love HuBei",
        "tags": ["HuBei", "love"]
    }

    Now run a term query:

    {
      "query": {
        "term": {
          "title": "love"
        }
      }
    }

    The result: both documents above are returned:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.6931472,
        "hits": [
          {
            "_index": "test",
            "_type": "doc",
            "_id": "8",
            "_score": 0.6931472,
            "_source": {
              "title": "love HuBei",
              "content": "people very love HuBei",
              "tags": ["HuBei","love"]
            }
          },
          {
            "_index": "test",
            "_type": "doc",
            "_id": "7",
            "_score": 0.6931472,
            "_source": {
              "title": "love China",
              "content": "people very love China",
              "tags": ["China","love"]
            }
          }
        ]
      }
    }

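    Every document whose title contains the token love is returned. But suppose I only want an exact match on love China. Let's see whether the following query finds it: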

    {
      "query": {
        "term": {
          "title": "love China"
        }
      }
    }

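    It returns no hits. Conceptually, term is an exact match against a single token, and the index holds no single token "love China". So how do you match several words with term? Use terms: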

    {
      "query": {
        "terms": {
          "title": ["love", "China"]
        }
      }
    }

    The query result is:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.6931472,
        "hits": [
          {
            "_index": "test",
            "_type": "doc",
            "_id": "8",
            "_score": 0.6931472,
            "_source": {
              "title": "love HuBei",
              "content": "people very love HuBei",
              "tags": ["HuBei","love"]
            }
          },
          {
            "_index": "test",
            "_type": "doc",
            "_id": "7",
            "_score": 0.6931472,
            "_source": {
              "title": "love China",
              "content": "people very love China",
              "tags": ["China","love"]
            }
          }
        ]
      }
    }

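    Both documents come back. Why? The values in the terms array [ ] are ORed: a document matches if it contains any one of them. To require both words at the same time, combine term clauses under bool must, as follows: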

    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "title": "love"
              }
            },
            {
              "term": {
                "title": "china"
              }
            }
          ]
        }
      }
    }
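
    Notice that china is lowercase in the query above. Searching with the capitalized China returns nothing. Why? The title field was analyzed when the documents were stored, here with the default analyzer. Let's look at what that analysis produces.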

    Analyzer

    GET test/_analyze
    {
      "text" : "love China"
    }

    The result is:

    {
      "tokens": [
        {
          "token": "love",
          "start_offset": 0,
          "end_offset": 4,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "china",
          "start_offset": 5,
          "end_offset": 10,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    }

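    The analysis yields two tokens, love and china. A term query only matches those tokens exactly as stored, with no transformation of the search text, so a query for China fails. Analyzers are covered in detail in a later section.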
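
    To make the behaviour of term, terms, and bool must easier to see, here is a minimal Python sketch that imitates it in memory. It is not Elasticsearch code: the analyze function is a simplified stand-in for the default analyzer (split on non-alphanumeric characters, then lowercase), and DOCS, INDEX, term, terms, and bool_must are hypothetical names introduced only for this illustration.

```python
import re


def analyze(text):
    """Simplified stand-in for the default analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


# Hypothetical in-memory copy of the two sample titles, keyed by _id.
DOCS = {"7": "love China", "8": "love HuBei"}

# Inverted index for the title field: token -> set of document ids.
INDEX = {}
for doc_id, title in DOCS.items():
    for token in analyze(title):
        INDEX.setdefault(token, set()).add(doc_id)


def term(value):
    """term: look the value up exactly as given, with no analysis."""
    return set(INDEX.get(value, set()))


def terms(values):
    """terms: a document matches if ANY value matches (OR)."""
    result = set()
    for value in values:
        result |= term(value)
    return result


def bool_must(values):
    """bool/must of term clauses: ALL values must match (AND)."""
    result = set(DOCS)
    for value in values:
        result &= term(value)
    return result


print(analyze("love China"))                 # ['love', 'china']
print(sorted(term("love")))                  # ['7', '8']
print(sorted(term("love China")))            # []
print(sorted(terms(["love", "China"])))      # ['7', '8']
print(sorted(bool_must(["love", "china"])))  # ['7']
print(sorted(bool_must(["love", "China"])))  # []
```

    In this sketch the terms lookup returns both documents only because of love; the capitalized China finds nothing, since the index holds the lowercase token china.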

    match usage

    First, match against love China.

    GET test/doc/_search
    {
      "query": {
        "match": {
          "title": "love China"
        }
      }
    }

    The result is:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 1.3862944,
        "hits": [
          {
            "_index": "test",
            "_type": "doc",
            "_id": "7",
            "_score": 1.3862944,
            "_source": {
              "title": "love China",
              "content": "people very love China",
              "tags": [
                "China",
                "love"
              ]
            }
          },
          {
            "_index": "test",
            "_type": "doc",
            "_id": "8",
            "_score": 0.6931472,
            "_source": {
              "title": "love HuBei",
              "content": "people very love HuBei",
              "tags": [
                "HuBei",
                "love"
              ]
            }
          }
        ]
      }
    }

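    Both documents are returned. Why? A match query analyzes the search text first and then matches the resulting tokens. Across the two documents, the title tokens are love, china, and hubei. The search text love China is analyzed into love and china, and those tokens are ORed, so a document matches if it contains any one of them. How do you require love and China to match together? Use match_phrase.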
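
    The analyze-then-OR behaviour of match can be sketched in a few lines of Python. This is an illustration only, not how Elasticsearch is implemented, and it ignores scoring. The analyze function is a simplified stand-in for the default analyzer, and DOCS and match are hypothetical names. The operator argument mirrors the operator option of the real match query, which defaults to or.

```python
import re


def analyze(text):
    """Simplified stand-in for the default analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


# Hypothetical in-memory copy of the two sample titles, keyed by _id.
DOCS = {"7": "love China", "8": "love HuBei"}


def match(query, operator="or"):
    """Analyze the query, then match documents on any (or) / all (and) tokens."""
    query_tokens = set(analyze(query))
    combine = any if operator == "or" else all
    hits = []
    for doc_id, title in DOCS.items():
        title_tokens = set(analyze(title))
        if combine(tok in title_tokens for tok in query_tokens):
            hits.append(doc_id)
    return sorted(hits)


print(match("love China"))                  # ['7', '8']
print(match("love China", operator="and"))  # ['7']
```

    Unlike term, the capitalized China works here because the query text passes through the same analysis as the stored title.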

    match_phrase usage

    match_phrase is a phrase search: all the analyzed tokens must appear in the document, adjacent to one another and in the same order.

    GET test/doc/_search
    {
      "query": {
        "match_phrase": {
          "title": "love china"
        }
      }
    }

    The result is:

    {
      "took": 5,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 1.3862944,
        "hits": [
          {
            "_index": "test",
            "_type": "doc",
            "_id": "7",
            "_score": 1.3862944,
            "_source": {
              "title": "love China",
              "content": "people very love China",
              "tags": [
                "China",
                "love"
              ]
            }
          }
        ]
      }
    }
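
    The adjacency and ordering requirement of match_phrase can be sketched as a search for a contiguous run of tokens. This is a simplified illustration, not the Elasticsearch implementation: analyze is a stand-in for the default analyzer, and DOCS, match_phrase, and search_phrase are hypothetical names.

```python
import re


def analyze(text):
    """Simplified stand-in for the default analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


# Hypothetical in-memory copy of the two sample titles, keyed by _id.
DOCS = {"7": "love China", "8": "love HuBei"}


def match_phrase(query, field_text):
    """True if the query tokens occur in the field consecutively and in order."""
    query_tokens = analyze(query)
    field_tokens = analyze(field_text)
    n = len(query_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens over the field and compare it to the query.
    return any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )


def search_phrase(query):
    """Ids of the sample documents whose title contains the phrase."""
    return sorted(d for d, title in DOCS.items() if match_phrase(query, title))


print(search_phrase("love china"))                            # ['7']
print(search_phrase("China love"))                            # []
print(match_phrase("love China", "people very love China"))   # True
print(match_phrase("people love", "people very love China"))  # False
```

    The last call fails because very sits between people and love, so the two tokens are present but not adjacent.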
  • Original article: https://www.cnblogs.com/weknow619/p/ES.html