• elasticsearch之拼音搜索


    拼音搜索在中文搜索环境中是经常使用的一种功能,用户只需要输入关键词的拼音全拼或者拼音首字母,搜索引擎就可以搜索出相关结果。在国内,中文输入法基本上都是基于汉语拼音的,这种在符合用户输入习惯的条件下缩短用户输入时间的功能是非常受欢迎的;

    一、安装拼音搜索插件

    下载对应版本的elasticsearch-analysis-pinyin插件;

    https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.9.2/elasticsearch-analysis-pinyin-7.9.2.zip
    

    在elasticsearch安装目录下的的plugin目录新建analysis-pinyin目录,并解压下载的安装包;

    重启elasticsearch,可以看到已经正常加载拼音插件

    [2022-01-13T20:37:25,368][INFO ][o.e.p.PluginsService     ] [mango] loaded plugin [analysis-pinyin]
    

    二、使用拼音插件

    试一下分词效果,可以看到除了每个词的全频,还有每个字的首字母缩写;

    POST _analyze
    {
      "analyzer": "pinyin",
      "text": "我爱你,中国"
    }
    
    {
      "tokens" : [
        {
          "token" : "wo",
          "start_offset" : 0,
          "end_offset" : 0,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "wanzg",
          "start_offset" : 0,
          "end_offset" : 0,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "ai",
          "start_offset" : 0,
          "end_offset" : 0,
          "type" : "word",
          "position" : 1
        },
        {
          "token" : "ni",
          "start_offset" : 0,
          "end_offset" : 0,
          "type" : "word",
          "position" : 2
        },
        {
          "token" : "zhong",
          "start_offset" : 0,
          "end_offset" : 0,
          "type" : "word",
          "position" : 3
        },
        {
          "token" : "guo",
          "start_offset" : 0,
          "end_offset" : 0,
          "type" : "word",
          "position" : 4
        }
      ]
    }
    
    

    自定义pinyin filter,并创建mapping;

    PUT /milk
    {
      "settings": {
        "analysis": {
          "filter": {
            "pinyin_filter":{
              "type":"pinyin",
              "keep_separate_first_letter" : false,
              "keep_full_pinyin" : true,
              "keep_original" : true,
              "limit_first_letter_length" : 16,
              "lowercase" : true,
              "remove_duplicated_term" : true
    }
          },
          "analyzer": {
            "ik_pinyin_analyzer":{
              "tokenizer":"ik_max_word",
               "filter":["pinyin_filter"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "brand":{
            "type": "text",
            "analyzer": "ik_pinyin_analyzer"
          },
          "series":{
            "type": "text",
            "analyzer": "ik_pinyin_analyzer"
          },
          "price":{
            "type": "float"
          }
        }
      }
    }
    

    批量索引文档;

    POST _bulk
    {"index":{"_index":"milk", "_id":1}}}
    {"brand":"蒙牛", "series":"特仑苏", "price":60}
    {"index":{"_index":"milk", "_id":2}}}
    {"brand":"蒙牛", "series":"真果粒", "price":40}
    {"index":{"_index":"milk", "_id":3}}}
    {"brand":"华山牧", "series":"华山牧", "price":49.90}
    {"index":{"_index":"milk", "_id":4}}}
    {"brand":"伊利", "series":"安慕希", "price":49.90}
    {"index":{"_index":"milk", "_id":5}}}
    {"brand":"伊利", "series":"金典", "price":49.90}
    

    搜索tls,可以看到已经匹配到对应的记录;

    POST milk/_search
    {
      
      "query": {
        "match_phrase_prefix": {
          "series": "tl"
        }
      }
    }
    
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 6.691126,
        "hits" : [
          {
            "_index" : "milk",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 6.691126,
            "_source" : {
              "brand" : "蒙牛",
              "series" : "特仑苏",
              "price" : 60
            }
          }
        ]
      }
    }
    
    
    

    可以看到查询直接使用MultiPhraseQuery来实现对两个位置字符的定位;

    POST milk/_search
    {
      
      "query": {
        "match_phrase_prefix": {
          "series": "tl"
        }
      },
      "highlight": {
        "fields": {
          "series": {}
        }
      }
    }
    
    {
      "profile" : {
        "shards" : [
          {
            "id" : "[OoNXoregTmKQAFotUgOeaA][milk][0]",
            "searches" : [
              {
                "query" : [
                  {
                    "type" : "MultiPhraseQuery",
                    "description" : "series:\"(t tl) (li l lun)\"",
                    "time_in_nanos" : 177400,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 1,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 5400,
                      "match" : 12800,
                      "next_doc_count" : 1,
                      "score_count" : 1,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 9200,
                      "advance_count" : 1,
                      "score" : 3900,
                      "build_scorer_count" : 2,
                      "create_weight" : 40800,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 105300
                    }
                  }
                ],
                "rewrite_time" : 45700,
                "collector" : [
                  {
                    "name" : "SimpleTopScoreDocCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 14500
                  }
                ]
              }
            ],
            "aggregations" : [ ]
          }
        ]
      }
    }
    
    

    查询也可以返回高亮信息

    POST milk/_search
    {
      
      "query": {
        "match_phrase_prefix": {
          "series": "tl"
        }
      },
      "highlight": {
        "fields": {
          "series": {}
        }
      }
    }
    
    
    
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 6.691126,
        "hits" : [
          {
            "_index" : "milk",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 6.691126,
            "_source" : {
              "brand" : "蒙牛",
              "series" : "特仑苏",
              "price" : 60
            },
            "highlight" : {
              "series" : [
                "<em>特</em><em>仑</em>苏"
              ]
            }
          }
        ]
      }
    }
    
    
    
  • 相关阅读:
    Win10
    编码
    [转帖] Tomcat安全配置小技巧
    关于redis bind
    query data filtered by a JSON Column in SQLAlchemy
    Flask多线程环境下logging
    Flask request
    [转] MySQL树结构递归查询处理
    [转]了解BFF架构
    转载:ELK实战系列3-RabbitMQ+ELK搭建日志平台
  • 原文地址:https://www.cnblogs.com/wufengtinghai/p/15800498.html
Copyright © 2020-2023  润新知