• Elasticsearch Study Notes: Tokenization


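    1. Testing Elasticsearch analyzers

    Elasticsearch ships with several analyzers (reference: https://www.jianshu.com/p/d57935ba514b)

    Each analyzer below is applied to this sample sentence:

    Set the shape to semi-transparent by calling set_trans(5)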

    (1) standard analyzer: the standard analyzer (used by default)
    set, the, shape, to, semi, transparent, by, calling, set_trans, 5

    (2) simple analyzer: splits at every non-letter character and lowercases the result
    set, the, shape, to, semi, transparent, by, calling, set, trans

    (3) whitespace analyzer: splits on whitespace only. Case, underscores, and punctuation are left unchanged
    Set, the, shape, to, semi-transparent, by, calling, set_trans(5)

    (4) language analyzer: a language-specific analyzer (for example, the english analyzer, which also removes stop words and stems the tokens)
    set, shape, semi, transpar, call, set_tran, 5
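
    To make the difference between the whitespace and simple analyzers concrete, here is a rough pure-Python approximation of the two. It does not call Elasticsearch; the function names are my own, and the regular expression only approximates the "split at non-letters" rule, so treat it as an illustration rather than the real implementation.

```python
import re

SAMPLE = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_tokens(text):
    """Approximate the whitespace analyzer: split on whitespace, change nothing else."""
    return text.split()


def simple_tokens(text):
    """Approximate the simple analyzer: lowercase, then keep runs of letters."""
    # [^\W\d_]+ matches word characters that are neither digits nor underscores,
    # which is close to (but not exactly) "runs of Unicode letters".
    return re.findall(r"[^\W\d_]+", text.lower())


print(whitespace_tokens(SAMPLE))
print(simple_tokens(SAMPLE))
```

    On the sample sentence, both functions should reproduce the token lists given in (2) and (3) above.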

    2. Setting the analyzer for an Elasticsearch index

    The following request sets the default analyzer for every type in this index to simple:

    PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "default": { "type": "simple" }
          }
        }
      }
    }
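
    The same request can be built from a script. The sketch below uses only the Python standard library; the helper names and the http://localhost:9200 base URL are assumptions for illustration. It constructs the request without sending it, so it runs even when no node is available.

```python
import json
import urllib.request


def default_analyzer_settings(analyzer_type):
    """Build the index-creation body that sets the index's default analyzer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {"default": {"type": analyzer_type}}
            }
        }
    }


def create_index_request(index, analyzer_type, base_url="http://localhost:9200"):
    """Return a PUT request for `index`; base_url is an assumed local node."""
    body = json.dumps(default_analyzer_settings(analyzer_type)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = create_index_request("my_index", "simple")
print(req.get_method(), req.full_url)
# To send it against a running node:
#     urllib.request.urlopen(req)
```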
    
    Standard analyzer
    http://localhost:9200/_analyze?analyzer=standard&pretty=true&text=test测试
    

    Tokenization result

    {
      "tokens" : [
        {
          "token" : "test",
          "start_offset" : 0,
          "end_offset" : 4,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "测",
          "start_offset" : 4,
          "end_offset" : 5,
          "type" : "<IDEOGRAPHIC>",
          "position" : 1
        },
        {
          "token" : "试",
          "start_offset" : 5,
          "end_offset" : 6,
          "type" : "<IDEOGRAPHIC>",
          "position" : 2
        }
      ]
    }
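
    The URL above passes analyzer and text as query-string parameters. As far as I know, that form was deprecated during the 5.x series and removed in 6.0, where the same parameters go in a JSON request body instead. The sketch below builds such a request and extracts the token strings from a response. The helper names and base URL are assumptions, and sample_response is a trimmed copy of the result shown above, not fresh output.

```python
import json
import urllib.request


def analyze_request(analyzer, text, base_url="http://localhost:9200"):
    """Build an _analyze request that carries its parameters in a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze?pretty=true",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def token_strings(response):
    """Pull just the token text out of a parsed _analyze response."""
    return [t["token"] for t in response["tokens"]]


# Trimmed copy of the standard-analyzer result for "test测试".
sample_response = {
    "tokens": [
        {"token": "test", "start_offset": 0, "end_offset": 4, "position": 0},
        {"token": "测", "start_offset": 4, "end_offset": 5, "position": 1},
        {"token": "试", "start_offset": 5, "end_offset": 6, "position": 2},
    ]
}

analyze_req = analyze_request("standard", "test测试")
print(token_strings(sample_response))
# To query a running node:
#     with urllib.request.urlopen(analyze_req) as resp:
#         print(token_strings(json.load(resp)))
```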
    

    Simple analyzer

    http://localhost:9200/_analyze?analyzer=simple&pretty=true&text=test_测试
    

    Result

    {
      "tokens" : [
        {
          "token" : "test",
          "start_offset" : 0,
          "end_offset" : 4,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "测试",
          "start_offset" : 5,
          "end_offset" : 7,
          "type" : "word",
          "position" : 1
        }
      ]
    }
    

    IK analyzer: ik_max_word and ik_smart

    The plugin has to be installed first:

    https://github.com/medcl/elasticsearch-analysis-ik
    

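    Download the zip package and install it with elasticsearch-plugin install. The plugin version must match the Elasticsearch version; my machine runs ES 5.6.10, so I installed the 5.6.10 release.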

    ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip
    

    Then restart Elasticsearch.

    Test it:

    http://localhost:9200/_analyze?analyzer=ik_max_word&pretty=true&text=test_tes_te测试
    

    Result

    {
      "tokens" : [
        {
          "token" : "test_tes_te",
          "start_offset" : 0,
          "end_offset" : 11,
          "type" : "LETTER",
          "position" : 0
        },
        {
          "token" : "test",
          "start_offset" : 0,
          "end_offset" : 4,
          "type" : "ENGLISH",
          "position" : 1
        },
        {
          "token" : "tes",
          "start_offset" : 5,
          "end_offset" : 8,
          "type" : "ENGLISH",
          "position" : 2
        },
        {
          "token" : "te",
          "start_offset" : 9,
          "end_offset" : 11,
          "type" : "ENGLISH",
          "position" : 3
        },
        {
          "token" : "测试",
          "start_offset" : 11,
          "end_offset" : 13,
          "type" : "CN_WORD",
          "position" : 4
        }
      ]
    }
    
  • Original article: https://www.cnblogs.com/tonglin0325/p/10088021.html