• Installing Chinese word segmentation for Elasticsearch


    GitHub repository: https://github.com/medcl/elasticsearch-analysis-ik

       

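    Note: the plugin version must match your Elasticsearch version; otherwise Elasticsearch will not start properly once the plugin has been built and installed.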
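One way to check the version match after building is to read the plugin's descriptor file. The sketch below is a minimal example and rests on an assumption that is not stated in this post: that the unzipped plugin contains a plugin-descriptor.properties file with an elasticsearch.version key. The function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def plugin_matches_es(descriptor_text, es_version):
    """Return True if the descriptor's elasticsearch.version equals es_version.

    Assumption: the descriptor uses the key "elasticsearch.version".
    """
    props = parse_properties(descriptor_text)
    return props.get("elasticsearch.version") == es_version
```

Pass in the text of the descriptor file and the version string reported by your Elasticsearch node; a False result suggests the plugin was built for a different release.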

       

    Download the source and unzip it to E:\soft\elk\elasticsearch-analysis-ik-master

    Open cmd in that directory and run:

    mvn package

       

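    This command needs an internet connection and downloads many files. After waiting a while for it to finish, copy

    the .zip file from E:\soft\elk\elasticsearch-analysis-ik-master\target\releases\ to <ES directory>/plugins/. In that plugins directory, create an ik folder and unzip elasticsearch-analysis-ik-{version}.zip into it, so that the plugin files sit under <ES directory>/plugins/ik/.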

       

    Restart the ES service.
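The unzip step above can also be scripted. This is a minimal sketch, not the plugin's official installer: install_ik is a hypothetical helper, and it extracts the archive straight into plugins/ik instead of copying the zip into plugins/ first.

```python
import zipfile
from pathlib import Path


def install_ik(release_zip, es_home):
    """Unzip the built plugin archive into <es_home>/plugins/ik.

    release_zip: path to the zip produced by `mvn package`.
    es_home: root directory of the Elasticsearch installation.
    Returns the plugin directory as a Path.
    """
    target = Path(es_home) / "plugins" / "ik"
    # Create plugins/ik if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(release_zip) as archive:
        archive.extractall(target)
    return target
```

Call it with the path to the zip under target\releases and the path to your ES directory, then restart Elasticsearch as described above.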

       

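    Tips:

    ik_max_word: splits the text at the finest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌", exhausting every possible combination.

    ik_smart: splits the text at the coarsest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国,国歌".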

       

       

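    Testing Chinese: if you do not have an index yet, create one first. In my tests, the ik and ik_max_word analyzers produced the same results.

    Test URL format: http://localhost:9200/<index name>/_analyze?analyzer=ik&text=中华人民共和国国歌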

    http://localhost:9200/_analyze?analyzer=ik&text=System.Xml.XmlReaderSettings.CreateReader
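Browsers usually percent-encode the Chinese text for you, but scripts have to do it themselves. The sketch below builds the test URLs in the format shown above; analyze_url is a hypothetical helper, and the default host is the one used in this post. As far as I know, newer Elasticsearch releases expect analyzer and text in a JSON request body instead of the query string, so treat this as matching the version used here.

```python
from urllib.parse import quote, urlencode


def analyze_url(text, analyzer="ik", index=None, host="http://localhost:9200"):
    """Build an _analyze URL with the text and analyzer as query parameters.

    Non-ASCII characters in the text are percent-encoded as UTF-8.
    If index is None, the index-less form of the endpoint is used.
    """
    if index is None:
        path = "/_analyze"
    else:
        path = "/" + quote(index, safe="") + "/_analyze"
    return host + path + "?" + urlencode({"analyzer": analyzer, "text": text})


print(analyze_url("中华人民共和国国歌", index="logstash-log4input-2016.04.26"))
print(analyze_url("System.Xml.XmlReaderSettings.CreateReader"))
```

Passing analyzer="ik_smart" or analyzer="ik_max_word" makes it easy to compare the two modes described in the tips.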

       

    Enter the following in the browser:

    http://localhost:9200/logstash-log4input-2016.04.26/_analyze?analyzer=ik&text=中华人民共和国国歌

       

    Result:

    {"tokens":[{"token":"中华人民共和国","start_offset":0,"end_offset":7,"type":"CN_WORD","position":0},{"token":"中华人民","start_offset":0,"end_offset":4,"type":"CN_WORD","position":1},{"token":"中华","start_offset":0,"end_offset":2,"type":"CN_WORD","position":2},{"token":"华人","start_offset":1,"end_offset":3,"type":"CN_WORD","position":3},{"token":"人民共和国","start_offset":2,"end_offset":7,"type":"CN_WORD","position":4},{"token":"人民","start_offset":2,"end_offset":4,"type":"CN_WORD","position":5},{"token":"共和国","start_offset":4,"end_offset":7,"type":"CN_WORD","position":6},{"token":"共和","start_offset":4,"end_offset":6,"type":"CN_WORD","position":7},{"token":"国","start_offset":6,"end_offset":7,"type":"CN_CHAR","position":8},{"token":"国歌","start_offset":7,"end_offset":9,"type":"CN_WORD","position":9}]}

       

    Enter the following in the browser:

    http://localhost:9200/logstash-log4input-2016.04.26/_analyze?analyzer=ik&text=System.Xml.XmlReaderSettings.CreateReader

       

    Result:

    {"tokens":[{"token":"system.xml.xmlreadersettings.createreader","start_offset":0,"end_offset":41,"type":"LETTER","position":0},{"token":"system","start_offset":0,"end_offset":6,"type":"ENGLISH","position":1},{"token":"xml","start_offset":7,"end_offset":10,"type":"ENGLISH","position":2},{"token":"xmlreadersettings","start_offset":11,"end_offset":28,"type":"ENGLISH","position":3},{"token":"createreader","start_offset":29,"end_offset":41,"type":"ENGLISH","position":4}]}
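The one-line JSON responses are hard to read by eye. As a small sketch, the helper below (a hypothetical name) groups the tokens of an _analyze response by their type field; the sample string is a shortened copy of the second response above, keeping only its first two tokens.

```python
import json


def tokens_by_type(response_text):
    """Group token strings from an _analyze response by their "type" field."""
    grouped = {}
    for item in json.loads(response_text)["tokens"]:
        grouped.setdefault(item["type"], []).append(item["token"])
    return grouped


# Shortened copy of the second response above (first two tokens only).
sample = (
    '{"tokens":['
    '{"token":"system.xml.xmlreadersettings.createreader","start_offset":0,'
    '"end_offset":41,"type":"LETTER","position":0},'
    '{"token":"system","start_offset":0,"end_offset":6,"type":"ENGLISH","position":1}'
    ']}'
)

print(tokens_by_type(sample))
# prints {'LETTER': ['system.xml.xmlreadersettings.createreader'], 'ENGLISH': ['system']}
```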

       

       

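    When using this analyzer in a real project, keep in mind that a field's type cannot be changed once ES has created it. The mapping therefore has to be set before any data enters ES; here an index template is used to set the field type.

    URL: http://localhost:9200/_template/

    Name: logstashlog4j (so the full request URL is http://localhost:9200/_template/logstashlog4j)

    Method: PUT

       

    The following request body applies to all indices named logstash-log4input-* and sets the message field to be analyzed with ik_max_word: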

       

    {
        "template": "logstash-log4input-*",
        "mappings": {
            "log4-input": {
                "properties": {
                    "message": {
                        "type": "string",
                        "analyzer": "ik_max_word",
                        "search_analyzer": "ik_max_word"
                    }
                }
            }
        }
    }
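If you prefer a script to a REST client, the PUT request can be prepared with the Python standard library. This is a sketch under the assumptions of this post (host localhost:9200, template name logstashlog4j); build_template_request is a hypothetical helper, and the Content-Type header is an addition of mine that should be harmless on the version used here.

```python
import json
import urllib.request

# Same template body as above.
TEMPLATE = {
    "template": "logstash-log4input-*",
    "mappings": {
        "log4-input": {
            "properties": {
                "message": {
                    "type": "string",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_max_word",
                }
            }
        }
    },
}


def build_template_request(name="logstashlog4j", host="http://localhost:9200"):
    """Prepare (but do not send) the PUT request that registers the template."""
    return urllib.request.Request(
        url=f"{host}/_template/{name}",
        data=json.dumps(TEMPLATE).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# To actually send it (requires a running Elasticsearch node):
# with urllib.request.urlopen(build_template_request()) as resp:
#     print(resp.read().decode("utf-8"))
```

The template only affects indices created after it is registered, which is why it has to be in place before data arrives.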

       

       

    Finally, index some data into Elasticsearch and run a test query to confirm that the message field is segmented as expected.
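The query used for this test is not shown in the text. One plausible check, offered here only as an assumption, is a match query on the message field, which is analyzed with the field's search analyzer (ik_max_word in the template above). The helper name is hypothetical.

```python
import json


def match_query(text, field="message"):
    """Build the JSON body of a match query against one field."""
    return json.dumps({"query": {"match": {field: text}}}, ensure_ascii=False)


print(match_query("国歌"))
```

The resulting body would be sent to http://localhost:9200/logstash-log4input-*/_search; documents whose message contains a token produced from the query text should be returned.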

       

       

       

       

       

       

  • Original article: https://www.cnblogs.com/liuyuhua/p/5711040.html