• elasticsearch5.6.8中文分词器


    安装分词器,务必确保版本一致!

    下载地址:https://github.com/medcl/elasticsearch-analysis-ik

    为了保证一致,我特地将elasticsearch进行降级。

    ik_smart

    GET _analyze?pretty
    {
      "analyzer": "ik_smart",
      "text": "中华人民共和国国歌"
    }
    
    {
      "tokens": [
        {
          "token": "中华人民共和国",
          "start_offset": 0,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 0
        },
        {
          "token": "国歌",
          "start_offset": 7,
          "end_offset": 9,
          "type": "CN_WORD",
          "position": 1
        }
      ]
    }
    

    ik_max_word

    GET _analyze?pretty
    {
      "analyzer": "ik_max_word",
      "text": "中华人民共和国国歌"
    }
    
    {
      "tokens": [
        {
          "token": "中华人民共和国",
          "start_offset": 0,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 0
        },
        {
          "token": "中华人民",
          "start_offset": 0,
          "end_offset": 4,
          "type": "CN_WORD",
          "position": 1
        },
        {
          "token": "中华",
          "start_offset": 0,
          "end_offset": 2,
          "type": "CN_WORD",
          "position": 2
        },
        {
          "token": "华人",
          "start_offset": 1,
          "end_offset": 3,
          "type": "CN_WORD",
          "position": 3
        },
        {
          "token": "人民共和国",
          "start_offset": 2,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 4
        },
        {
          "token": "人民",
          "start_offset": 2,
          "end_offset": 4,
          "type": "CN_WORD",
          "position": 5
        },
        {
          "token": "共和国",
          "start_offset": 4,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 6
        },
        {
          "token": "共和",
          "start_offset": 4,
          "end_offset": 6,
          "type": "CN_WORD",
          "position": 7
        },
        {
          "token": "国",
          "start_offset": 6,
          "end_offset": 7,
          "type": "CN_CHAR",
          "position": 8
        },
        {
          "token": "国歌",
          "start_offset": 7,
          "end_offset": 9,
          "type": "CN_WORD",
          "position": 9
        }
      ]
    }
    
  • 相关阅读:
    Vivado Non-Project Flow
    使用ngspice进行电路仿真
    Synopsys DC综合脚本示例
    解决Vivado XSDK在Ubuntu系统上自带UART Terminal Crash问题
    Ubuntu-18.04 LTS UEFI 安装U盘制作
    嵌入式处理器通过UART实现scanf和printf
    用于RISC-V的Makefile示例
    利用SSH隧道技术穿越内网访问远程设备
    C++基础-多态
    C++基础-继承
  • 原文地址:https://www.cnblogs.com/jiqing9006/p/9277055.html
Copyright © 2020-2023  润新知