• elasticsearch: analyzers (part 5)


    How the inverted index works

    The principle behind tokenization is explained here: https://www.cnblogs.com/LQBlog/articles/5743991.html

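    The three steps of the analysis process

    1. Character filters: strip or replace special characters in the raw text

    2. Tokenizer: split the text into individual tokens

    3. Token filters: post-process the tokens, for example changing their case or replacing tokens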
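The three steps above can be sketched in plain Python. This is only an illustration of the order of the stages, not how Elasticsearch implements them; the function names `char_filter`, `tokenizer`, `token_filter`, and `analyze` are hypothetical, and the concrete rules (replace `&`, split on whitespace, lowercase) are assumptions chosen for the example.

```python
from typing import List


def char_filter(text: str) -> str:
    """Step 1: rewrite special characters before tokenizing (here: '&' -> 'and')."""
    return text.replace("&", " and ")


def tokenizer(text: str) -> List[str]:
    """Step 2: split the filtered text into tokens (here: on whitespace)."""
    return text.split()


def token_filter(tokens: List[str]) -> List[str]:
    """Step 3: post-process every token (here: lowercase)."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> List[str]:
    """Run the three stages in order: char filter -> tokenizer -> token filter."""
    return token_filter(tokenizer(char_filter(text)))


print(analyze("Tom & Jerry"))  # ['tom', 'and', 'jerry']
```

The order matters: the character filter sees the whole string, the tokenizer decides where tokens begin and end, and the token filter only ever sees individual tokens.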

    Results of several analyzers built into ES

    Example sentence: Set the shape to semi-transparent by calling set_trans(5)

    Pattern analyzer (splits on a specified delimiter)

    {
      "settings": {
        "analysis": {
          "analyzer": {
            "comma": {
              "type": "pattern",
              "pattern": " " # the delimiter to split on (a single space here)
            }
          }
        }
      }
    }

    Result (the pattern analyzer lowercases tokens by default)

    set, the, shape, to, semi-transparent, by, calling, set_trans(5)
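To make the splitting rule concrete, here is a rough Python imitation of a pattern analyzer. It is a sketch under assumptions: `pattern_analyze` is a hypothetical helper, Python's `re` syntax differs from the Java regular expressions Elasticsearch uses, and the `lowercase=True` default mirrors the documented default of the pattern analyzer's `lowercase` option.

```python
import re
from typing import List


def pattern_analyze(text: str, pattern: str = " ", lowercase: bool = True) -> List[str]:
    """Split text wherever the regex matches, drop empty tokens, optionally lowercase."""
    tokens = [token for token in re.split(pattern, text) if token]
    return [token.lower() for token in tokens] if lowercase else tokens


sentence = "Set the shape to semi-transparent by calling set_trans(5)"
print(pattern_analyze(sentence))
# ['set', 'the', 'shape', 'to', 'semi-transparent', 'by', 'calling', 'set_trans(5)']
```

Because only the space is treated as a delimiter, `semi-transparent` and `set_trans(5)` survive as single tokens.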

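    Standard analyzer

    Well suited to English; this is the default analyzer in ES

    Splits the text on word boundaries, removes most punctuation, and finally lowercases the tokens

    Result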

    set, the, shape, to, semi, transparent, by, calling, set_trans, 5

    Simple analyzer

    Splits the text on anything that is not a letter and lowercases the tokens

    Result

    set, the, shape, to, semi, transparent, by, calling, set, trans
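The simple analyzer's rule is easy to approximate in Python, which helps explain why `set_trans(5)` becomes `set` and `trans` here while the digit disappears. This is a sketch only: `simple_analyze` is a hypothetical name, and the regex `[^\W\d_]+` (runs of letters) is an assumption standing in for Elasticsearch's letter-based tokenizer.

```python
import re
from typing import List


def simple_analyze(text: str) -> List[str]:
    """Keep runs of letters only (digits, '_' and punctuation split tokens), then lowercase."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


sentence = "Set the shape to semi-transparent by calling set_trans(5)"
print(simple_analyze(sentence))
# ['set', 'the', 'shape', 'to', 'semi', 'transparent', 'by', 'calling', 'set', 'trans']
```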

    Language analyzers

    Analyzers for specific languages. Each ships with its own built-in word lists

    Testing an analyzer

    GET request: http://127.0.0.1:9200/_analyze

    body:

    {
        "analyzer":"standard",//the analyzer to test
        "text":"Set the shape to semi-transparent by calling set_trans(5)"//the full text to analyze
    }

    Result:

    {
        "tokens": [
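            {
                "token": "set",//the term that gets indexed
                "start_offset": 0,//start position in the original text
                "end_offset": 3,//end position in the original text
                "type": "<ALPHANUM>",
                "position": 0//ordinal position of the token in the text
            },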
            {
                "token": "the",
                "start_offset": 4,
                "end_offset": 7,
                "type": "<ALPHANUM>",
                "position": 1
            },
            {
                "token": "shape",
                "start_offset": 8,
                "end_offset": 13,
                "type": "<ALPHANUM>",
                "position": 2
            },
            {
                "token": "to",
                "start_offset": 14,
                "end_offset": 16,
                "type": "<ALPHANUM>",
                "position": 3
            },
            {
                "token": "semi",
                "start_offset": 17,
                "end_offset": 21,
                "type": "<ALPHANUM>",
                "position": 4
            },
            {
                "token": "transparent",
                "start_offset": 22,
                "end_offset": 33,
                "type": "<ALPHANUM>",
                "position": 5
            },
            {
                "token": "by",
                "start_offset": 34,
                "end_offset": 36,
                "type": "<ALPHANUM>",
                "position": 6
            },
            {
                "token": "calling",
                "start_offset": 37,
                "end_offset": 44,
                "type": "<ALPHANUM>",
                "position": 7
            },
            {
                "token": "set_trans",
                "start_offset": 45,
                "end_offset": 54,
                "type": "<ALPHANUM>",
                "position": 8
            },
            {
                "token": "5",
                "start_offset": 55,
                "end_offset": 56,
                "type": "<NUM>",
                "position": 9
            }
        ]
    }
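The offsets in the response index into the original text, which is easy to verify with a few lines of Python. This is a minimal sketch: `token_spans` is a hypothetical helper, and `response` is a hand-copied, abbreviated version of the result above (first two tokens only), not output fetched from a live cluster.

```python
from typing import Dict, List, Tuple

original = "Set the shape to semi-transparent by calling set_trans(5)"

# Abbreviated copy of the _analyze response shown above (first two tokens only).
response: Dict[str, list] = {
    "tokens": [
        {"token": "set", "start_offset": 0, "end_offset": 3,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "the", "start_offset": 4, "end_offset": 7,
         "type": "<ALPHANUM>", "position": 1},
    ]
}


def token_spans(resp: Dict[str, list], text: str) -> List[Tuple[str, str]]:
    """Pair each indexed token with the slice of the original text it came from."""
    return [
        (tok["token"], text[tok["start_offset"]:tok["end_offset"]])
        for tok in resp["tokens"]
    ]


print(token_spans(response, original))  # [('set', 'Set'), ('the', 'the')]
```

Note that the indexed token is `set` while the source slice is `Set`: the offsets describe the original text, and the token reflects the lowercasing done by the analyzer.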

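    Query the terms produced for a specific document

    GET /${index}/${type}/${id}/_termvectors?fields=${fields_name}

    Query the result of a given analyzer on a given index

    GET /${index}/_analyze?analyzer={analyzer_name}&text=2,3,4,5,100-100

    Adding an analyzer

    Add a new analyzer to an existing index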

    POST /{index}/_close #close the target index before updating it; the index cannot be used while it is closed
    
    PUT /{index}/_settings
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "ik_word": { #the analyzer to add
              "tokenizer": "ik_max_word"
            }
          }
        }
      }
    }
    
    POST /{index}/_open #reopen the index
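The close, update, open sequence above can be captured as data so that the order is explicit and easy to check before sending anything. This is a sketch under assumptions: `add_analyzer_steps` is a hypothetical helper that only builds the three requests (method, path, JSON body) and does not talk to a cluster; `my_index` is a made-up index name, and the `ik_max_word` tokenizer is assumed to be available through the IK analysis plugin.

```python
import json
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[str]]  # (HTTP method, path, JSON body or None)


def add_analyzer_steps(index: str, name: str, tokenizer: str) -> List[Step]:
    """Build the three requests needed to add an analyzer to an existing index."""
    body = {"settings": {"analysis": {"analyzer": {name: {"tokenizer": tokenizer}}}}}
    return [
        ("POST", f"/{index}/_close", None),               # 1. close the index
        ("PUT", f"/{index}/_settings", json.dumps(body)),  # 2. add the analyzer
        ("POST", f"/{index}/_open", None),                 # 3. reopen the index
    ]


for method, path, payload in add_analyzer_steps("my_index", "ik_word", "ik_max_word"):
    print(method, path, payload or "")
```

Keeping the steps as plain tuples leaves the choice of HTTP client open; each tuple can be passed to whatever client is already in use.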
  • Original article: https://www.cnblogs.com/LQBlog/p/10430889.html