• Building an Elasticsearch image with the IK Chinese analyzer plugin using a Dockerfile


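    Why install the IK Chinese analyzer?

    The analyzers that ship with ES are geared toward English. Applied to Chinese text, they split it into single characters rather than words, which works badly for Chinese content. An index that contains Chinese text therefore needs a Chinese analyzer plugin.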

    I. Environment and file preparation

    Environment:
    • VMware version: 15.5.5
    • Operating system: CentOS 7
    • Docker version: 19.03.12
    Files:
    • Pull the Elasticsearch image, version 7.8.0:
      docker pull elasticsearch:7.8.0
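    • Download the Chinese analyzer plugin, version 7.8.0:
    # Create a docker directory under the Linux root directory and enter it
    mkdir /docker
    cd /docker
    # Download the IK plugin (if wget is missing, run `yum install -y wget` first, then rerun the download)
    wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.8.0/elasticsearch-analysis-ik-7.8.0.zip
    # Optional: if wget is too slow, download the file in a browser and upload it to the Linux host instead (if rz is missing, run `yum install -y lrzsz` first, then run rz and select elasticsearch-analysis-ik-7.8.0.zip)
    rz
    # Unzip (if unzip is missing, run `yum install -y unzip` first, then rerun the unzip command)
    unzip elasticsearch-analysis-ik-7.8.0.zip -d elasticsearch-analysis-ik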
    

    Note: the Elasticsearch image version must match the IK plugin version. (I tried the elasticsearch:7.8.1 image with the elasticsearch-analysis-ik-7.8.0 plugin, and the resulting image was unusable.)
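
    The version rule in the note above can be checked before building. The following is a minimal sketch, assuming the image reference and the zip file name follow the patterns used in this post. The function names are made up for illustration and are not part of Docker or Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Docker or Elasticsearch).

# Extract the version from an image reference such as elasticsearch:7.8.0
image_version() {
    printf '%s\n' "${1##*:}"
}

# Extract the version from a plugin archive name such as
# elasticsearch-analysis-ik-7.8.0.zip
plugin_version() {
    name=${1##*/}                            # drop any leading directory
    name=${name#elasticsearch-analysis-ik-}  # drop the fixed prefix
    printf '%s\n' "${name%.zip}"             # drop the .zip suffix
}

# Succeeds (exit status 0) only when both versions are identical
versions_match() {
    [ "$(image_version "$1")" = "$(plugin_version "$2")" ]
}

if versions_match elasticsearch:7.8.0 elasticsearch-analysis-ik-7.8.0.zip; then
    echo "versions match"
else
    echo "version mismatch" >&2
fi
```

    With the 7.8.1 image and the 7.8.0 plugin from the note, versions_match fails, so the mismatch is caught before any image is built.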

    II. Build the image and start the container

    1. Create the DockerFile: in the docker directory, run vi DockerFile and add the following:
    FROM elasticsearch:7.8.0
    ADD elasticsearch-analysis-ik /usr/share/elasticsearch/plugins/elasticsearch-analysis-ik
    
    2. Build the image: from the docker directory, run docker build -f DockerFile -t elasticsearch-ik:7.8.0 .

    Output of a successful build:

    [root@localhost elasticsearch-ik]# docker build -f DockerFile -t elasticsearch-ik:7.8.0 .
    Sending build context to Docker daemon  14.39MB
    Step 1/2 : FROM elasticsearch:7.8.0
     ---> 121454ddad72
    Step 2/2 : ADD elasticsearch-analysis-ik /usr/share/elasticsearch/plugins/elasticsearch-analysis-ik
     ---> Using cache
     ---> 2af03d5426d3
    Successfully built 2af03d5426d3
    Successfully tagged elasticsearch-ik:7.8.0
    
    3. Create and start the container

    docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name elasticsearch_test elasticsearch-ik:7.8.0

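    4. Verify that Elasticsearch started: curl localhost:9200

    If the connection is refused, the node may still be starting; wait a few seconds and retry. Output like the following means it started successfully: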

    [root@localhost docker]# curl localhost:9200
    {
      "name" : "9f832bbeb44a",
      "cluster_name" : "docker-cluster",
      "cluster_uuid" : "8GAjHyQEToO6PMl8dDoemQ",
      "version" : {
        "number" : "7.8.0",
        "build_flavor" : "default",
        "build_type" : "docker",
        "build_hash" : "757314695644ea9a1dc2fecd26d1a43856725e65",
        "build_date" : "2020-06-14T19:35:50.234439Z",
        "build_snapshot" : false,
        "lucene_version" : "8.5.1",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
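
    The response above confirms that the node is up, but not that the plugin was loaded. One way to check is the GET /_cat/plugins endpoint. The sketch below only inspects that endpoint's text output. The helper name is made up, and the string analysis-ik is an assumption about how the IK plugin identifies itself, so adjust it if your output differs.

```shell
#!/bin/sh
# Hypothetical helper: reads the text output of GET /_cat/plugins on
# standard input and succeeds if any line mentions the IK plugin.
# "analysis-ik" is an assumed plugin name, not taken from the original post.
has_ik_plugin() {
    grep -q 'analysis-ik'
}

# Intended use against the container started above (not executed here):
#   curl -s localhost:9200/_cat/plugins | has_ik_plugin && echo "IK plugin loaded"
```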
    

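    III. Test the analyzers

    Postman is used here.
    Request URL: http://192.168.0.199:9200/_analyze (replace 192.168.0.199 with the IP of your own Docker host)
    Request method: POST
    Format of the JSON request body: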

    {
        "analyzer": "chinese",
        "text": "今天是个好日子"
    }
    

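    Parameters:
    analyzer: one of chinese, ik_max_word, or ik_smart. chinese is built into ES rather than provided by IK (ES's actual default analyzer is named standard); ik_max_word (finest-grained segmentation) and ik_smart (coarsest segmentation) are provided by the IK plugin.
    text: the content to analyze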
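
    The same request can be sent without Postman. The following is a minimal curl sketch. The helper names and the ES_URL variable are made up for illustration, and the body builder does no JSON escaping, so it assumes the text contains no double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for the _analyze API.
# No JSON escaping is performed, so the text must not contain
# double quotes or backslashes.
analyze_body() {
    printf '{"analyzer": "%s", "text": "%s"}\n' "$1" "$2"
}

# Hypothetical helper: POST the body to the _analyze endpoint.
# ES_URL defaults to localhost; set it to your own host if needed.
analyze() {
    curl -s -X POST "${ES_URL:-http://localhost:9200}/_analyze" \
         -H 'Content-Type: application/json' \
         -d "$(analyze_body "$1" "$2")"
}

# Print the body that would be sent for an ik_smart request
analyze_body ik_smart '今天是个好日子'
```

    With the container running, a call such as analyze ik_max_word '今天是个好日子' sends the request itself; only the body builder is exercised here, so the block runs without a live node.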

    1. Test ES's built-in analyzer (chinese)
    {
        "analyzer": "chinese",
        "text": "今天是个好日子"
    }
    

    Result:

    {
        "tokens": [
            {
                "token": "今",
                "start_offset": 0,
                "end_offset": 1,
                "type": "<IDEOGRAPHIC>",
                "position": 0
            },
            {
                "token": "天",
                "start_offset": 1,
                "end_offset": 2,
                "type": "<IDEOGRAPHIC>",
                "position": 1
            },
            {
                "token": "是",
                "start_offset": 2,
                "end_offset": 3,
                "type": "<IDEOGRAPHIC>",
                "position": 2
            },
            {
                "token": "个",
                "start_offset": 3,
                "end_offset": 4,
                "type": "<IDEOGRAPHIC>",
                "position": 3
            },
            {
                "token": "好",
                "start_offset": 4,
                "end_offset": 5,
                "type": "<IDEOGRAPHIC>",
                "position": 4
            },
            {
                "token": "日",
                "start_offset": 5,
                "end_offset": 6,
                "type": "<IDEOGRAPHIC>",
                "position": 5
            },
            {
                "token": "子",
                "start_offset": 6,
                "end_offset": 7,
                "type": "<IDEOGRAPHIC>",
                "position": 6
            }
        ]
    }
    
    2. Test the IK analyzer ik_smart
    {
        "analyzer": "ik_smart",
        "text": "今天是个好日子"
    }
    

    Result:

    {
        "tokens": [
            {
                "token": "今天是",
                "start_offset": 0,
                "end_offset": 3,
                "type": "CN_WORD",
                "position": 0
            },
            {
                "token": "个",
                "start_offset": 3,
                "end_offset": 4,
                "type": "CN_CHAR",
                "position": 1
            },
            {
                "token": "好日子",
                "start_offset": 4,
                "end_offset": 7,
                "type": "CN_WORD",
                "position": 2
            }
        ]
    }
    
    3. Test the IK analyzer ik_max_word
    {
        "analyzer": "ik_max_word",
        "text": "今天是个好日子"
    }
    

    Result:

    {
        "tokens": [
            {
                "token": "今天是",
                "start_offset": 0,
                "end_offset": 3,
                "type": "CN_WORD",
                "position": 0
            },
            {
                "token": "今天",
                "start_offset": 0,
                "end_offset": 2,
                "type": "CN_WORD",
                "position": 1
            },
            {
                "token": "是",
                "start_offset": 2,
                "end_offset": 3,
                "type": "CN_CHAR",
                "position": 2
            },
            {
                "token": "个",
                "start_offset": 3,
                "end_offset": 4,
                "type": "CN_CHAR",
                "position": 3
            },
            {
                "token": "好日子",
                "start_offset": 4,
                "end_offset": 7,
                "type": "CN_WORD",
                "position": 4
            },
            {
                "token": "日子",
                "start_offset": 5,
                "end_offset": 7,
                "type": "CN_WORD",
                "position": 5
            }
        ]
    }
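
    The three results are easier to compare when reduced to their token values. The following is a rough sketch using grep and sed. The helper name is made up, and it does plain text matching rather than real JSON parsing, so it assumes token values contain no double quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every "token" field found on
# standard input, one per line. Plain text matching, not a JSON parser.
tokens() {
    grep -o '"token": *"[^"]*"' | sed 's/^"token": *"//; s/"$//'
}

# Abridged copy of the ik_smart result shown earlier
tokens <<'EOF'
{
    "tokens": [
        { "token": "今天是", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 },
        { "token": "个", "start_offset": 3, "end_offset": 4, "type": "CN_CHAR", "position": 1 },
        { "token": "好日子", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 2 }
    ]
}
EOF
```

    Piping each saved response through tokens gives one token per line: seven single characters for chinese, three tokens for ik_smart, and six for ik_max_word.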
    
  • Original article: https://www.cnblogs.com/new-life/p/13397982.html