• Getting Started with Elasticsearch, a Distributed Search Engine


    Official site: https://www.elastic.co/products/elasticsearch/

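    I. Features

    1. Supports Chinese word segmentation (through analysis plugins)

    2. Full-text search engine that can index data from many kinds of sources

    3. Distributed

    4. Open-source search engine built on Lucene

    5. RESTful API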

    II. Resources

    • smartcn, the default Chinese analyzer: https://github.com/elasticsearch/elasticsearch-analysis-smartcn

    • mmseg: https://github.com/medcl/elasticsearch-analysis-mmseg

    • ik: https://github.com/medcl/elasticsearch-analysis-ik

    • pinyin, a pinyin analyzer that can suggest Chinese text from pinyin input: https://github.com/medcl/elasticsearch-analysis-pinyin

    • stconvert, converts between Simplified and Traditional Chinese: https://github.com/medcl/elasticsearch-analysis-stconvert

    • elasticsearch-servicewrapper: https://github.com/elasticsearch/elasticsearch-servicewrapper

    • Elastic HQ, a monitoring tool for Elasticsearch: http://www.elastichq.org

    • elasticsearch-rtf: https://github.com/medcl/elasticsearch-rtf

    III. Installation

    • Server: Linux (CentOS 6.x)

    • Java environment: JDK 1.8.0

    • elasticsearch: 2.3.1

    • elasticsearch-jdbc (data source plugin): 2.3.1

    • IK Analysis (Chinese analysis plugin): 1.9.1

    1. Install Java

    yum install -y java-1.8.0-openjdk
    

    2. Install Elasticsearch

    # Create the repo file (elasticsearch.repo)
    cat > /etc/yum.repos.d/elasticsearch.repo << EOF
    [elasticsearch-2.x]
    name=Elasticsearch repository for 2.x packages
    baseurl=https://packages.elastic.co/elasticsearch/2.x/centos
    gpgcheck=1
    gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
    enabled=1
    EOF

    # Import the GPG key, then install the package
    rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
    yum install elasticsearch
    

    3. Create a user

    Starting Elasticsearch as root fails with the following error:

    Exception in thread "main" java.lang.RuntimeException: don't run elasticsearch as root.
        at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:93)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:144)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:285)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
    Refer to the log for complete error details.

    This restriction exists for security reasons. Because Elasticsearch can accept and execute user-supplied scripts, create a dedicated user to run it.

    groupadd elsearch
    useradd -g elsearch elsearch
    # useradd -p expects an already-encrypted password, so set the password interactively instead
    passwd elsearch

    4. Create the directories

    mkdir -p  /data/elasticsearch/data
    mkdir -p  /data/elasticsearch/logs
    chown -R elsearch:elsearch /data/elasticsearch/data
    chown -R elsearch:elsearch /data/elasticsearch/logs

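    5. Write the configuration file (/etc/elasticsearch/elasticsearch.yml)

    # Cluster name (must be identical on every node of the same cluster)
    cluster.name: my-application
    # Node name (must be different on every node)
    node.name: node-1
    # Data directory
    path.data: /data/elasticsearch/data
    # Log directory
    path.logs: /data/elasticsearch/logs
    # Address to bind to (0.0.0.0 listens on all interfaces)
    network.host: 0.0.0.0
    # HTTP port
    http.port: 9200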
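
    Since only node.name changes from node to node, the file above can be generated instead of edited by hand. The following is a minimal sketch, not part of the original setup: the function name render_es_config and its optional third argument are hypothetical, and the default data root simply mirrors the directories created in step 4.

```shell
# Hypothetical helper: print the settings from step 5 for one node.
# $1 = cluster name, $2 = node name, $3 = data root (assumed default: /data/elasticsearch)
render_es_config() {
    printf 'cluster.name: %s\n' "$1"
    printf 'node.name: %s\n' "$2"
    printf 'path.data: %s/data\n' "${3:-/data/elasticsearch}"
    printf 'path.logs: %s/logs\n' "${3:-/data/elasticsearch}"
    printf 'network.host: 0.0.0.0\n'
    printf 'http.port: 9200\n'
}

# Usage (run as root on each node, changing only the node name):
#   render_es_config my-application node-2 > /etc/elasticsearch/elasticsearch.yml
```

    Keeping the cluster name as an explicit argument makes it harder to end up with nodes that silently form separate clusters because of a typo.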
    

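    IV. Installing IK

    1. Install Maven

    wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
    yum install apache-maven

    2. Download the IK source

    git clone https://github.com/medcl/elasticsearch-analysis-ik
    cd elasticsearch-analysis-ik
    # Switch to the IK 1.9.1 release before building if master targets a newer Elasticsearch

    3. Build the plugin package

    mvn clean package

    # Unpack the release zip into the Elasticsearch plugin directory
    mkdir -p /usr/share/elasticsearch/plugins/ik
    unzip target/releases/elasticsearch-analysis-ik-*.zip -d /usr/share/elasticsearch/plugins/ik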

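    4. Configure the dictionaries (IK ships with the Sogou dictionary)

    Configuration file: /usr/share/elasticsearch/plugins/ik/config/ik/IKAnalyzer.cfg.xml

    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic;custom/sougou.dic</entry>

    Copy the jar into Elasticsearch's plugins/analysis-ik directory, then copy the extracted ik directory (configuration, dictionaries, and so on) into Elasticsearch's config directory. Then edit elasticsearch.yml and append this line:

    index.analysis.analyzer.ik.type : "ik"

    Restart the service:

    service elasticsearch restart

    Then load the data and create the index.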
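
    To make the last step concrete, here is a minimal sketch of creating an index whose text field uses the "ik" analyzer registered in elasticsearch.yml. The helper name ik_index_body, the index name "articles", the type "article", and the field "content" are all assumptions made for illustration; the mapping uses the Elasticsearch 2.x "string" field type.

```shell
# Hypothetical helper: print an Elasticsearch 2.x index body that maps one
# string field of one document type to the "ik" analyzer.
# $1 = document type, $2 = field name
ik_index_body() {
    printf '{"mappings":{"%s":{"properties":{"%s":{"type":"string","analyzer":"ik"}}}}}\n' \
        "$1" "$2"
}

# Usage against a running node (index, type, and field names are assumptions):
#   ik_index_body article content | curl -XPUT 'localhost:9200/articles' -d @-
#   curl -XGET 'localhost:9200/articles/_analyze?pretty' -d '{"analyzer":"ik","text":"中华人民共和国"}'
```

    If the plugin loaded correctly, the _analyze call should return Chinese words as tokens rather than one token per character.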

    V. elasticsearch-jdbc

    1. Using the feeder approach

    wget http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.1.0/elasticsearch-jdbc-2.3.1.0-dist.zip
    unzip elasticsearch-jdbc-2.3.1.0-dist.zip

    Edit the data import script, import.sh:

    export JDBC_IMPORTER_HOME=/elasticsearch-jdbc-2.3.1.0

    bin=$JDBC_IMPORTER_HOME/bin
    lib=$JDBC_IMPORTER_HOME/lib
    # Pipe the importer definition to the JDBC importer on stdin
    echo '{
    "type" : "jdbc",
    "jdbc": {
    "url":"jdbc:mysql://127.0.0.1:3306/dbtest",
    "user":"root",
    "password":"123456",
    "sql":"select * from test_tb",
    "index" : "customer",
    "type" : "external"
    }}' | java \
        -cp "${lib}/*" \
        -Dlog4j.configurationFile=${bin}/log4j2.xml \
        org.xbib.tools.Runner \
        org.xbib.tools.JDBCImporter
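
    The definition piped into the importer is plain JSON, so it can also be built from variables rather than hard-coded. This is a rough sketch under stated assumptions: jdbc_importer_config is a hypothetical helper, MYSQL_PASSWORD is a hypothetical environment variable, and the arguments must not contain double quotes or backslashes because no JSON escaping is done.

```shell
# Hypothetical helper: print a JDBC importer definition as one line of JSON.
# Arguments: url user password sql index type
# No JSON escaping is performed, so values must not contain " or \.
jdbc_importer_config() {
    printf '{"type":"jdbc","jdbc":{"url":"%s","user":"%s","password":"%s","sql":"%s","index":"%s","type":"%s"}}\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Usage, replacing the echo '{...}' part of import.sh:
#   jdbc_importer_config "jdbc:mysql://127.0.0.1:3306/dbtest" root "$MYSQL_PASSWORD" \
#       "select * from test_tb" customer external \
#     | java -cp "${lib}/*" -Dlog4j.configurationFile=${bin}/log4j2.xml \
#           org.xbib.tools.Runner org.xbib.tools.JDBCImporter
```

    Reading the password from the environment keeps it out of the script file, which matters if import.sh is committed to version control.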
    

    Test the import:

    curl -XGET 'localhost:9200/customer/external/_search?pretty&q=a'

    Delete the index:

    curl -XDELETE 'http://localhost:9200/customer'

    2. Using the river approach (Elasticsearch 1.x only; rivers were removed in 2.0)

    # Install Elasticsearch
    curl -OL https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.zip

    cd $ES_HOME
    unzip path/to/elasticsearch-1.4.2.zip

    # Install the JDBC plugin
    ./bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.4.0.6/elasticsearch-river-jdbc-1.4.0.6-plugin.zip

    # Download the MySQL driver (assumes the archive unpacks into mysql-connector-java-5.1.33/)
    curl -o mysql-connector-java-5.1.33.zip -L 'http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.33.zip/from/http://cdn.mysql.com/'
    unzip mysql-connector-java-5.1.33.zip
    cp mysql-connector-java-5.1.33/mysql-connector-java-5.1.33-bin.jar $ES_HOME/plugins/jdbc/
    chmod 644 $ES_HOME/plugins/jdbc/*

    # Start Elasticsearch
    ./bin/elasticsearch

    # Stop (delete) a river that was created earlier
    curl -XDELETE 'localhost:9200/_river/my_jdbc_river/'

    JDBC plugin parameters

    curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
        "type" : "jdbc",
        "jdbc" : {
            "url" : "jdbc:mysql://localhost:3306/test",
            "user" : "",
            "password" : "",
            "sql" : "select * from orders",
            "index" : "myindex",
            "type" : "mytype",
            ...
        }
    }'
    
    Multiple river sources are also possible by passing an array to the jdbc field:
    
    curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
         <river parameters>
        "type" : "jdbc",
        "jdbc" : [ {
             <river definition 1>
        }, {
             <river definition 2>
        } ]
    }'
    
    curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
         "type" : "jdbc",
         "jdbc" : {
             "driver" : "com.mysql.jdbc.Driver",
             "url" : "jdbc:mysql://localhost:3306/test",
             "user" : "root",
             "password" : "123456",
             "sql" : "select * from test.student;",
             "interval" : "30",
             "index" : "test",
             "type" : "student"
         }
     }'

    Check whether Elasticsearch has synced the data:

    curl -XGET 'localhost:9200/test/student/_search?pretty&q=*'
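
    For an automated check, the document count can be pulled out of the search response and compared with the row count in MySQL. The sketch below is illustrative only: hits_total is a hypothetical helper, and it assumes the compact single-line response (no ?pretty) of Elasticsearch 1.x/2.x, where hits.total is a plain number.

```shell
# Hypothetical helper: read a compact _search response on stdin and print hits.total.
# It matches the "total" that directly follows the opening of the "hits" object,
# so the "total" inside "_shards" is ignored.
hits_total() {
    sed -n 's/.*"hits"[[:space:]]*:[[:space:]]*{[[:space:]]*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage against a running node (note: no ?pretty, so the response is one line):
#   curl -s -XGET 'localhost:9200/test/student/_search?q=*' | hits_total
```

    A count that keeps lagging behind the table suggests the river is not running or its SQL statement is failing.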
    

    Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

     

    References

    https://www.elastic.co/guide/en/elasticsearch/guide/current/empty-search.html

    https://github.com/medcl/elasticsearch-analysis-ik

    http://blog.csdn.net/clementad/article/details/46898013

    https://endymecy.gitbooks.io/elasticsearch-guide-chinese/content/elasticsearch-river-jdbc.html

    https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

    https://github.com/jprante/elasticsearch-jdbc

    http://www.voidcn.com/blog/wojiushiwo987/article/p-6058574.html

    http://leotse90.com/2015/11/11/ElasticSearch%E4%B8%8EMySQL%E6%95%B0%E6%8D%AE%E5%90%8C%E6%AD%A5%E4%BB%A5%E5%8F%8A%E4%BF%AE%E6%94%B9%E8%A1%A8%E7%BB%93%E6%9E%84/

    http://www.jianshu.com/p/638ff7b848cc

    http://www.cnblogs.com/buzzlight/p/logstash_elasticsearch_kibana_log.html

  • Original article: https://www.cnblogs.com/chenpingzhao/p/5991579.html