Download Elasticsearch 5.6.1:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.1.tar.gz
Extract it into the current directory:
tar -xzvf elasticsearch-5.6.1.tar.gz
Edit the sysctl file with sudo vim /etc/sysctl.conf and add the setting below. Do this on every machine.
Add this line: vm.max_map_count=655360
Save and exit, then run:
sudo sysctl -p
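To confirm that the setting took effect on each machine, the value can be read back from /proc. The following is only a minimal sketch: the helper name check_max_map_count is made up for illustration, and the 262144 threshold is the minimum that the Elasticsearch bootstrap check asks for (the 655360 configured above is well over it).

```python
from pathlib import Path

# Minimum vm.max_map_count accepted by the Elasticsearch bootstrap check.
REQUIRED_MAX_MAP_COUNT = 262144


def check_max_map_count(path="/proc/sys/vm/max_map_count",
                        required=REQUIRED_MAX_MAP_COUNT):
    """Return (current_value, ok) for the kernel setting stored at `path`."""
    current = int(Path(path).read_text().strip())
    return current, current >= required
```

On a Linux host, calling check_max_map_count() with no arguments reads the live kernel value; the path parameter exists only so the function can be pointed at another file.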
Change to /home/hadoop/elasticsearch-5.6.1/config and open elasticsearch.yml:
vim elasticsearch.yml
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Cluster name
cluster.name: es-app
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Node name
node.name: master
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
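# Memory lock and system call filter. These depend on the OS: on a system that is too old, startup fails with a version-too-low error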
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# Bind address
network.host: 192.168.93.140
#
# Set a custom port for HTTP:
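# http.port is the HTTP port that external clients use to request data; transport.tcp.port is the TCP transport port. Both must be set explicitly when several nodes run on one host.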
http.port: 9200
transport.tcp.port: 9300
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# IP:port of each node; 9300 is the default port. The port must be given when several nodes run on one machine.
discovery.zen.ping.unicast.hosts: ["192.168.93.140:9300", "192.168.93.141:9300","192.168.93.142:9300"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
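# Prevent split brain. With three master-eligible nodes the quorum is 2; the value 1 used here triggers the minimum_master_nodes warning in the startup log.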
discovery.zen.minimum_master_nodes: 1
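The quorum formula quoted in the configuration comment (total number of master-eligible nodes / 2 + 1, with integer division) can be sketched as below. The function name is hypothetical and only illustrates the arithmetic.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Print the quorum for a few cluster sizes.
for n in (1, 2, 3, 5):
    print(n, "->", minimum_master_nodes(n))
```

For the three master-eligible nodes of this cluster the function returns 2, which is the value the warning in the startup log asks for.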
Once the configuration is done, copy the directory to the other nodes, then change node.name and the IP address (network.host) on each of them.
scp -r /home/hadoop/elasticsearch-5.6.1 hadoop@slaver1:~
scp -r /home/hadoop/elasticsearch-5.6.1 hadoop@slaver2:~
To start Elasticsearch, change to /home/hadoop/elasticsearch-5.6.1/bin and run:
./elasticsearch
Note that every node has to be started. Output on master:
hadoop@master:~/elasticsearch-5.6.1/bin$ ./elasticsearch
[2017-09-24T19:02:08,979][INFO ][o.e.n.Node ] [master] initializing ...
[2017-09-24T19:02:09,245][INFO ][o.e.e.NodeEnvironment ] [master] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [21.5gb], net total_space [41.2gb], spins? [possibly], types [ext4]
[2017-09-24T19:02:09,246][INFO ][o.e.e.NodeEnvironment ] [master] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-09-24T19:02:09,248][INFO ][o.e.n.Node ] [master] node name [master], node ID [h1_nDt8nSiCPysC_YvCiCQ]
[2017-09-24T19:02:09,249][INFO ][o.e.n.Node ] [master] version[5.6.1], pid[81870], build[667b497/2017-09-14T19:22:05.189Z], OS[Linux/4.10.0-35-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-09-24T19:02:09,249][INFO ][o.e.n.Node ] [master] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/home/hadoop/elasticsearch-5.6.1]
[2017-09-24T19:02:11,242][INFO ][o.e.p.PluginsService ] [master] loaded module [aggs-matrix-stats]
[2017-09-24T19:02:11,242][INFO ][o.e.p.PluginsService ] [master] loaded module [ingest-common]
[2017-09-24T19:02:11,243][INFO ][o.e.p.PluginsService ] [master] loaded module [lang-expression]
[2017-09-24T19:02:11,243][INFO ][o.e.p.PluginsService ] [master] loaded module [lang-groovy]
[2017-09-24T19:02:11,243][INFO ][o.e.p.PluginsService ] [master] loaded module [lang-mustache]
[2017-09-24T19:02:11,244][INFO ][o.e.p.PluginsService ] [master] loaded module [lang-painless]
[2017-09-24T19:02:11,244][INFO ][o.e.p.PluginsService ] [master] loaded module [parent-join]
[2017-09-24T19:02:11,244][INFO ][o.e.p.PluginsService ] [master] loaded module [percolator]
[2017-09-24T19:02:11,245][INFO ][o.e.p.PluginsService ] [master] loaded module [reindex]
[2017-09-24T19:02:11,245][INFO ][o.e.p.PluginsService ] [master] loaded module [transport-netty3]
[2017-09-24T19:02:11,246][INFO ][o.e.p.PluginsService ] [master] loaded module [transport-netty4]
[2017-09-24T19:02:11,247][INFO ][o.e.p.PluginsService ] [master] no plugins loaded
[2017-09-24T19:02:14,304][INFO ][o.e.d.DiscoveryModule ] [master] using discovery type [zen]
[2017-09-24T19:02:15,462][INFO ][o.e.n.Node ] [master] initialized
[2017-09-24T19:02:15,463][INFO ][o.e.n.Node ] [master] starting ...
[2017-09-24T19:02:15,793][INFO ][o.e.t.TransportService ] [master] publish_address {192.168.93.140:9300}, bound_addresses {192.168.93.140:9300}
[2017-09-24T19:02:15,815][INFO ][o.e.b.BootstrapChecks ] [master] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-09-24T19:02:18,924][INFO ][o.e.c.s.ClusterService ] [master] new_master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-09-24T19:02:18,971][INFO ][o.e.h.n.Netty4HttpServerTransport] [master] publish_address {192.168.93.140:9200}, bound_addresses {192.168.93.140:9200}
[2017-09-24T19:02:18,972][INFO ][o.e.n.Node ] [master] started
[2017-09-24T19:02:18,988][INFO ][o.e.g.GatewayService ] [master] recovered [0] indices into cluster_state
[2017-09-24T19:03:19,541][INFO ][o.e.c.s.ClusterService ] [master] added {{slaver1}{xyb705aPSPq9iH-z8WHBpg}{C6yz5DQtSje3hHqmEwiPSw}{192.168.93.141}{192.168.93.141:9300},}, reason: zen-disco-node-join[{slaver1}{xyb705aPSPq9iH-z8WHBpg}{C6yz5DQtSje3hHqmEwiPSw}{192.168.93.141}{192.168.93.141:9300}]
[2017-09-24T19:03:20,157][WARN ][o.e.d.z.ElectMasterService] [master] value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [1], total number of master-eligible nodes used for publishing in this round: [2])
[2017-09-24T19:05:25,236][INFO ][o.e.c.s.ClusterService ] [master] added {{slaver2}{_klsi3jPQP2hiPULnjsvyA}{Mu7pypT3R8CtuPh4ar9mLw}{192.168.93.142}{192.168.93.142:9300},}, reason: zen-disco-node-join[{slaver2}{_klsi3jPQP2hiPULnjsvyA}{Mu7pypT3R8CtuPh4ar9mLw}{192.168.93.142}{192.168.93.142:9300}]
Now look at the logs on slaver1 and slaver2:
slaver1:
[2017-09-24T19:03:20,144][INFO ][o.e.c.s.ClusterService ] [slaver1] detected_master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300}, added {{master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300},}, reason: zen-disco-receive(from master [master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300} committed version [3]])
[2017-09-24T19:03:20,187][INFO ][o.e.h.n.Netty4HttpServerTransport] [slaver1] publish_address {192.168.93.141:9200}, bound_addresses {192.168.93.141:9200}
[2017-09-24T19:03:20,188][INFO ][o.e.n.Node ] [slaver1] started
[2017-09-24T19:05:25,309][INFO ][o.e.c.s.ClusterService ] [slaver1] added {{slaver2}{_klsi3jPQP2hiPULnjsvyA}{Mu7pypT3R8CtuPh4ar9mLw}{192.168.93.142}{192.168.93.142:9300},}, reason: zen-disco-receive(from master [master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300} committed version [4]])
slaver2:
[2017-09-24T19:05:25,706][INFO ][o.e.c.s.ClusterService ] [slaver2] detected_master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300}, added {{slaver1}{xyb705aPSPq9iH-z8WHBpg}{C6yz5DQtSje3hHqmEwiPSw}{192.168.93.141}{192.168.93.141:9300},{master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300},}, reason: zen-disco-receive(from master [master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300} committed version [4]])
[2017-09-24T19:05:28,167][INFO ][o.e.h.n.Netty4HttpServerTransport] [slaver2] publish_address {192.168.93.142:9200}, bound_addresses {192.168.93.142:9200}
[2017-09-24T19:05:28,168][INFO ][o.e.n.Node ] [slaver2] started
The logs above show that the nodes have discovered each other.
Cluster health (the same URL can also be opened in a browser):
hadoop@master:/opt/Hadoop/zookeeper-3.4.10/bin$ curl 'http://192.168.93.140:9200/_cluster/health?pretty=true'
{
"cluster_name" : "es-app",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
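The same health check can be scripted. This is a minimal sketch using only the standard library; the function names are made up for illustration, and treating "green and not timed out" as healthy is an assumption of this example rather than a rule stated by Elasticsearch.

```python
import json
from urllib.request import urlopen


def summarize_health(health):
    """Reduce a _cluster/health response to the fields checked most often."""
    return {
        "status": health["status"],
        "nodes": health["number_of_nodes"],
        "unassigned_shards": health["unassigned_shards"],
        "healthy": health["status"] == "green" and not health["timed_out"],
    }


def fetch_health(base_url="http://192.168.93.140:9200", timeout=5):
    """Fetch _cluster/health from a running node (needs network access)."""
    with urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling summarize_health(fetch_health()) against the cluster above should report three nodes with status green.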
Cluster state (the same URL can also be opened in a browser):
hadoop@master:/opt/Hadoop/zookeeper-3.4.10/bin$ curl http://192.168.93.140:9200/_cluster/state
{"cluster_name":"es-app","version":4,"state_uuid":"nNfQrzNOSs6yrU7VNHNlwg","master_node":"h1_nDt8nSiCPysC_YvCiCQ","blocks":{},"nodes":{"_klsi3jPQP2hiPULnjsvyA":{"name":"slaver2","ephemeral_id":"Mu7pypT3R8CtuPh4ar9mLw","transport_address":"192.168.93.142:9300","attributes":{}},"xyb705aPSPq9iH-z8WHBpg":{"name":"slaver1","ephemeral_id":"C6yz5DQtSje3hHqmEwiPSw","transport_address":"192.168.93.141:9300","attributes":{}},"h1_nDt8nSiCPysC_YvCiCQ":{"name":"master","ephemeral_id":"efcDdMrKSmSObgecX79mEw","transport_address":"192.168.93.140:9300","attributes":{}}},"metadata":{"cluster_uuid":"o1DzbTt6RY-bgh4ilZ47Yw","templates":{},"indices":{},"index-graveyard":{"tombstones":[]}},"routing_table":{"indices":{}},"routing_nodes":{"unassigned":[],"nodes":{"_klsi3jPQP2hiPULnjsvyA":[],"xyb705aPSPq9iH-z8WHBpg":[],"h1_nDt8nSiCPysC_YvCiCQ":[]}}}
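The single-line cluster state is hard to read by eye. As a rough sketch, the elected master and the member list can be pulled out of the decoded JSON like this (describe_nodes is a hypothetical helper; it relies only on the master_node and nodes keys visible in the output above):

```python
def describe_nodes(state):
    """Return (master_name, sorted list of (name, transport_address))."""
    nodes = state["nodes"]
    # master_node holds the node ID; look up its human-readable name.
    master_name = nodes[state["master_node"]]["name"]
    members = sorted((n["name"], n["transport_address"]) for n in nodes.values())
    return master_name, members
```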
Cluster stats:
hadoop@master:/opt/Hadoop/zookeeper-3.4.10/bin$ curl http://192.168.93.140:9200/_cluster/stats
{"_nodes":{"total":3,"successful":3,"failed":0},"cluster_name":"es-app","timestamp":1506308656817,"status":"green","indices":{"count":0,"shards":{},"docs":{"count":0,"deleted":0},"store":{"size_in_bytes":0,"throttle_time_in_millis":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":0,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-9223372036854775808,"file_sizes":{}}},"nodes":{"count":{"total":3,"data":3,"coordinating_only":0,"master":3,"ingest":3},"versions":["5.6.1"],"os":{"available_processors":48,"allocated_processors":48,"names":[{"name":"Linux","count":3}],"mem":{"total_in_bytes":25049698304,"free_in_bytes":1368895488,"used_in_bytes":23680802816,"free_percent":5,"used_percent":95}},"process":{"cpu":{"percent":0},"open_file_descriptors":{"min":446,"max":447,"avg":446}},"jvm":{"max_uptime_in_millis":3758298,"versions":[{"version":"1.8.0_131","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"25.131-b11","vm_vendor":"Oracle Corporation","count":3}],"mem":{"heap_used_in_bytes":1444532368,"heap_max_in_bytes":6227755008},"threads":229},"fs":{"total_in_bytes":132766040064,"free_in_bytes":77017890816,"available_in_bytes":70202990592,"spins":"true"},"plugins":[],"network_types":{"transport_types":{"netty4":3},"http_types":{"netty4":3}}}}
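Test with Python. Install the client first; its major version should match the server, so pin it to 5.x:
sudo pip install 'elasticsearch>=5.0.0,<6.0.0'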
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime

from elasticsearch import Elasticsearch

# Create the connection
es = Elasticsearch(hosts='192.168.93.140')

# Index documents with ids 1..99999, one request per document
for i in range(1, 100000):
    es.index(index='els_student', doc_type='test-type', id=i,
             body={"name": "student" + str(i), "age": i % 100,
                   "timestamp": datetime.now()})
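Sending one request per document is slow for roughly 100,000 documents. A possible alternative, sketched here under the assumption that the 5.x Python client is installed, is the client's bulk helper. The names student_actions and bulk_index are made up for this example; helpers.bulk is the client's own function.

```python
from datetime import datetime


def student_actions(start, stop, index="els_student", doc_type="test-type"):
    """Yield one bulk action per student document, for ids start..stop-1."""
    for i in range(start, stop):
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": i,
            "_source": {
                "name": "student" + str(i),
                "age": i % 100,
                "timestamp": datetime.now(),
            },
        }


def bulk_index(host="192.168.93.140"):
    """Send all documents with the bulk helper (needs a reachable cluster)."""
    # Imported here so the generator above works without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(hosts=host)
    return helpers.bulk(es, student_actions(1, 100000))
```

The generator produces the same documents as the loop above; only the transport changes, since the helper groups the actions into batched requests.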
Search the new index, first matching everything and then matching a single name:
curl -XPOST '192.168.93.140:9200/els_student/_search?pretty' -d '
{
"query": { "match_all": {} }
}'
curl -XPOST '192.168.93.140:9200/els_student/_search?pretty' -d '
{
"query": { "match": { "name": "student41" } }
}'
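The two curl searches can also be issued from Python without any third-party package. This is only a sketch: build_search_request and search are hypothetical helper names, and the explicit Content-Type header is an addition of this example.

```python
import json
from urllib.request import Request, urlopen


def build_search_request(query, index="els_student",
                         base_url="http://192.168.93.140:9200"):
    """Build a POST request for <index>/_search with a JSON query body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        base_url + "/" + index + "/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, **kwargs):
    """Run the search against a live cluster and return the decoded response."""
    with urlopen(build_search_request(query, **kwargs), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, search({"match_all": {}}) corresponds to the first curl call and search({"match": {"name": "student41"}}) to the second.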
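Create an index named index and give its fulltext type a mapping that uses the IK analyzer (this requires the IK plugin, whose installation is described below):
curl -XPUT http://192.168.93.140:9200/index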
curl -XPOST http://192.168.93.140:9200/index/fulltext/_mapping -d'
{
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
}
}
}'
Install the IK analyzer plugin (pick the release that matches your ES version; see https://github.com/medcl/elasticsearch-analysis-ik):
hadoop@master:~/elasticsearch-5.6.1$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.1/elasticsearch-analysis-ik-5.6.1.zip
-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.1/elasticsearch-analysis-ik-5.6.1.zip
[=================================================] 100%
-> Installed analysis-ik
hadoop@master:~/elasticsearch-5.6.1$ cd bin/
Restart ES after installing the plugin, then index a few test documents and search them:
curl -XPOST http://192.168.93.140:9200/index/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://192.168.93.140:9200/index/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://192.168.93.140:9200/index/fulltext/3 -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://192.168.93.140:9200/index/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
curl -XPOST http://192.168.93.140:9200/index/fulltext/_search -d'
{
"query" : { "match" : { "content" : "中国" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
"fields" : {
"content" : {}
}
}
}
'
Result:
{
"took": 169,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.6099695,
"hits": [
{
"_index": "index",
"_type": "fulltext",
"_id": "4",
"_score": 0.6099695,
"_source": {
"content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
},
"highlight": {
"content": [
"<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
]
}
},
{
"_index": "index",
"_type": "fulltext",
"_id": "3",
"_score": 0.27179778,
"_source": {
"content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
},
"highlight": {
"content": [
"中韩渔警冲突调查:韩警平均每天扣1艘<tag1>中国</tag1>渔船"
]
}
}
]
}
}
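When the response is consumed by a program, the highlighted fragments can be collected per hit. A minimal sketch, assuming the response has already been decoded into a dict (highlighted_fragments is a made-up helper name):

```python
def highlighted_fragments(response, field="content"):
    """Return (doc_id, score, fragments) for each hit, in ranking order."""
    results = []
    for hit in response["hits"]["hits"]:
        # A hit carries no "highlight" key when nothing matched in that field.
        fragments = hit.get("highlight", {}).get(field, [])
        results.append((hit["_id"], hit["_score"], fragments))
    return results
```

Applied to the result above, this yields one entry for document 4 followed by one for document 3, each with its tagged fragment.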
Dictionary Configuration
IKAnalyzer.cfg.xml can be located at {conf}/analysis-ik/config/IKAnalyzer.cfg.xml or {plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Configure your own extension dictionaries here -->
<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
<!-- Configure your own extension stopword dictionaries here -->
<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
<!-- Configure a remote extension dictionary here -->
<entry key="remote_ext_dict">location</entry>
<!-- Configure a remote extension stopword dictionary here -->
<entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
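Hot-updating the IK dictionary
The plugin supports hot updates of the IK dictionary through the following entries of the IK configuration file mentioned above:
<!-- Configure a remote extension dictionary here -->
<entry key="remote_ext_dict">location</entry>
<!-- Configure a remote extension stopword dictionary here -->
<entry key="remote_ext_stopwords">location</entry>
Here location is a URL, for example http://yoursite.com/getCustomDict. The request only has to satisfy two conditions for hot updates to work:
1. The HTTP response must carry two headers, Last-Modified and ETag. Both are strings; whenever either one changes, the plugin fetches the new word list and updates its dictionary.
2. The response body contains one word per line, with \n as the line separator.
When both conditions are met, the dictionary is hot-updated without restarting the ES instance.
You can put the words to be updated automatically in a UTF-8 encoded .txt file and serve it from nginx or any other simple HTTP server. When the .txt file changes, the HTTP server automatically returns the corresponding Last-Modified and ETag when a client requests the file. A separate tool can extract the relevant vocabulary from your business system and update this .txt file.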
have fun.
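For local experiments, the two conditions for a remote dictionary (Last-Modified and ETag headers, one word per line) can be met with a few lines of standard-library Python instead of nginx. This is a sketch under assumptions: the names make_handler and serve, the port 8000, and the use of a SHA-256 digest as the ETag are all choices of this example, not something the plugin prescribes.

```python
import hashlib
import os
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(dict_path):
    """Create a handler class that serves the word list stored at dict_path."""

    class DictHandler(BaseHTTPRequestHandler):
        def _send_headers(self):
            with open(dict_path, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            # Last-Modified follows the file's mtime, ETag follows its content.
            self.send_header("Last-Modified",
                             formatdate(os.path.getmtime(dict_path), usegmt=True))
            self.send_header("ETag", '"%s"' % hashlib.sha256(body).hexdigest())
            self.end_headers()
            return body

        def do_HEAD(self):
            self._send_headers()

        def do_GET(self):
            self.wfile.write(self._send_headers())

        def log_message(self, fmt, *args):
            pass  # keep the console quiet

    return DictHandler


def serve(dict_path, host="0.0.0.0", port=8000):
    """Serve the dictionary until interrupted."""
    HTTPServer((host, port), make_handler(dict_path)).serve_forever()
```

With serve("hotwords.txt") running (hotwords.txt being a hypothetical UTF-8 file), the location entry would point at http://<this-host>:8000/.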
FAQ
1. Why does my custom dictionary not take effect?
Make sure the extension dictionary file is UTF-8 encoded.
2. How do I install the plugin manually?
git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
git checkout tags/{version}
mvn clean
mvn compile
mvn package
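Copy the release file #{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-*.zip to your elasticsearch plugin directory, e.g. plugins/ik, unpack it there, and restart elasticsearch.
3. The analyzer test fails.
Call the analyze API under a specific index rather than calling it directly, e.g. http://localhost:9200/your_index/_analyze?text=中华人民共和国MN&tokenizer=my_ik
4. What is the difference between ik_max_word and ik_smart?
ik_max_word: splits the text at the finest granularity and exhausts every possible combination. For example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”.
ik_smart: splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”.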
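To compare the two analyzers on your own text, the _analyze API can be called once per analyzer and the token strings compared. The sketch below covers only the request body and the response parsing; analyze_body and token_list are hypothetical helper names, and sending the body to <index>/_analyze is left to whichever HTTP client you use.

```python
import json


def analyze_body(text, analyzer):
    """JSON body for POST <index>/_analyze."""
    # ensure_ascii=False keeps Chinese text readable in the serialized body.
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)


def token_list(analyze_response):
    """Extract the token strings from a decoded _analyze response."""
    return [t["token"] for t in analyze_response["tokens"]]
```

For the example sentence, ik_max_word should yield the longer token list and ik_smart the shorter one.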
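To delete every document in an index (parsetext-index here) without dropping the index itself, use _delete_by_query:
curl -XPOST '192.168.93.140:9200/parsetext-index/_delete_by_query?pretty' -d '{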
"query": {
"match_all": {}
}
}'