Elasticsearch Cluster
The cluster is built on the following environment:
CentOS Linux release 7.6.1810 (Core)
elasticsearch-7.6.2-x86_64.rpm
IP | Hostname |
---|---|
192.168.1.1 | es1 |
192.168.1.2 | es2 |
192.168.1.3 | es3 |
Installation
# Run on all three nodes
wget https://mirrors.huaweicloud.com/elasticsearch/7.6.2/elasticsearch-7.6.2-x86_64.rpm
rpm -ivh elasticsearch-7.6.2-x86_64.rpm
Configuration
All data and log directories are kept under /data/elasticsearch
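The RPM package does not create this custom directory, so it has to exist and be writable by the service account before Elasticsearch starts. A minimal sketch, assuming the `elasticsearch` user and group created by the RPM and the paths used in the configuration below; `prepare_es_dirs` is a hypothetical helper name, not part of Elasticsearch:

```shell
# Create the data and log directories referenced by path.data / path.logs
# and hand them to the account that runs Elasticsearch.
prepare_es_dirs() {
  base="$1"                                  # e.g. /data/elasticsearch
  owner="${2:-elasticsearch:elasticsearch}"  # user:group created by the RPM
  mkdir -p "$base/log" || return 1
  chown -R "$owner" "$base"
}

# Usage on each node (as root):
#   prepare_es_dirs /data/elasticsearch
```

The second argument exists only so the helper can be tried out against a scratch directory without root.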
/etc/elasticsearch/jvm.options
# Run on all three nodes
[root@es1 elasticsearch]# egrep -v "^#|^$" /etc/elasticsearch/jvm.options
-Xms16g
-Xmx16g
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
-Djava.io.tmpdir=${ES_TMPDIR}
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/data/elasticsearch
-XX:ErrorFile=/data/elasticsearch/log/hs_err_pid%p.log
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/data/elasticsearch/log/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/data/elasticsearch/log/gc.log:utctime,pid,tags:filecount=32,filesize=64m
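Once a node binds to a non-loopback address, Elasticsearch enforces its bootstrap checks, one of which expects the initial and maximum heap sizes to match. The following sketch checks that property in a jvm.options file; `heap_sizes_match` is a hypothetical helper name:

```shell
# Succeed only when the file sets -Xms and -Xmx to the same value.
heap_sizes_match() {
  file="$1"
  # Take the last occurrence of each flag, in case it appears more than once.
  xms="$(grep -E '^-Xms' "$file" | tail -n 1 | sed 's/^-Xms//')"
  xmx="$(grep -E '^-Xmx' "$file" | tail -n 1 | sed 's/^-Xmx//')"
  [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}

# Usage:
#   heap_sizes_match /etc/elasticsearch/jvm.options || echo "heap sizes differ"
```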
/etc/elasticsearch/elasticsearch.yml
[root@es1 elasticsearch]# egrep -v "^#|^$" /etc/elasticsearch/elasticsearch.yml
cluster.name: smy
# On es2 change this to es2, and on es3 to es3
node.name: es1
path.data: /data/elasticsearch
path.logs: /data/elasticsearch/log
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["192.168.1.1", "192.168.1.2", "192.168.1.3"]
cluster.initial_master_nodes: ["es1", "es2", "es3"]
# Needed by elasticsearch-head
http.cors.enabled: true
http.cors.allow-origin: "*"
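Since node.name is the only line that differs between the three nodes, the same file can be copied everywhere and patched per host. A sketch of that step, assuming GNU sed as shipped with CentOS; `set_node_name` is a hypothetical helper name:

```shell
# Rewrite the node.name line of an elasticsearch.yml in place.
set_node_name() {
  file="$1"   # path to elasticsearch.yml
  name="$2"   # e.g. es2
  sed -i "s/^node\.name:.*/node.name: ${name}/" "$file"
}

# Usage on es2:
#   set_node_name /etc/elasticsearch/elasticsearch.yml es2
```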
Startup
systemctl start elasticsearch.service
systemctl status elasticsearch.service
Verification
# The main thing to check is whether status is green
curl -X GET "127.0.0.1:9200/_cat/health?v"
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1589273020 08:43:40 smy green 3 3 769 769 0 0 0 0 - 100.0%
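For scripted checks, the status column can be pulled out of that output rather than read by eye. A sketch under the assumption that the node answers on 127.0.0.1:9200 as above; `cluster_status` and `wait_for_green` are hypothetical helper names:

```shell
# Print the value of the "status" column from `_cat/health?v` output on stdin.
cluster_status() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
       col     { print $col; exit }'
}

# Poll until the cluster reports green or the attempts run out.
wait_for_green() {
  url="${1:-127.0.0.1:9200}"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    status="$(curl -s "$url/_cat/health?v" | cluster_status)"
    [ "$status" = "green" ] && return 0
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}

# Usage:
#   wait_for_green 127.0.0.1:9200 30 && echo "cluster is green"
```

The column is located by its header name instead of a fixed position, so the parser keeps working if the column order differs.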
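Installing elasticsearch-head
elasticsearch-head is a client front end for monitoring the state of Elasticsearch. It covers data visualization as well as create, read, update, and delete operations.
Starting with ES 5.x it no longer runs as an ES plugin and instead runs standalone: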
- for Elasticsearch 5.x, 6.x, and 7.x: site plugins are not supported. Run as a standalone server
Installing node and npm
# Install node
wget https://mirrors.huaweicloud.com/nodejs/latest-v10.x/node-v10.20.1-linux-x64.tar.gz
tar zxvf node-v10.20.1-linux-x64.tar.gz -C /usr/local/
mv /usr/local/node-v10.20.1-linux-x64 /usr/local/node
cat <<'EOF'> /etc/profile.d/node.sh
export NODE_HOME=/usr/local/node
export PATH=$NODE_HOME/bin:$PATH
EOF
source /etc/profile
# Install npm (the node tarball already bundles npm; this upgrades it)
npm install npm -g
Running elasticsearch-head
git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start &
[root@wlj174 elasticsearch-head]# npm start &
[1] 17292
[root@wlj174 elasticsearch-head]#
> elasticsearch-head@0.0.0 start /data/elasticsearch/elasticsearch-head
> grunt server
[root@wlj174 elasticsearch-head]# (node:17302) ExperimentalWarning: The http2 module is an experimental API.
Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:19100
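Monitoring Elasticsearch with Prometheus
This setup uses https://github.com/vvanholl/elasticsearch-prometheus-exporter
Its official documentation is already quite detailed.
# Install the plugin (run from the Elasticsearch home directory, /usr/share/elasticsearch with the RPM package)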
./bin/elasticsearch-plugin install -b https://github.com/vvanholl/elasticsearch-prometheus-exporter/releases/download/7.6.2.0/prometheus-exporter-7.6.2.0.zip
# Restart ES
systemctl restart elasticsearch.service
Then open http://192.168.1.1:9200/_prometheus/metrics to view the exported metrics.
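The same check can be done from the command line. The sketch below counts the sample lines in a Prometheus text-format response, which is a rough way to confirm that the plugin is serving data; `count_samples` is a hypothetical helper name:

```shell
# Count metric samples in Prometheus text format read from stdin,
# ignoring "# HELP" / "# TYPE" comment lines and blank lines.
count_samples() {
  grep -v '^#' | grep -c .
}

# Usage:
#   curl -s http://192.168.1.1:9200/_prometheus/metrics | count_samples
```

A count of 0 suggests that the plugin is not loaded on that node or that the node was not restarted after installation.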
prometheus.yaml
- job_name: elasticsearch
  scrape_interval: 10s
  metrics_path: "/_prometheus/metrics"
  static_configs:
  - targets:
    - node1:9200
    - node2:9200
    - node3:9200
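The targets have to match the real node addresses. As an illustration, the job can be generated from a host list; this sketch assumes the hostnames es1, es2, and es3 from the table at the top resolve from the Prometheus server, and `render_scrape_job` is a hypothetical helper name:

```shell
# Print a scrape job for the given hosts, all on port 9200.
render_scrape_job() {
  printf '%s\n' \
    '- job_name: elasticsearch' \
    '  scrape_interval: 10s' \
    '  metrics_path: "/_prometheus/metrics"' \
    '  static_configs:' \
    '  - targets:'
  for host in "$@"; do
    printf '    - %s:9200\n' "$host"
  done
}

# Usage:
#   render_scrape_job es1 es2 es3
```

The output belongs under scrape_configs in the Prometheus configuration file, which can then be validated with promtool check config before reloading Prometheus.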
grafana
https://grafana.com/grafana/dashboards/266
Pitfalls encountered
The following error appeared frequently:
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
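Connecting elasticsearch-head to each of the three nodes in turn showed that only wlj174 had this problem. Its error log contained the address 172.18.0.1, while the other two machines had no address in the 172.18 range. Changing network.host: 0.0.0.0 to the machine's own address and restarting ES resolved the problem. My guess is that 0.0.0.0 binds to every network interface, so the node can end up using an address that the other machines cannot reach, and the cluster then fails to connect. Be careful with 0.0.0.0 on servers that have multiple network interfaces.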
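One way to spot this kind of mismatch is to list the IP each node reports and flag anything outside the expected subnet. A sketch, assuming the cluster uses the 192.168.1. prefix from the table at the top; `unexpected_publish_ips` is a hypothetical helper name:

```shell
# Read "ip name" lines on stdin and print those whose IP does not
# start with the expected prefix.
unexpected_publish_ips() {
  prefix="$1"
  awk -v p="$prefix" 'NF > 0 && index($1, p) != 1 { print }'
}

# Usage:
#   curl -s '127.0.0.1:9200/_cat/nodes?h=ip,name' | unexpected_publish_ips 192.168.1.
```

Any line it prints points at a node that joined with an address from another interface, which is the situation described above.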