• Three-node ES cluster reports "no known master node" after a restart


    I. Problem

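    I had been looking into how to monitor ES and wanted to cut a corner: rather than pulling metrics from the API and computing them myself, I wanted an off-the-shelf plugin or monitoring tool where installing an agent would be enough. That led me to x-pack. Once the plugin is installed, the ES cluster has to be restarted. For the production cluster I assumed that, since it is a cluster, restarting the nodes one at a time would be safe. I overestimated it: after I restarted one node, the whole cluster went down...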
     

    II. What I did

    1. Operating system
    [centos@ip-172-0-0-233 bin]$ cat /etc/redhat-release 
    CentOS Linux release 7.6.1810 (Core) 
    2. ES version
    [centos@ip-172-0-0-233 bin]$ ./elasticsearch --version
    Version: 5.0.2, Build: f6b4951/2016-11-24T10:07:18.101Z, JVM: 1.8.0_131

    3. Killing the process

    ps -ef | grep elasticsearch    # find the PID of the ES process
    kill -9 <pid>                  # force-kill it (SIGKILL)

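    I regretted this as soon as I had done it. Not every service should be killed this way, and I do not know whether this step contributed to the cluster going down.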
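For comparison, a gentler way to stop a node is to send SIGTERM (the default signal of `kill`) and wait for the JVM to exit, so that Elasticsearch gets a chance to shut down cleanly. The following is only a minimal sketch: the function name `stop_gracefully` and the 60-second default timeout are assumptions made for illustration, not part of the procedure described here, and it has not been run against this cluster.

```shell
# Hypothetical helper: ask a process to exit with SIGTERM and wait for it,
# instead of killing it outright with SIGKILL (kill -9).
stop_gracefully() {
    local pid="$1"
    local timeout="${2:-60}"    # seconds to wait; an assumed default
    local waited=0

    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "pid $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

It would be called as `stop_gracefully <pid>`. Falling back to `kill -9` only makes sense if it returns a non-zero status.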

    4. Error log

    [2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [lang-painless]
    [2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [percolator]
    [2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [reindex]
    [2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [transport-netty3]
    [2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [transport-netty4]
    [2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] no plugins loaded
    [2019-10-17T08:43:41,612][INFO ][o.e.n.Node               ] [node-1] initialized
    [2019-10-17T08:43:41,613][INFO ][o.e.n.Node               ] [node-1] starting ...
    [2019-10-17T08:43:41,812][INFO ][o.e.t.TransportService   ] [node-1] publish_address {172.0.0.16:9300}, bound_addresses {172.30.36.146:9300}
    [2019-10-17T08:43:41,817][INFO ][o.e.b.BootstrapCheck     ] [node-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
    
    [2019-10-17T08:44:11,833][WARN ][o.e.n.Node               ] [node-1] timed out while waiting for initial discovery state - timeout: 30s
    [2019-10-17T08:44:11,839][INFO ][o.e.h.HttpServer         ] [node-1] publish_address {172.0.0.16:9200}, bound_addresses {172.30.36.146:9200}
    [2019-10-17T08:44:11,839][INFO ][o.e.n.Node               ] [node-1] started
    [2019-10-17T08:44:12,001][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,001][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,003][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,010][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,010][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,228][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,758][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,759][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,760][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,814][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,814][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,815][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,815][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,817][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,817][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,817][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,820][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,820][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,821][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,822][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,822][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,823][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,824][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,826][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,827][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,827][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,828][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,828][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,830][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:12,830][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
    [2019-10-17T08:44:42,012][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
    [2019-10-17T08:44:42,012][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
    [2019-10-17T08:44:42,013][WARN ][r.suppressed             ] path: /_cluster/state/metadata, params: {metric=metadata}
    org.elasticsearch.discovery.MasterNotDiscoveredException
        at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:214) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:350) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:240) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:957) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [elasticsearch-5.0.2.jar:5.0.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
    [2019-10-17T08:44:42,013][WARN ][r.suppressed             ] path: /_cluster/state/metadata, params: {metric=metadata}
    org.elasticsearch.discovery.MasterNotDiscoveredException
        at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:214) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:350) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:240) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:957) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [elasticsearch-5.0.2.jar:5.0.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
    [2019-10-17T08:44:42,760][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
    [2019-10-17T08:44:42,761][WARN ][r.suppressed             ] path: /_cluster/state/metadata, params: {metric=metadata}
    org.elasticsearch.discovery.MasterNotDiscoveredException
        at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:214) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:350) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:240) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:957) [elasticsearch-5.0.2.jar:5.0.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [elasticsearch-5.0.2.jar:5.0.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
     
     
    5. Configuration file
    cluster.name: lile
    node.name: node-1
    bootstrap.memory_lock: true
    network.host: 172.0.0.16
    http.port: 9200
    discovery.zen.ping.unicast.hosts: ["172.0.0.16","172.0.0.17","172.0.0.18"]
    discovery.zen.minimum_master_nodes: 2
    http.cors.enabled: true 
    http.cors.allow-origin: "*"
    path.data: /data/elasticsearch/data
    path.logs: /data/elasticsearch/logs

    III. Solution

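    No amount of restarting helped. Everything I found online said a restart would fix it, but however many times I restarted, it did not. When I set discovery.zen.minimum_master_nodes to 1, the nodes did start, but all three of them became master. Later I came across the parameter below. After adding it and restarting every node, the cluster came back.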
     
     
    discovery.zen.ping_timeout: 60s
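After a change like this it is worth confirming that the three nodes agree on a single master rather than each electing itself. A rough sketch of such a check follows. It assumes `curl` is installed and that every node serves HTTP on port 9200 as in the configuration above. The helper names `agree_on_master` and `check_cluster_master` and the `NONE` placeholder are made up for illustration.

```shell
# Reads one master ID per line on stdin. Succeeds only if at least one line
# was given, no node reported NONE, and every line names the same master.
agree_on_master() {
    local ids
    ids=$(cat)
    [ -n "$ids" ] || return 1                               # nothing reported
    if printf '%s\n' "$ids" | grep -qx 'NONE'; then          # a node sees no master
        return 1
    fi
    [ "$(printf '%s\n' "$ids" | sort -u | wc -l)" -eq 1 ]   # all IDs identical
}

# Asks each host which node it considers master and compares the answers.
check_cluster_master() {
    local host id
    for host in "$@"; do
        # _cat/master?h=id prints only the elected master's node ID.
        # -f makes curl print nothing on an HTTP error such as a 503.
        id=$(curl -sf --max-time 5 "http://${host}:9200/_cat/master?h=id")
        echo "${id:-NONE}"
    done | agree_on_master
}
```

For this cluster the call would be `check_cluster_master 172.0.0.16 172.0.0.17 172.0.0.18`. A non-zero exit status means the nodes disagree or at least one of them sees no master.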

    IV. Root cause analysis

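    I have not dug into this properly yet. My impression is that the nodes had too little time to discover each other (discovery.zen.ping_timeout defaults to 3s) and so never found one another. With discovery.zen.minimum_master_nodes: 2, at least 2 master-eligible nodes have to see each other before a master can be elected.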
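The value 2 follows the usual quorum rule for zen discovery: discovery.zen.minimum_master_nodes should be (number of master-eligible nodes / 2) + 1, rounded down. It is also consistent with what happened when the value was set to 1, since a node that finds no peers can then elect itself. A small sketch of the arithmetic, with the function name `quorum` chosen only for illustration:

```shell
# Quorum of master-eligible nodes: floor(n / 2) + 1.
# Shell integer division already rounds down.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

quorum 3    # prints 2, matching discovery.zen.minimum_master_nodes: 2 above
```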
     
  • Original article: https://www.cnblogs.com/lemon-le/p/11707138.html