2019-12-10 01:13:14,305 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server log41/10.190.7.5:2181. Will not attempt to authenticate using SASL (unknown error)
2019-12-10 01:13:16,002 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 2002ms for sessionid 0x401a22fb929b8a7, closing socket connection and attempting reconnect
2019-12-10 01:13:16,103 ERROR org.apache.hadoop.ha.ActiveStandbyElector: Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
2019-12-10 01:13:17,088 INFO org.apache.zookeeper.ZooKeeper: Session: 0x401a22fb929b8a7 closed
2019-12-10 01:13:17,088 ERROR org.apache.hadoop.ha.ZKFailoverController: Fatal error occurred:Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
2019-12-10 01:13:17,088 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2019-12-10 01:13:17,091 ERROR org.apache.hadoop.ha.ZKFailoverController: The failover controller encounters runtime error: java.lang.RuntimeException: ZK Failover Controller failed: Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
    at org.apache.hadoop.ha.ZKFailoverController.mainLoop(ZKFailoverController.java:381)
    at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:247)
    at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:175)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481)
    at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:171)
    at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:193)
2019-12-10 01:13:17,092 INFO org.apache.hadoop.ipc.Server: Stopping server on 8019
2019-12-10 01:13:17,093 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8019
2019-12-10 01:13:17,093 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2019-12-10 01:13:17,093 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2019-12-10 01:13:17,093 INFO org.apache.hadoop.ha.HealthMonitor: Stopping HealthMonitor thread
2019-12-10 01:13:17,093 FATAL org.apache.hadoop.hdfs.tools.DFSZKFailoverController: DFSZKFailOverController exiting due to earlier exception java.lang.RuntimeException: ZK Failover Controller failed: Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
2019-12-10 01:13:17,096 INFO org.apache.hadoop.hdfs.tools.DFSZKFailoverController: SHUTDOWN_MSG:
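In NameNode HA mode, one node went down and the other node could not be switched to Active as expected. The zkfc log showed that the ZooKeeper connection had timed out, after which the failover controller exited.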
Solution:
Edit ZooKeeper's configuration file, zoo.cfg, and change tickTime to 4000 ms (tickTime=4000); the default is 2000 ms.
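
One plausible reason this helps: by default a ZooKeeper server only grants session timeouts between 2 x tickTime and 20 x tickTime, so raising tickTime also raises the smallest session timeout a client such as zkfc can end up with. The sketch below illustrates that arithmetic. It assumes minSessionTimeout and maxSessionTimeout are not overridden in zoo.cfg. The function names and the requested timeout of 5000 ms are hypothetical and used only for illustration; they are not taken from the log above.

```python
def session_timeout_bounds(tick_time_ms):
    """Return (min, max) session timeout in ms under ZooKeeper's default limits.

    Assumes minSessionTimeout / maxSessionTimeout are left at their defaults
    of 2 * tickTime and 20 * tickTime.
    """
    return 2 * tick_time_ms, 20 * tick_time_ms


def negotiated_timeout(requested_ms, tick_time_ms):
    """Clamp the timeout a client asks for into the range the server allows."""
    lowest, highest = session_timeout_bounds(tick_time_ms)
    return max(lowest, min(requested_ms, highest))


if __name__ == "__main__":
    requested = 5000  # hypothetical client request, for illustration only
    for tick in (2000, 4000):
        lowest, highest = session_timeout_bounds(tick)
        granted = negotiated_timeout(requested, tick)
        print(f"tickTime={tick}: allowed {lowest}..{highest} ms, granted {granted} ms")
```

With tickTime=4000 the smallest session timeout the server will grant becomes 8000 ms, so a brief stall is less likely to expire the session. The ZooKeeper servers need to be restarted for the new tickTime to take effect.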