Symptoms:
Connecting with hbase shell fails with the following errors:
2019-10-09 10:37:18,855 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2019-10-09 10:37:18,856 WARN [main] zookeeper.ZKUtil: hconnection-0x6ef784bf0x0, quorum=xxx:2181,xxx:2181,xxx:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
The HBase log shows the following errors:
2019-10-09 09:26:58,701 WARN [regionserver/xxx/192.168.1.8:16020-longCompactions-1569222224980-SendThread(xxx:2181)] zookeeper.ClientCnxn: Session 0x0 for server xxx/192.168.1.24:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
Troubleshooting:
The symptoms above point to a ZooKeeper problem, so first check whether ZooKeeper is down.
1. Test the connection with telnet <host or ip> 2181
# telnet xxx 2181
Trying 192.168.1.23...
Connected to xxx.
Escape character is '^]'.
Connection closed by foreign host.
The connection does not stay up: TCP connects, but the remote server or remote program closes the connection immediately.
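A telnet session only shows that the port accepts and then drops the connection. As a complementary check, ZooKeeper's four-letter command ruok can be sent to the client port; a healthy server replies imok. The following is a minimal sketch, assuming nc is installed and four-letter commands are enabled (newer ZooKeeper releases require them to be whitelisted via 4lw.commands.whitelist). The function name zk_ruok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: send "ruok" to a ZooKeeper client port and succeed
# only if the server replies "imok".
# Assumes nc is available; -w 2 gives up after 2 seconds.
zk_ruok() {
    host="$1"
    port="${2:-2181}"
    reply=$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null)
    [ "$reply" = "imok" ]
}

# Usage (xxx stands for the ZooKeeper host, as in the logs above):
#   zk_ruok xxx 2181 && echo "imok received" || echo "no imok reply"
```

If the server closes the connection before answering, as in the telnet test above, the reply is empty and the function reports failure.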
2. Log in to the ZooKeeper node and check the number of socket connections
:~$ netstat -anl|grep 2181|grep -i '192.168.1.7'|grep ESTABLISHED|wc -l
1
:~$ netstat -anl|grep 2181|grep -i '192.168.1.8'|grep ESTABLISHED|wc -l
60
The 192.168.1.8 machine above is the server where HBase reports the errors. It holds 60 sessions on this ZooKeeper node, which is far below the system limits.
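Instead of running one grep per client IP, the counts can be grouped in a single pass. The sketch below assumes Linux-style `netstat -ant` output (local address in column 4, foreign address in column 5, state in column 6). The function name count_zk_clients is hypothetical. Unlike a plain `grep 2181`, it matches only connections whose local port is exactly the ZooKeeper client port.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -ant`-style lines on stdin and print
# the number of ESTABLISHED connections to the given local port per client IP,
# highest count first.
count_zk_clients() {
    port="${1:-2181}"
    awk -v port="$port" '
        # Keep only established connections whose local address ends in :port
        $6 == "ESTABLISHED" && $4 ~ (":" port "$") {
            ip = $5
            sub(/:[0-9]+$/, "", ip)   # strip the client-side port
            count[ip]++
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -rn
}

# Usage on a ZooKeeper node:
#   netstat -ant | count_zk_clients 2181
```

A client IP that sits exactly at a round number such as 60 is a hint that a per-IP connection limit, rather than a system limit, is being hit.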
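3. Raise HBase's ZooKeeper connection limit
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>300</value>
</property>
The default is 30. After changing the value and restarting the regionserver, nothing improved, probably because hbase.zookeeper.property.* settings only apply to a ZooKeeper instance managed by HBase itself, while this ensemble reads its own zoo.cfg.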
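4. Edit ZooKeeper's zoo.cfg
#maxClientCnxns=60
The line is commented out, so ZooKeeper's default of 60 connections per client IP applies, which exactly matches the ESTABLISHED connection count seen above. Uncomment the line, change the value to 150, and restart ZooKeeper.
Problem solved.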
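To confirm which limit a node will pick up from its configuration file, the active maxClientCnxns line can be read back from zoo.cfg. This is a minimal sketch: the function name zk_max_client_cnxns and the file path in the usage line are assumptions. It falls back to 60, ZooKeeper's documented default, when the setting is absent or commented out.

```shell
#!/bin/sh
# Hypothetical helper: print the maxClientCnxns value that is active in a
# zoo.cfg file. Commented-out lines are ignored; if no active line exists,
# print ZooKeeper's documented default of 60.
zk_max_client_cnxns() {
    cfg="$1"
    value=$(sed -n 's/^[[:space:]]*maxClientCnxns[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1)
    echo "${value:-60}"
}

# Usage (the path is an assumption; use the zoo.cfg of your installation):
#   zk_max_client_cnxns /path/to/zookeeper/conf/zoo.cfg
```

The file only shows what will be loaded at the next start, so the value takes effect only after ZooKeeper has been restarted, as in step 4.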