Symptom: telnet to the ZooKeeper (zk) server appears to connect, but the server immediately closes the connection:
telnet 10.18.0.31 2181
Trying 10.18.0.31...
Connected to 10.18.0.31.
Escape character is '^]'.
Connection closed by foreign host.
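Telnet to the zk server from other hosts works, so the initial suspicion was that this client had exceeded the zk server's connection limit. ZooKeeper's maxClientCnxns caps the number of concurrent connections from a single client IP, which would explain why only some hosts are refused.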
1. Check the connection limit configured on the zk server
[root@hdfs-10-18-0-31 ~]# grep maxClient /data/zookeeper-3.4.14/conf/zoo.cfg
maxClientCnxns=2000
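The check above prints the raw config line. As a minimal sketch, the hypothetical helper below prints the effective limit instead and covers two cases a plain grep hides: ZooKeeper 3.4.x falls back to a default of 60 when the key is absent, and a value of 0 disables the limit. It assumes simple `key=value` lines; Java properties files also allow `key value` and `key:value`, which this sketch does not handle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the effective per-client-IP connection limit
# (maxClientCnxns) from a ZooKeeper 3.4.x zoo.cfg.
#   - key absent  -> 60 (the 3.4.x default)
#   - value of 0  -> "unlimited"
# Assumes "key=value" lines only.
get_max_client_cnxns() {
  local cfg="$1" val
  # Skip comments; keep the LAST assignment, as a properties loader would.
  val=$(awk -F'=' '
    /^[ \t]*#/ { next }
    $1 ~ /^[ \t]*maxClientCnxns[ \t]*$/ {
      v = $2; gsub(/[ \t\r]/, "", v); last = v
    }
    END { print last }' "$cfg")
  if [ -z "$val" ]; then
    echo 60          # key missing: ZooKeeper 3.4.x default
  elif [ "$val" = "0" ]; then
    echo unlimited   # 0 removes the per-IP limit
  else
    echo "$val"
  fi
}
```

For the config above, `get_max_client_cnxns /data/zookeeper-3.4.14/conf/zoo.cfg` should print 2000.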
2. Count the existing connections to port 2181 on the server, grouped by client IP
[root@hdfs-10-18-0-31 ~]# netstat -tan | grep 2181 | awk '{print $5}' | grep -E '([0-9]+\.){3}[0-9]+' -o | sort | uniq -c
2000 10.18.0.27
2002 10.18.0.29
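The pipeline above counts every line containing `2181`, including sockets in TIME_WAIT or other non-ESTABLISHED states and outgoing connections whose remote port is 2181. That may be why 10.18.0.29 shows 2002, slightly above the limit of 2000. A stricter sketch, with a hypothetical function name, keeps only ESTABLISHED sockets whose local port is 2181 and strips the `::ffff:` prefix of IPv4-mapped addresses. It reads `netstat -tan` output on stdin and assumes netstat's usual six-column layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ESTABLISHED TCP connections per peer IP whose
# LOCAL port is $1. Reads `netstat -tan` output on stdin.
count_peers_on_port() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
      lport = $4; sub(/.*:/, "", lport)   # local "ip:port" -> port
      if (lport != port) next             # skip outgoing connections
      peer = $5
      sub(/:[^:]*$/, "", peer)            # drop the peer port
      sub(/^::ffff:/, "", peer)           # IPv4-mapped IPv6 -> IPv4
      print peer
    }' | sort | uniq -c | sort -rn
}
```

Run it as `netstat -tan | count_peers_on_port 2181`.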
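3. Both client IPs have reached the 2000 limit. 10.18.0.27 is a k8s node, and pod traffic leaving the node is apparently source-NATed to the node IP, so the zk server only sees the node address. The node's conntrack table still records the original pod IP, which shows which pod opened the connections: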
[root@tbds-10-18-0-27 ~]# cat /proc/net/nf_conntrack | grep 2181 | awk '{print $7}'|sort|uniq -c
1996 src=192.168.237.213
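`awk '{print $7}'` relies on `src=` being the seventh column, which holds for TCP entries in /proc/net/nf_conntrack but not for protocols without a state field (such as UDP), and `grep 2181` matches 2181 in any field. The sketch below (function name hypothetical) instead finds fields by their `key=` prefix and keeps only the first `src=` and `dport=` on each line, which describe the original direction of the flow; the second set describes the reply direction, where the source is the zk server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count conntrack entries per ORIGINAL source IP for
# flows whose original destination port is $1.
# Reads /proc/net/nf_conntrack-style lines on stdin.
count_conntrack_sources() {
  awk -v port="$1" '
    {
      src = ""; dport = ""
      for (i = 1; i <= NF; i++) {
        # The first src=/dport= pair is the original direction.
        if (src == "" && $i ~ /^src=/)     src = substr($i, 5)
        if (dport == "" && $i ~ /^dport=/) dport = substr($i, 7)
      }
      if (dport == port && src != "") print src
    }' | sort | uniq -c | sort -rn
}
```

Run it as `count_conntrack_sources 2181 < /proc/net/nf_conntrack`.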
4. Get the pod name from its IP
kubectl -n xxx get pod -o wide | grep 192.168.237.213
xxx-pod-name 1/1 Running 0 15h 192.168.237.213 tbds-10-18-0-27
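The `grep` above requires knowing the namespace (`-n xxx`) in advance. When only the IP is known, a field selector can search all namespaces instead. This is a sketch assuming a Kubernetes version whose API server supports the `status.podIP` field selector for pods; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the pod that owns a given IP in any namespace.
# Assumes the API server supports the status.podIP field selector for pods.
find_pod_by_ip() {
  kubectl get pods --all-namespaces --field-selector "status.podIP=$1" -o wide
}
```

Run it as `find_pod_by_ip 192.168.237.213`.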
At this point we had finally found the pod that opened all these connections.