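I. TCP keepalive
1. As the name suggests, TCP keepalive tries to keep a TCP connection "alive", or to declare a connection "dead" when the peer has stopped responding.
2. In some environments a firewall automatically drops TCP connections that have been inactive for a long time. After a connection has been idle for a while, TCP keepalive sends an empty ACK (a segment carrying no data), so the firewall keeps seeing traffic and does not close the connection.
3. Sometimes the peer server crashes or the network is interrupted. TCP keepalive helps detect and tear down these unresponsive connections.
4. TCP keepalive must be enabled by the application on each socket it uses. The operating system cannot force keepalive on for all sockets.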
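To illustrate point 4, the sketch below checks a freshly created socket before and after the application opts in. It uses only Python's standard socket module; the helper name keepalive_enabled is made up for this example.

```python
import socket


def keepalive_enabled(sock):
    """Return True if SO_KEEPALIVE is set on the given socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    before = keepalive_enabled(sock)  # a new socket starts with keepalive off
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    after = keepalive_enabled(sock)   # on once the application asks for it
    print(before, after)
```

The comparison uses `!= 0` rather than `== 1` because some platforms report a non-zero value other than 1 for an enabled option.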
II. Related kernel parameters
1. tcp_keepalive_intvl
tcp_keepalive_intvl (integer; default: 75; since Linux 2.4) The number of seconds between TCP keep-alive probes.
2. tcp_keepalive_probes
tcp_keepalive_probes (integer; default: 9; since Linux 2.2) The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.
3. tcp_keepalive_time
tcp_keepalive_time (integer; default: 7200; since Linux 2.2) The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are sent only when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds apart) when keep-alive is enabled.
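On Linux these three parameters are exposed as files under /proc/sys/net/ipv4. The following is a minimal sketch for reading them from Python; the function name read_keepalive_params and its base argument are assumptions made for this example, and files that do not exist (for instance on a non-Linux system) are simply skipped.

```python
from pathlib import Path

PARAMS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_params(base="/proc/sys/net/ipv4"):
    """Return the keepalive parameters found under `base` as a dict of ints.

    A parameter whose file is missing is left out of the result.
    """
    values = {}
    for name in PARAMS:
        path = Path(base) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    for name, value in read_keepalive_params().items():
        print(f"net.ipv4.{name} = {value}")
```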
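After a connection has been idle for tcp_keepalive_time seconds, a probe is sent. If the peer answers with an ACK, it is considered still online.
Otherwise the probe is repeated every tcp_keepalive_intvl seconds. Once tcp_keepalive_probes probes have been sent without receiving an ACK, the peer is considered to have crashed.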
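The worst-case time to detect a dead peer follows directly from the three parameters. A small worked example, with a function name (detection_time) chosen only for illustration:

```python
def detection_time(keepalive_time, keepalive_intvl, keepalive_probes):
    """Seconds from the last activity until an unresponsive peer is given up on.

    The first probe goes out after `keepalive_time` seconds of idleness.
    Each unanswered probe is followed by a wait of `keepalive_intvl` seconds,
    and the connection is dropped once `keepalive_probes` probes have gone
    unanswered.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


default_total = detection_time(7200, 75, 9)
print(default_total)                # 7875
print((default_total - 7200) / 60)  # 11.25
```

With the defaults, probing adds 11.25 minutes on top of the 2 hours of idle time, which matches the "approximately an additional 11 minutes" in the tcp_keepalive_time description.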
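III. Checking TCP keepalive status
The timer column of netstat -o shows keepalive for sockets that have it enabled. The first number in parentheses is the time in seconds until the next probe.
# netstat -no | grep ESTABLISHED
tcp 0 0 10.16.140.30:11100 10.16.140.16:37848 ESTABLISHED keepalive (12.19/0/0)
tcp 0 0 10.16.140.30:11100 10.16.140.16:57178 ESTABLISHED keepalive (5.60/0/0)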
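If the output needs to be checked from a script, the keepalive countdown can be pulled out of each line with a regular expression. This is only a sketch that assumes the timer column looks exactly like the sample above; the function name seconds_until_next_probe is made up for the example.

```python
import re

# Matches the timer column, e.g. "keepalive (12.19/0/0)".
KEEPALIVE_TIMER = re.compile(r"keepalive \((\d+(?:\.\d+)?)/(\d+)/(\d+)\)")


def seconds_until_next_probe(line):
    """Return the keepalive countdown in seconds from one netstat line.

    Returns None when the line has no keepalive timer.
    """
    match = KEEPALIVE_TIMER.search(line)
    return float(match.group(1)) if match else None


sample = [
    "tcp 0 0 10.16.140.30:11100 10.16.140.16:37848 ESTABLISHED keepalive (12.19/0/0)",
    "tcp 0 0 10.16.140.30:11100 10.16.140.16:57178 ESTABLISHED keepalive (5.60/0/0)",
]
for line in sample:
    print(seconds_until_next_probe(line))
```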
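IV. Enabling
TCP keepalive has to be enabled by the application. For example, in Python:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Enable keepalive on the socket
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Send the first probe after the connection has been idle for 20 seconds
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 20)
# If a probe goes unanswered, resend it every 1 second
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 1)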
V. Test
Client IP: 10.30.20.90
Server IP: 10.30.20.125
1. On the server, listen on port 9999
# nc -l 9999
2. Client
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import time
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Enable keepalive on the socket
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Send the first probe after the connection has been idle for 20 seconds
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 20)
# If a probe goes unanswered, resend it every 1 second
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 1)
s.connect(('10.30.20.125', 9999))
# Stay idle so that only keepalive probes travel over the connection
time.sleep(200)
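The socket setup in the client can be wrapped in a reusable helper. The sketch below also sets TCP_KEEPCNT, which the client script leaves at the system value. The helper name enable_keepalive and its default arguments are assumptions for this example, and any option name the platform does not define is skipped.

```python
import socket


def enable_keepalive(sock, idle=20, interval=1, count=5):
    """Enable keepalive on `sock` and override the system-wide timings.

    idle     -- seconds of inactivity before the first probe
    interval -- seconds between unanswered probes
    count    -- unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket option names are the Linux ones; skip any that the
    # current platform does not provide.
    overrides = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for option, value in overrides:
        if hasattr(socket, option):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, option), value)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print(enabled)
```

Calling the helper before connect(), as the client script does, means the timings are in place from the moment the connection is established.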
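Operating system parameters on the client. The script overrides the idle time and the probe interval per socket but does not set TCP_KEEPCNT, so the probe count of 5 comes from tcp_keepalive_probes here:
# sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 10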
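One way to see which values actually apply to a socket is to read the options back with getsockopt. The sketch below applies the same two overrides as the client script and leaves TCP_KEEPCNT alone. It assumes Linux behavior, where an option that was never set reports the system-wide value; the function name effective_keepalive is made up for the example.

```python
import socket

OPTIONS = {
    "idle": "TCP_KEEPIDLE",
    "interval": "TCP_KEEPINTVL",
    "count": "TCP_KEEPCNT",
}


def effective_keepalive(sock):
    """Return the keepalive values reported for `sock` as a dict.

    Options that the current platform does not define are left out.
    """
    result = {}
    for key, name in OPTIONS.items():
        if hasattr(socket, name):
            result[key] = sock.getsockopt(socket.IPPROTO_TCP, getattr(socket, name))
    return result


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Same overrides as the client script; TCP_KEEPCNT is deliberately not set.
    for name, value in (("TCP_KEEPIDLE", 20), ("TCP_KEEPINTVL", 1)):
        if hasattr(socket, name):
            s.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
    effective = effective_keepalive(s)
    print(effective)
```

On a host configured as above, the count entry would be expected to show 5, taken from net.ipv4.tcp_keepalive_probes.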
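3. Client behavior
1) While the server is reachable, the client sends a keepalive probe after every 20 seconds of idle time.
2) Simulate a failure
On the server, drop the traffic of the test connection: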
iptables -A INPUT -p tcp --dport 9999 -j DROP
iptables -A OUTPUT -p tcp --sport 9999 -j DROP
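After the first probe goes unanswered, the client resends a probe every 1 second. After 5 consecutive unanswered probes, it sends a segment with the RST flag to the server and resets the connection.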
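The timeline of this test can be worked out from the effective values (idle 20 s, interval 1 s, 5 probes). The sketch below assumes the peer never answers and that the reset is issued one interval after the last probe, which is how Linux is understood to behave. The function name probe_schedule is hypothetical.

```python
def probe_schedule(idle, interval, count):
    """Return (probe_times, reset_time) in seconds after the last activity,
    assuming no probe is ever answered."""
    probe_times = [idle + i * interval for i in range(count)]
    reset_time = idle + count * interval
    return probe_times, reset_time


probes, reset = probe_schedule(idle=20, interval=1, count=5)
print(probes)  # [20, 21, 22, 23, 24]
print(reset)   # 25
```

A packet capture on the client (for example with tcpdump) should show roughly this spacing between the probes and the final RST.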