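Keepalived is a high-performance high-availability (hot-standby) solution for servers. It prevents a single point of failure, and combined with Nginx it makes the web front end highly available.
Keepalived is built on VRRP (Virtual Router Redundancy Protocol), a protocol for router redundancy. VRRP presents two or more routers as a single virtual router that exposes one or more virtual router IPs. Within the group, the router that actually owns the virtual IP is the MASTER as long as it works normally; otherwise the MASTER is chosen by election. The MASTER performs all network functions for the virtual IP, such as answering ARP requests, handling ICMP, and forwarding traffic. The other routers do not own the virtual IP and stay in the BACKUP state; apart from receiving the MASTER's VRRP advertisements, they perform no external network functions. When the MASTER fails, a BACKUP takes over its network functions.
VRRP sends its packets by multicast, using a special virtual source MAC address rather than the MAC address of the physical NIC. While VRRP is running, only the MASTER periodically sends advertisements, which signal that the MASTER is healthy and carry the virtual router IP(s). A BACKUP only receives VRRP packets and sends none. If no advertisement arrives from the MASTER within a certain interval, each BACKUP declares itself MASTER and sends advertisements, and a new MASTER election takes place.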
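The "certain interval" can be made concrete. The following is a minimal sketch, assuming VRRPv2 timing as defined in RFC 3768 (Master_Down_Interval = 3 x Advertisement_Interval + Skew_Time, with Skew_Time = (256 - Priority) / 256 seconds). It uses advert_int 1 and the priorities 100 and 99 that appear in the keepalived configuration later in this article. The function names are hypothetical and not part of keepalived.

```bash
#!/usr/bin/env bash
# Sketch of the VRRPv2 timers from RFC 3768; function names are illustrative only.

# skew_time PRIORITY
# Seconds of extra delay a BACKUP adds; a higher priority gives a shorter delay.
skew_time() {
    awk -v p="$1" 'BEGIN { printf "%.3f\n", (256 - p) / 256 }'
}

# master_down_interval PRIORITY ADVERT_INT
# Seconds without advertisements after which a BACKUP declares itself MASTER.
master_down_interval() {
    awk -v p="$1" -v a="$2" 'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

# Priorities 100 and 99 with a 1-second advertisement interval.
for prio in 100 99; do
    echo "priority $prio: skew $(skew_time "$prio") s," \
         "master down after $(master_down_interval "$prio" 1) s"
done
```

Under the same RFC, a MASTER that shuts down cleanly sends an advertisement with priority 0, and a BACKUP that receives it waits only Skew_Time instead of the full interval. That is consistent with the roughly one-second takeover visible in the logs further down.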
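IP plan (the VIP is 172.16.23.132):
nginx1: 172.16.23.129 keepalived: 172.16.23.129
nginx2: 172.16.23.130 keepalived: 172.16.23.130
httpd1: 172.16.23.128
httpd2: 172.16.23.131
In this plan nginx acts only as a load balancer and does not serve web content itself. The two nginx nodes are listed in the Ansible inventory: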
[root@master ~]# cat /etc/ansible/hosts|grep "^\[nodes" -A 2
[nodes]
172.16.23.129
172.16.23.130
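Check the nginx service status (the 二 in the timestamps is the zh_CN locale's abbreviation for Tuesday):
[root@master ~]# ansible nodes -m shell -a "systemctl status nginx"|grep running
   Active: active (running) since 二 2018-12-18 16:33:04 CST; 12min ago
   Active: active (running) since 二 2018-12-18 16:35:51 CST; 10min ago
nginx is running on both nodes. Next, look at the backend httpd servers: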
[root@master ~]# cat /etc/ansible/hosts|grep "^\[backend_nodes" -A 2
[backend_nodes]
172.16.23.128
172.16.23.131
Check the httpd service status:
[root@master ~]# ansible backend_nodes -m shell -a "systemctl status httpd"|grep running
   Active: active (running) since 二 2018-12-18 16:29:36 CST; 22min ago
   Active: active (running) since 二 2018-12-18 16:30:03 CST; 21min ago
Then test load balancing on each of the two nginx servers:
[root@master ~]# ansible 172.16.23.129 -m get_url -a "url=http://172.16.23.129/index.html dest=/tmp"|grep status_code
    "status_code": 200,
[root@master ~]# ansible 172.16.23.129 -m shell -a "cat /tmp/index.html"
172.16.23.129 | CHANGED | rc=0 >>
172.16.23.128
[root@master ~]# ansible 172.16.23.129 -m get_url -a "url=http://172.16.23.129/index.html dest=/tmp"|grep status_code
    "status_code": 200,
[root@master ~]# ansible 172.16.23.129 -m shell -a "cat /tmp/index.html"
172.16.23.129 | CHANGED | rc=0 >>
172.16.23.131
The test on nginx1 (172.16.23.129) returns the pages of both httpd backends, 172.16.23.128 and 172.16.23.131, so both access and load balancing work.
[root@master ~]# ansible 172.16.23.130 -m get_url -a "url=http://172.16.23.130/index.html dest=/tmp"|grep status_code
    "status_code": 200,
[root@master ~]# ansible 172.16.23.130 -m shell -a "cat /tmp/index.html"
172.16.23.130 | CHANGED | rc=0 >>
172.16.23.128
[root@master ~]# ansible 172.16.23.130 -m get_url -a "url=http://172.16.23.130/index.html dest=/tmp"|grep status_code
    "status_code": 200,
[root@master ~]# ansible 172.16.23.130 -m shell -a "cat /tmp/index.html"
172.16.23.130 | CHANGED | rc=0 >>
172.16.23.131
nginx2 also reaches the backend httpd servers without problems, so load balancing works on both nginx nodes. keepalived is then installed on the two nginx servers; check that it is running:
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
   Active: active (running) since 二 2018-12-18 16:06:38 CST; 52min ago
   Active: active (running) since 二 2018-12-18 16:05:04 CST; 54min ago
Check where the VIP is. It is on node1:
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
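The address check above relies on grep -P (PCRE lookarounds), which not every grep build supports. As a hedged alternative, here is a minimal awk-based sketch that does the same extraction and adds a yes/no test for the VIP. The function names and the sample input are illustrative assumptions, shaped like the inet lines that ip a prints.

```bash
#!/usr/bin/env bash
# extract_ipv4 IFACE
# Reads `ip a`-style text on stdin and prints the IPv4 addresses (without the
# prefix length) from "inet" lines whose last field is IFACE.
extract_ipv4() {
    awk -v ifname="$1" '$1 == "inet" && $NF == ifname { sub(/\/.*/, "", $2); print $2 }'
}

# holds_vip IFACE VIP
# Succeeds (exit status 0) if VIP is one of the addresses on IFACE.
holds_vip() {
    extract_ipv4 "$1" | grep -qxF "$2"
}

# Illustrative sample shaped like the inet lines of `ip a` on node1.
sample='    inet 127.0.0.1/8 scope host lo
    inet 172.16.23.129/24 brd 172.16.23.255 scope global ens33
    inet 172.16.23.132/24 scope global secondary ens33'

printf '%s\n' "$sample" | extract_ipv4 ens33
if printf '%s\n' "$sample" | holds_vip ens33 172.16.23.132; then
    echo "VIP is on this node"
fi
```

On a real node the same functions can read live data, for example ip a | holds_vip ens33 172.16.23.132.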
The VIP sits on nginx1, that is, node1. Now access the VIP to check load balancing:
[root@master ~]# curl http://172.16.23.132
172.16.23.131
[root@master ~]# curl http://172.16.23.132
172.16.23.128
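Two requests are enough to see both backends, but a tally over more requests shows the distribution more clearly. The following is a small sketch; the function name tally_backends is hypothetical, and it assumes each backend's index.html contains its own IP address followed by a newline, as the responses above suggest.

```bash
#!/usr/bin/env bash
# tally_backends
# Reads one response body per line on stdin and prints "<body> <count>",
# sorted by body, so the share of each backend is visible at a glance.
tally_backends() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Offline demo using the kind of responses seen in the transcripts above.
printf '%s\n' 172.16.23.131 172.16.23.128 172.16.23.128 172.16.23.131 | tally_backends

# Against the live setup (not run here):
#   for i in $(seq 10); do curl -s http://172.16.23.132; done | tally_backends
```

With equal weights on both backends, the counts should come out roughly even.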
The responses show no problems. Now take one nginx server out of service and see how keepalived behaves and whether the VIP is still reachable:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl stop nginx"
172.16.23.130 | CHANGED | rc=0 >>
Check the keepalived service status and the VIP:
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
   Active: active (running) since 二 2018-12-18 16:05:04 CST; 1h 4min ago
   Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 3min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=/)'"
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
The VIP has not moved and keepalived is running normally on both nodes. Now access the VIP:
[root@master ~]# curl http://172.16.23.132
172.16.23.128
[root@master ~]# curl http://172.16.23.132
172.16.23.131
The web service is still reachable through the VIP.
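Access keeps working here because nginx was stopped on node2, which did not hold the VIP. The keepalived configuration used in this article does not monitor nginx at all, so if nginx stopped on the node that holds the VIP, the VIP would stay where it is. A common way to close that gap is a vrrp_script block referenced from track_script. The fragment below is a sketch, not part of the original setup: the script name chk_nginx, the path /usr/sbin/pidof, and the timing values are assumptions to adapt.

```
vrrp_script chk_nginx {
    script "/usr/sbin/pidof nginx"   # assumed path; exit status 0 means nginx is running
    interval 2                       # run the check every 2 seconds
    fall 2                           # 2 consecutive failures mark the check as failed
    rise 2                           # 2 consecutive successes mark it as healthy again
}

vrrp_instance VI_1 {
    # ... existing settings unchanged ...
    track_script {
        chk_nginx
    }
}
```

With no weight set on the script, a failed check puts the VRRP instance into the FAULT state, so the node gives up the VIP and the other node can take over.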
Now start nginx again and stop keepalived on one node:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl start nginx"
172.16.23.130 | CHANGED | rc=0 >>
[root@master ~]# ansible nodes -m shell -a "systemctl status nginx"|grep running
   Active: active (running) since 二 2018-12-18 17:15:48 CST; 18s ago
   Active: active (running) since 二 2018-12-18 16:33:04 CST; 43min ago
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl stop keepalived"
172.16.23.130 | CHANGED | rc=0 >>
Then watch the log on that node with tail -f /var/log/messages:
Dec 18 17:16:50 node2 systemd: Stopping LVS and VRRP High Availability Monitor...
Dec 18 17:16:50 node2 Keepalived[12981]: Stopping
Dec 18 17:16:50 node2 Keepalived_healthcheckers[12982]: Stopped
Dec 18 17:16:51 node2 Keepalived_vrrp[12983]: Stopped
Dec 18 17:16:51 node2 Keepalived[12981]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:16:52 node2 systemd: Stopped LVS and VRRP High Availability Monitor.
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
   Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 10min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=/)'"
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
keepalived was stopped on nginx2 (node2), which did not hold the VIP, so the VIP stays on node1 and does not move to node2. Here are the keepalived configuration files on node1 and node2:
[root@master ~]# ansible nodes -m shell -a "cat /etc/keepalived/keepalived.conf"
172.16.23.129 | CHANGED | rc=0 >>
! Configuration File for keepalived
global_defs {
    notification_email {
        346165580@qq.com
    }
    notification_email_from json_hc@163.com
    smtp_server smtp.163.com
    smtp_connect_timeout 30
    router_id test
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 100
    nopreempt    # non-preemptive mode
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.16.23.132/24 dev ens33
    }
}
172.16.23.130 | CHANGED | rc=0 >>
! Configuration File for keepalived
global_defs {
    notification_email {
        346165580@qq.com
    }
    notification_email_from json_hc@163.com
    smtp_server smtp.163.com
    smtp_connect_timeout 30
    router_id test
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.16.23.132/24 dev ens33
    }
}
The two configurations differ only in priority and in that node1 sets nopreempt (non-preemptive mode). Now start keepalived on node2 again, then stop keepalived on node1, and watch the VIP:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl start keepalived"
172.16.23.130 | CHANGED | rc=0 >>
node2 log:
Dec 18 17:23:14 node2 systemd: Starting LVS and VRRP High Availability Monitor...
Dec 18 17:23:14 node2 Keepalived[15994]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:23:14 node2 Keepalived[15994]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:14 node2 Keepalived[15995]: Starting Healthcheck child process, pid=15996
Dec 18 17:23:14 node2 Keepalived_healthcheckers[15996]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:14 node2 Keepalived[15995]: Starting VRRP child process, pid=15997
Dec 18 17:23:14 node2 systemd: Started LVS and VRRP High Availability Monitor.
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering Kernel netlink reflector
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering Kernel netlink command channel
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering gratuitous ARP shared channel
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: Using LinkWatch kernel netlink reflector...
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Entering BACKUP STATE
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
keepalived status on both nodes, and the VIP:
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
   Active: active (running) since 二 2018-12-18 17:23:14 CST; 56s ago
   Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 17min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
Now stop keepalived on node1 and watch the VIP:
[root@master ~]# ansible 172.16.23.129 -m shell -a "systemctl stop keepalived"
172.16.23.129 | CHANGED | rc=0 >>
Logs on each node, node1 first and then node2:
Dec 18 17:27:41 node1 systemd: Stopping LVS and VRRP High Availability Monitor...
Dec 18 17:27:41 node1 Keepalived[24483]: Stopping
Dec 18 17:27:41 node1 Keepalived_vrrp[24485]: VRRP_Instance(VI_1) sent 0 priority
Dec 18 17:27:41 node1 Keepalived_vrrp[24485]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:27:41 node1 Keepalived_healthcheckers[24484]: Stopped
Dec 18 17:27:42 node1 Keepalived_vrrp[24485]: Stopped
Dec 18 17:27:42 node1 Keepalived[24483]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:27:42 node1 systemd: Stopped LVS and VRRP High Availability Monitor.

Dec 18 17:27:42 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Entering MASTER STATE
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) setting protocol VIPs.
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
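Most of these log lines are repeated gratuitous ARP announcements; the state changes are the lines that matter. The following is a small sketch of a filter that keeps only them. The function name vrrp_transitions is hypothetical, and the pattern is based solely on the message format shown in the logs in this article (Keepalived v1.3.5), so other versions may word these lines differently.

```bash
#!/usr/bin/env bash
# vrrp_transitions
# Reads syslog text on stdin and prints only the keepalived VRRP state-change
# lines ("Entering X STATE" / "Transition to X STATE").
vrrp_transitions() {
    grep -E 'Keepalived_vrrp\[[0-9]+\]: VRRP_Instance\([^)]*\) (Entering|Transition to) [A-Z]+ STATE'
}

# Offline demo using a few lines copied from the node2 log above.
log='Dec 18 17:27:42 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Entering MASTER STATE
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) setting protocol VIPs.
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132'

printf '%s\n' "$log" | vrrp_transitions
```

On a node it can be applied to the live log, for example vrrp_transitions < /var/log/messages.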
The logs show the VIP failing over. Now check where the VIP is:
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=/)'"
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.132
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
This confirms that the VIP has moved to node2. Now start keepalived on node1 again and see whether the VIP moves back to node1:
[root@master ~]# ansible 172.16.23.129 -m shell -a "systemctl start keepalived"
172.16.23.129 | CHANGED | rc=0 >>
node1 log:
Dec 18 17:30:18 node1 systemd: Starting LVS and VRRP High Availability Monitor...
Dec 18 17:30:18 node1 Keepalived[28009]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:30:18 node1 Keepalived[28009]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:30:18 node1 Keepalived[28010]: Starting Healthcheck child process, pid=28011
Dec 18 17:30:18 node1 Keepalived_healthcheckers[28011]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:30:18 node1 Keepalived[28010]: Starting VRRP child process, pid=28012
Dec 18 17:30:18 node1 systemd: Started LVS and VRRP High Availability Monitor.
Dec 18 17:30:18 node1 Keepalived_vrrp[28012]: Registering Kernel netlink reflector
Dec 18 17:30:18 node1 Keepalived_vrrp[28012]: Registering Kernel netlink command channel
Dec 18 17:30:18 node1 Keepalived_vrrp[28012]: Registering gratuitous ARP shared channel
Dec 18 17:30:18 node1 Keepalived_vrrp[28012]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:30:28 node1 Keepalived_vrrp[28012]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:30:28 node1 Keepalived_vrrp[28012]: Using LinkWatch kernel netlink reflector...
Dec 18 17:30:28 node1 Keepalived_vrrp[28012]: VRRP_Instance(VI_1) Entering BACKUP STATE
Dec 18 17:30:28 node1 Keepalived_vrrp[28012]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
node2 log:
Dec 18 17:30:01 node2 systemd: Started Session 1328 of user root.
Dec 18 17:30:01 node2 systemd: Starting Session 1328 of user root.
Dec 18 17:30:05 node2 systemd-logind: Removed session 1327.
Dec 18 17:31:02 node2 systemd: Started Session 1329 of user root.
Dec 18 17:31:02 node2 systemd: Starting Session 1329 of user root.
The node2 log contains no keepalived entries, so no failover took place. Now check the VIP:
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.132
This confirms that the VIP did not move back, which is exactly what nopreempt (non-preemptive mode) is for.
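To summarize the steps above:
If you do not want the VIP to move back when keepalived comes online again, set nopreempt (non-preemptive mode). See the configuration above for details: both nodes use state BACKUP, only the priorities differ, and the higher-priority node adds nopreempt.
The VIP is now on node2. If keepalived on node2 goes down in turn, does the VIP move again?
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl stop keepalived"
172.16.23.130 | CHANGED | rc=0 >>
node2 log: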
Dec 18 17:35:59 node2 systemd: Stopping LVS and VRRP High Availability Monitor...
Dec 18 17:35:59 node2 Keepalived[15995]: Stopping
Dec 18 17:35:59 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) sent 0 priority
Dec 18 17:35:59 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:35:59 node2 Keepalived_healthcheckers[15996]: Stopped
Dec 18 17:36:00 node2 Keepalived_vrrp[15997]: Stopped
Dec 18 17:36:00 node2 Keepalived[15995]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:36:00 node2 systemd: Stopped LVS and VRRP High Availability Monitor.
node1 log:
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
The VIP has moved back to node1, which is the desired behavior here:
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
Conclusion from these tests:
When node1 has a higher priority than node2 and node1 sets nopreempt (non-preemptive mode), the VIP does not move back after keepalived on node1 goes down and comes online again. The VIP returns to node1 only when keepalived on node2 goes down.
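The rule can be replayed as a toy model. This is only a sketch of the behavior observed in this article, not of keepalived's actual implementation: the priorities and the nopreempt flag mirror the two configuration files shown earlier, and every function and variable name is hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of the two-node failover observed above (not keepalived code).
prio_node1=100; nopreempt_node1=1   # node1: higher priority, nopreempt set
prio_node2=99;  nopreempt_node2=0   # node2: lower priority, preemption allowed
up_node1=1; up_node2=1              # whether keepalived is running on each node
holder=node1                        # node that currently owns the VIP ("" = none)

other() { if [ "$1" = node1 ]; then echo node2; else echo node1; fi; }
get()   { eval "echo \"\${${1}_${2}}\""; }   # get prio node1 -> value of $prio_node1

stop_node() {
    eval "up_${1}=0"
    if [ "$holder" = "$1" ]; then
        local peer
        peer=$(other "$1")
        # The VIP moves only if the peer is alive to take it.
        if [ "$(get up "$peer")" = 1 ]; then holder=$peer; else holder=""; fi
    fi
}

start_node() {
    eval "up_${1}=1"
    if [ -z "$holder" ]; then
        holder=$1                   # nobody owns the VIP, so the new node takes it
    elif [ "$holder" != "$1" ] \
         && [ "$(get prio "$1")" -gt "$(get prio "$holder")" ] \
         && [ "$(get nopreempt "$1")" = 0 ]; then
        holder=$1                   # preempt: higher priority and nopreempt not set
    fi
}

# Replay the sequence of stops and starts performed in this article.
for ev in "stop node2" "start node2" "stop node1" "start node1" "stop node2"; do
    action=${ev% *}; node=${ev#* }
    "${action}_node" "$node"
    echo "$ev -> VIP on ${holder:-nobody}"
done
```

Setting nopreempt_node1=0 in the model makes the VIP return to node1 as soon as node1 starts again, which is the preemptive behavior that nopreempt is meant to suppress.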
Now test the backend httpd service.
If one of the backend httpd servers goes down:
[root@master ~]# ansible 172.16.23.131 -m shell -a "systemctl stop httpd"
172.16.23.131 | CHANGED | rc=0 >>
[root@master ~]# ansible 172.16.23.131 -m shell -a "systemctl status httpd"
172.16.23.131 | FAILED | rc=3 >>
Access through the VIP:
[root@master ~]# curl http://172.16.23.132
172.16.23.128
[root@master ~]# curl http://172.16.23.132
172.16.23.128
Access works without problems. Now suppose httpd on 172.16.23.131 should be started again for manual testing first, without receiving traffic through the VIP, and be put back behind the VIP only after it passes the tests:
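Change the configuration on both nginx servers. The current upstream block is:
upstream webserver {
    server 172.16.23.128 weight=1;
    server 172.16.23.131 weight=1;
}
Remove the line server 172.16.23.131 weight=1; from it. Because there are two nginx servers, handle them one at a time so that the application is never interrupted (do this before 172.16.23.131 comes back online).
Both nginx servers then balance only to 172.16.23.128, so 172.16.23.131 receives no traffic even after it is back online. To put 172.16.23.131 back into service, add the backend to the nginx servers again, one at a time.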
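As an alternative to deleting the line, nginx's upstream module accepts a down parameter on a server entry, which keeps the backend in the file but excludes it from balancing. The fragment below is a sketch of that variant and is not what the original setup used:

```
upstream webserver {
    server 172.16.23.128 weight=1;
    server 172.16.23.131 weight=1 down;   # kept in the file, receives no traffic
}
```

Either way, it is worth validating the edited file with nginx -t before reloading nginx on each server, still one server at a time. Removing the down parameter and reloading puts 172.16.23.131 back into rotation.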