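If a swarm cluster has multiple manager nodes, say three, what are they for?
Obviously, so that when one manager goes down, a failover takes place. But have you ever actually been through such a failover?
If not, follow the steps below and simulate one.
To start with, the cluster has three manager nodes: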
[root@nccztsjb-node-01 ~]# docker node ls
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
gxfkhuc95br6ltkhorpw1q4tq *   nccztsjb-node-01   Ready     Active         Leader           20.10.17
8zjicf39fk28jn106symk1g5e     nccztsjb-node-02   Ready     Active                          20.10.17
7d59usghrgq05k0yh4lbykw5v     nccztsjb-node-04   Ready     Active         Reachable        20.10.17
wnd24l698iruhhp1xw0y3iyig     nccztsjb-node-05   Ready     Active         Reachable        20.10.17
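The node nccztsjb-node-01 is a manager node and currently holds the Leader role.
The other two manager nodes, nccztsjb-node-04 and nccztsjb-node-05, are both currently in the Reachable state.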
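Why three managers rather than two? Swarm managers coordinate through Raft consensus, so the cluster can only be managed while a majority of the managers is available. The sketch below works through that arithmetic; the function names raft_quorum and fault_tolerance are made up for this illustration and are not part of the docker CLI.

```shell
# Hypothetical helpers for illustration only (not docker commands).

# Number of managers that must stay available to keep a majority.
raft_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of manager failures the cluster can absorb and still keep that majority.
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

# Print the figures for a few common manager counts.
for n in 1 3 5; do
    echo "managers=$n quorum=$(raft_quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

With three managers the majority is two, so the cluster in this walkthrough should be able to lose exactly one manager and still elect a Leader.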
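Next, take the manager node nccztsjb-node-01 down
by stopping the Docker engine on it outright. As the warning in the output says, docker.socket can still reactivate docker.service, so the socket is stopped as well: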
[root@nccztsjb-node-01 ~]# systemctl stop docker
Warning: Stopping docker.service, but it can still be activated by:
  docker.socket
[root@nccztsjb-node-01 ~]# systemctl stop docker.socket
[root@nccztsjb-node-01 ~]# docker node ls
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[root@nccztsjb-node-01 ~]#
Now check the status from the other manager nodes:
nccztsjb-node-04:
[root@nccztsjb-node-04 ~]# docker node ls
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
gxfkhuc95br6ltkhorpw1q4tq     nccztsjb-node-01   Down      Active         Unreachable      20.10.17
8zjicf39fk28jn106symk1g5e     nccztsjb-node-02   Ready     Active                          20.10.17
7d59usghrgq05k0yh4lbykw5v *   nccztsjb-node-04   Ready     Active         Reachable        20.10.17
wnd24l698iruhhp1xw0y3iyig     nccztsjb-node-05   Ready     Active         Leader           20.10.17
[root@nccztsjb-node-04 ~]#
nccztsjb-node-01 is now in the Down state and its manager status is Unreachable. More importantly, a new Leader has already been elected: nccztsjb-node-05.
nccztsjb-node-05:
[root@nccztsjb-node-05 ~]# docker node ls
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
gxfkhuc95br6ltkhorpw1q4tq     nccztsjb-node-01   Down      Active         Unreachable      20.10.17
8zjicf39fk28jn106symk1g5e     nccztsjb-node-02   Ready     Active                          20.10.17
7d59usghrgq05k0yh4lbykw5v     nccztsjb-node-04   Ready     Active         Reachable        20.10.17
wnd24l698iruhhp1xw0y3iyig *   nccztsjb-node-05   Ready     Active         Leader           20.10.17
[root@nccztsjb-node-05 ~]#
This node is now the Leader.
As the output above shows, the manager failover happened without any manual intervention, and a new Leader was elected.
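Scanning the whole table for the word Leader gets tedious when the check has to be repeated. A minimal sketch of picking the Leader out automatically follows; find_leader is a hypothetical helper name, and it assumes its input is one "HOSTNAME MANAGER-STATUS" pair per line, which is the shape that the --format option of docker node ls can produce.

```shell
# Hypothetical helper: reads "HOSTNAME MANAGER_STATUS" lines on stdin
# and prints the hostname whose manager status is Leader.
# Worker nodes have an empty manager status, so they never match.
find_leader() {
    awk '$2 == "Leader" { print $1 }'
}

# Sample input mirroring the listing taken on nccztsjb-node-05.
sample='nccztsjb-node-01 Unreachable
nccztsjb-node-02
nccztsjb-node-04 Reachable
nccztsjb-node-05 Leader'

printf '%s\n' "$sample" | find_leader

# Against a live cluster, run on a manager node:
#   docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | find_leader
```

For the sample data this prints nccztsjb-node-05, matching what the table shows.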
Is that the end of the story? Of course not.
What happens when the failed node comes back?
[root@nccztsjb-node-01 ~]# systemctl start docker
[root@nccztsjb-node-01 ~]#
[root@nccztsjb-node-01 ~]# docker node ls
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
gxfkhuc95br6ltkhorpw1q4tq *   nccztsjb-node-01   Ready     Active         Reachable        20.10.17
8zjicf39fk28jn106symk1g5e     nccztsjb-node-02   Ready     Active                          20.10.17
7d59usghrgq05k0yh4lbykw5v     nccztsjb-node-04   Ready     Active         Reachable        20.10.17
wnd24l698iruhhp1xw0y3iyig     nccztsjb-node-05   Ready     Active         Leader           20.10.17
[root@nccztsjb-node-01 ~]#
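As you can see, after recovery the node is still a manager, but its status is Reachable. It does not take back the Leader role; nccztsjb-node-05 remains the Leader.
OK, that was a simulated failure and failover in a cluster with three manager nodes. Did it all make sense?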