MySQL High Availability with MHA

How MHA Works

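1. MHA checks the master's health with the statement SELECT 1 As Value. Once the master goes down, MHA saves the binary log events (binlog events) from the crashed master
2. Identifies the slave that holds the most recent updates
3. Applies the differential relay logs (relay log) to the other slaves
4. Applies the binary log events (binlog events) saved from the master
5. Promotes one slave to be the new master
6. Points the other slaves at the new master so that they replicate from it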

In effect, the events i(1)--->i(2)--->i(x) are all assembled into one complete binary log

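Note: to minimize the data lost when the master goes down because of hardware failure, it is recommended to configure MySQL semi-synchronous replication alongside MHA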
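
As a rough sketch of that recommendation, the helper below prints the statements that would enable semi-synchronous replication for each role. It assumes MySQL 8.0.21 as used in this walkthrough, where the plugins are still named rpl_semi_sync_master and rpl_semi_sync_slave. The function name semi_sync_sql is invented for this example, and SET GLOBAL does not survive a restart unless the same settings are also added to my.cnf.

```shell
# Print the SQL that enables semi-synchronous replication for one role.
# semi_sync_sql is a hypothetical helper; pipe its output into the mysql client.
semi_sync_sql() {
  case "$1" in
    master)
      cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SQL
      ;;
    slave)
      cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
SQL
      ;;
    *)
      echo "usage: semi_sync_sql master|slave" >&2
      return 1
      ;;
  esac
}
```

For example, run semi_sync_sql master | mysql on the master and semi_sync_sql slave | mysql on each slave. Because any slave may be promoted during a failover, one may prefer to install both plugins on every node.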

Case study: implementing MHA in practice

Note: errors occur when running on CentOS 8, so it is not recommended

Environment: four hosts
172.31.0.17 CentOS7 MHA manager
172.31.0.28 CentOS8 MySQL8.0 Master
172.31.0.38 CentOS8 MySQL8.0 Slave1
172.31.0.48 CentOS8 MySQL8.0 Slave2

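Install the two packages mha4mysql-manager and mha4mysql-node on the management node

Notes:

mha4mysql-manager-0.56-0.el6.noarch.rpm does not support CentOS 8; it only supports CentOS 7 and earlier
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm supports MySQL 5.7 and MySQL 8.0, but is incompatible with the MariaDB 10.3.17 that ships with CentOS 8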
[root@centos8 ~]# ls
anaconda-ks.cfg  mha4mysql-manager-0.58-0.el7.centos.noarch.rpm  mha4mysql-node-0.58-0.el7.centos.noarch.rpm  original-ks.cfg

[root@centos8 ~]# yum install -y mha4mysql-manager-0.58-0.el7.centos.noarch.rpm mha4mysql-node-0.58-0.el7.centos.noarch.rpm

Install the mha4mysql-node package on all MySQL servers.
This package supports CentOS 8, 7, and 6

[root@sz-kx-centos8 ~]# yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm

Set up SSH key authentication between all nodes

On the MHA manager
[root@centos8 ~]# yum install rsync -y
[05:47:52 root@centos8 ~]# ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:GA6eD2oTXm2a30Nq3oo0VjiEUfPy9YS/h9vIjfnkdFo root@centos8.longxuan.vip
The key's randomart image is:
+---[RSA 3072]----+
|  ..o            |
|   o o   .       |
|  . + o o .      |
|   o O + +       |
|  . B B S o      |
| . + O  .  o     |
|  = * .o  o + E  |
| . + +oo.. % +   |
|    .o+.o.*.*    |
+----[SHA256]-----+
[05:48:02 root@centos8 ~]# ssh-copy-id 127.0.0.1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '127.0.0.1 (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:UxQsAjgLsmA4tpc7HO0xU9txsXgxqhyba9KbywIvZTA.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@127.0.0.1's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '127.0.0.1'"
and check to make sure that only the key(s) you wanted were added.

[05:51:01 root@centos8 ~]# rsync -av .ssh 172.31.0.28:/root/
[05:51:01 root@centos8 ~]# rsync -av .ssh 172.31.0.38:/root/
[05:51:01 root@centos8 ~]# rsync -av .ssh 172.31.0.48:/root/

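Create the configuration file on the management node

Note: do not leave trailing spaces or other stray characters at the end of any line in this file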

[root@centos8 ~]# mkdir /etc/mastermha/

[root@centos8 ~]# vim /etc/mastermha/app1.cnf
[server default]
user=mhauser
password=centos
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=123456
ping_interval=1
master_ip_failover_script=/usr/local/bin/master_ip_failover
report_script=/usr/local/bin/sendmail.sh
check_repl_delay=0
master_binlog_dir=/data/mysql/
[server1]
hostname=172.31.0.28
candidate_master=1
[server2]
hostname=172.31.0.38
candidate_master=1
[server3]
hostname=172.31.0.48
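
The configuration points master_ip_failover_script at /usr/local/bin/master_ip_failover, whose contents are not shown in this section. The following is only a minimal shell sketch of what such a script might do, based on the options and ifconfig commands visible in the MHA log output further down (--command, --ssh_user, --orig_master_host, --new_master_host). The variable names, the vip_cmd helper, and the SSH override are assumptions made for illustration; this is not the script that produced those logs.

```shell
#!/bin/sh
# Sketch of a master_ip_failover-style script; all names below are illustrative.
VIP_ADDR='172.31.0.100'    # virtual IP used in this walkthrough
VIP_PREFIX='16'
VIP_DEV='eth0:1'           # interface alias that carries the VIP
GATEWAY='172.31.0.254'     # assumed gateway whose ARP cache is refreshed
SSH="${SSH:-ssh}"          # set SSH=echo for a dry run

# Print the remote command that brings the VIP up or down.
vip_cmd() {
  case "$1" in
    up)   echo "/sbin/ifconfig $VIP_DEV $VIP_ADDR/$VIP_PREFIX; /sbin/arping -I ${VIP_DEV%%:*} -c 3 -s $VIP_ADDR $GATEWAY >/dev/null 2>&1" ;;
    down) echo "/sbin/ifconfig $VIP_DEV down" ;;
  esac
}

master_ip_failover() {
  local command='' ssh_user='root' orig='' new='' arg
  for arg in "$@"; do
    case "$arg" in
      --command=*)          command="${arg#*=}" ;;
      --ssh_user=*)         ssh_user="${arg#*=}" ;;
      --orig_master_host=*) orig="${arg#*=}" ;;
      --new_master_host=*)  new="${arg#*=}" ;;
      *) ;;  # any other option passed by MHA is ignored in this sketch
    esac
  done
  case "$command" in
    stopssh) $SSH "$ssh_user@$orig" "$(vip_cmd down)" ;;  # old master still reachable over SSH
    stop)    : ;;                                         # old master unreachable; nothing to do remotely
    start)   $SSH "$ssh_user@$new" "$(vip_cmd up)" ;;     # move the VIP to the new master
    status)  echo "Checking the Status of the script.. OK" ;;
    *)       echo "unknown --command: $command" >&2; return 1 ;;
  esac
}

# Run only when arguments are given, so the file can also be sourced for a dry run.
if [ "$#" -gt 0 ]; then
  master_ip_failover "$@"
fi
```

A dry run such as SSH=echo sh master_ip_failover --command=start --new_master_host=172.31.0.48 prints the remote command instead of executing it, which is a cheap way to check the logic before MHA ever calls the script.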

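Note: which node takes over as the new master when the master goes down

1. If all slaves have identical logs, the new master is chosen by default in the order the servers appear in the configuration file
2. If the slaves' logs differ, the slave closest to the master is automatically chosen as the new master
3. If a node has been given priority (candidate_master=1), that node is preferred. However, if its logs lag the master by more than 100 MB, it will not be chosen. Setting check_repl_delay=0 disables the lag check and forces the candidate node to be selected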
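
The three rules above can be approximated with a small model. This is a simplified illustration, not MHA's actual selection code: the function name pick_new_master and its input format (one line per surviving slave, in configuration-file order, giving host, candidate_master flag, and bytes of lag) are invented for this example.

```shell
# pick_new_master CHECK_REPL_DELAY
# Reads one line per surviving slave on stdin, in configuration-file order:
#   <host> <candidate_master: 0|1> <bytes of log behind the master>
# Prints the host this simplified model would promote.
pick_new_master() {
  awk -v check="${1:-1}" -v limit=$((100 * 1024 * 1024)) '
    {
      lag = $3 + 0
      # Rule 3: a candidate_master wins unless it lags more than 100 MB
      # and the lag check is still enabled (check_repl_delay != 0).
      if ($2 == 1 && cand == "" && (check + 0 == 0 || lag <= limit)) cand = $1
      # Rules 1 and 2: otherwise the least-lagging slave, first listed on ties.
      if (best == "" || lag < best_lag) { best = $1; best_lag = lag }
    }
    END { if (cand != "") print cand; else if (best != "") print best }
  '
}
```

For example, printf '172.31.0.38 0 0\n172.31.0.48 1 1000\n' | pick_new_master 1 selects the candidate 172.31.0.48 even though it is slightly behind, while a candidate lagging by 200 MB is only chosen when the first argument is 0.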

Set up the Master

[root@sz-kx-centos8 ~]# yum install mysql-server -y

[root@sz-kx-centos8 ~]# mkdir -p /data/mysql/
[root@sz-kx-centos8 ~]# chown mysql.mysql /data/mysql/

[root@sz-kx-centos8 ~]# vim /etc/my.cnf
[mysqld]
server-id=28
log-bin=/data/mysql/mysql-bin
skip-name-resolve=1
general-log

[root@sz-kx-centos8 ~]# systemctl restart mysqld
# Check the current binary log file and position
mysql> show master logs;

# Create the replication user and grant it privileges
mysql> create user repluser@'172.31.0.%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)

mysql> grant replication slave on *.* to repluser@'172.31.0.%';
Query OK, 0 rows affected (0.01 sec)

# Create the MHA user and grant it privileges
mysql> create user mhauser@'172.31.0.%' identified by 'centos';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all on *.* to mhauser@'172.31.0.%';
Query OK, 0 rows affected (0.01 sec)

# Add the VIP address as an interface alias (label)
[root@sz-kx-centos8 ~]# ifconfig eth0:1 172.31.0.100/16
[root@sz-kx-centos8 ~]# ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.31.0.18  netmask 255.255.0.0  broadcast 172.31.255.255
        inet6 fe80::20c:29ff:fe43:49b  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:43:04:9b  txqueuelen 1000  (Ethernet)
        RX packets 42588  bytes 55076155 (52.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20092  bytes 1443183 (1.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.31.0.100  netmask 255.255.0.0  broadcast 172.31.255.255
        ether 00:0c:29:43:04:9b  txqueuelen 1000  (Ethernet)

Set up the two slaves

[root@centos8 ~]# yum install mysql-server -y
[root@centos8 ~]# mkdir /data/mysql -p
[root@centos8 ~]# chown mysql.mysql /data/mysql/

[root@centos8 ~]# vim /etc/my.cnf
[mysqld]
# server-id must be unique on every host (for example 38 on the other slave)
server-id=48
log-bin=/data/mysql/mysql-bin
read-only
relay_log_purge=0
skip_name_resolve=1
general_log

[root@centos8 ~]# systemctl start mysqld

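# Point the slave at the master's binary log. Note: if the slave is re-added later, the earlier coordinates cannot be reused; use the master's current file and position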
CHANGE MASTER TO
  MASTER_HOST='172.31.0.28',
  MASTER_USER='repluser',
  MASTER_PASSWORD='123456',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='mysql-bin.000002',
  MASTER_LOG_POS=156;

mysql> start slave;
Query OK, 0 rows affected (0.05 sec)
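
Before moving on, it helps to confirm that both replication threads are running on each slave. The helper below is a small sketch that reads the output of SHOW SLAVE STATUS\G on stdin and succeeds only when Slave_IO_Running and Slave_SQL_Running are both Yes. The function name replication_ok is made up for this example.

```shell
# Succeed (exit 0) only if both replication threads report "Yes".
# Input: the text produced by  mysql -e 'SHOW SLAVE STATUS\G'
replication_ok() {
  awk '
    $1 == "Slave_IO_Running:"  { io  = $2 }
    $1 == "Slave_SQL_Running:" { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}
```

A possible use on a slave is mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication healthy".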

Check the MHA environment

# SSH trust check
[root@centos8 ~]# masterha_check_ssh --conf=/etc/mastermha/app1.cnf 
Sat May 22 06:32:13 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 06:32:13 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 06:32:13 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 06:32:13 2021 - [info] Starting SSH connection tests..
Sat May 22 06:32:14 2021 - [debug] 
Sat May 22 06:32:13 2021 - [debug]  Connecting via SSH from root@172.31.0.18(172.31.0.18:22) to root@172.31.0.38(172.31.0.38:22)..
Sat May 22 06:32:13 2021 - [debug]   ok.
Sat May 22 06:32:13 2021 - [debug]  Connecting via SSH from root@172.31.0.18(172.31.0.18:22) to root@172.31.0.48(172.31.0.48:22)..
Warning: Permanently added '172.31.0.48' (ECDSA) to the list of known hosts.
Sat May 22 06:32:14 2021 - [debug]   ok.
Sat May 22 06:32:14 2021 - [debug] 
Sat May 22 06:32:13 2021 - [debug]  Connecting via SSH from root@172.31.0.38(172.31.0.38:22) to root@172.31.0.18(172.31.0.18:22)..
Sat May 22 06:32:14 2021 - [debug]   ok.
Sat May 22 06:32:14 2021 - [debug]  Connecting via SSH from root@172.31.0.38(172.31.0.38:22) to root@172.31.0.48(172.31.0.48:22)..
Sat May 22 06:32:14 2021 - [debug]   ok.
Sat May 22 06:32:15 2021 - [debug] 
Sat May 22 06:32:14 2021 - [debug]  Connecting via SSH from root@172.31.0.48(172.31.0.48:22) to root@172.31.0.18(172.31.0.18:22)..
Sat May 22 06:32:14 2021 - [debug]   ok.
Sat May 22 06:32:14 2021 - [debug]  Connecting via SSH from root@172.31.0.48(172.31.0.48:22) to root@172.31.0.38(172.31.0.38:22)..
Sat May 22 06:32:15 2021 - [debug]   ok.
Sat May 22 06:32:15 2021 - [info] All SSH connection tests passed successfully.
Use of uninitialized value in exit at /usr/bin/masterha_check_ssh line 44.

# Replication check
[root@centos8 ~]# masterha_check_repl --conf=/etc/mastermha/app1.cnf 
Sat May 22 18:00:02 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 18:00:02 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:00:02 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:00:02 2021 - [info] Starting SSH connection tests..
Sat May 22 18:00:03 2021 - [debug] 
Sat May 22 18:00:02 2021 - [debug]  Connecting via SSH from root@172.31.0.28(172.31.0.28:22) to root@172.31.0.48(172.31.0.48:22)..
Sat May 22 18:00:02 2021 - [debug]   ok.
Sat May 22 18:00:02 2021 - [debug]  Connecting via SSH from root@172.31.0.28(172.31.0.28:22) to root@172.31.0.38(172.31.0.38:22)..
Sat May 22 18:00:02 2021 - [debug]   ok.
Sat May 22 18:00:03 2021 - [debug] 
Sat May 22 18:00:02 2021 - [debug]  Connecting via SSH from root@172.31.0.48(172.31.0.48:22) to root@172.31.0.28(172.31.0.28:22)..
Sat May 22 18:00:02 2021 - [debug]   ok.
Sat May 22 18:00:02 2021 - [debug]  Connecting via SSH from root@172.31.0.48(172.31.0.48:22) to root@172.31.0.38(172.31.0.38:22)..
Sat May 22 18:00:03 2021 - [debug]   ok.
Sat May 22 18:00:04 2021 - [debug] 
Sat May 22 18:00:03 2021 - [debug]  Connecting via SSH from root@172.31.0.38(172.31.0.38:22) to root@172.31.0.28(172.31.0.28:22)..
Sat May 22 18:00:03 2021 - [debug]   ok.
Sat May 22 18:00:03 2021 - [debug]  Connecting via SSH from root@172.31.0.38(172.31.0.38:22) to root@172.31.0.48(172.31.0.48:22)..
Sat May 22 18:00:03 2021 - [debug]   ok.
Sat May 22 18:00:04 2021 - [info] All SSH connection tests passed successfully.
[root@localhost ~]# masterha_check_repl --conf=/etc/mastermha/app1.cnf 
Sat May 22 18:00:08 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 18:00:08 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:00:08 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:00:08 2021 - [info] MHA::MasterMonitor version 0.58.
Sat May 22 18:00:09 2021 - [info] GTID failover mode = 0
Sat May 22 18:00:09 2021 - [info] Dead Servers:
Sat May 22 18:00:09 2021 - [info] Alive Servers:
Sat May 22 18:00:09 2021 - [info]   172.31.0.28(172.31.0.28:3306)
Sat May 22 18:00:09 2021 - [info]   172.31.0.48(172.31.0.48:3306)
Sat May 22 18:00:09 2021 - [info]   172.31.0.38(172.31.0.38:3306)
Sat May 22 18:00:09 2021 - [info] Alive Slaves:
Sat May 22 18:00:09 2021 - [info]   172.31.0.48(172.31.0.48:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:00:09 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:00:09 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:00:09 2021 - [info]   172.31.0.38(172.31.0.38:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:00:09 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:00:09 2021 - [info] Current Alive Master: 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:00:09 2021 - [info] Checking slave configurations..
Sat May 22 18:00:09 2021 - [info] Checking replication filtering settings..
Sat May 22 18:00:09 2021 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat May 22 18:00:09 2021 - [info]  Replication filtering check ok.
Sat May 22 18:00:09 2021 - [info] GTID (with auto-pos) is not supported
Sat May 22 18:00:09 2021 - [info] Starting SSH connection tests..
Sat May 22 18:00:12 2021 - [info] All SSH connection tests passed successfully.
Sat May 22 18:00:12 2021 - [info] Checking MHA Node version..
Sat May 22 18:00:12 2021 - [info]  Version check ok.
Sat May 22 18:00:12 2021 - [info] Checking SSH publickey authentication settings on the current master..
Sat May 22 18:00:13 2021 - [info] HealthCheck: SSH to 172.31.0.28 is reachable.
Sat May 22 18:00:13 2021 - [info] Master MHA Node version is 0.58.
Sat May 22 18:00:13 2021 - [info] Checking recovery script configurations on 172.31.0.28(172.31.0.28:3306)..
Sat May 22 18:00:13 2021 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/ --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000002 
Sat May 22 18:00:13 2021 - [info]   Connecting to root@172.31.0.28(172.31.0.28:22).. 
  Creating /data/mastermha/app1 if not exists.. Creating directory /data/mastermha/app1.. done.
   ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql/, up to mysql-bin.000002
Sat May 22 18:00:13 2021 - [info] Binlog setting check done.
Sat May 22 18:00:13 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat May 22 18:00:13 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=172.31.0.48 --slave_ip=172.31.0.48 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=8.0.21 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=centos8-relay-bin.000002  --slave_pass=xxx
Sat May 22 18:00:13 2021 - [info]   Connecting to root@172.31.0.48(172.31.0.48:22).. 
Creating directory /data/mastermha/app1/.. done.
  Checking slave recovery environment settings..
    Relay log found at /var/lib/mysql, up to centos8-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/centos8-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat May 22 18:00:13 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=172.31.0.38 --slave_ip=172.31.0.38 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=8.0.21 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=centos8-relay-bin.000002  --slave_pass=xxx
Sat May 22 18:00:13 2021 - [info]   Connecting to root@172.31.0.38(172.31.0.38:22).. 
Creating directory /data/mastermha/app1/.. done.
  Checking slave recovery environment settings..
    Relay log found at /var/lib/mysql, up to centos8-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/centos8-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat May 22 18:00:14 2021 - [info] Slaves settings check done.
Sat May 22 18:00:14 2021 - [info] 
172.31.0.28(172.31.0.28:3306) (current master)
 +--172.31.0.48(172.31.0.48:3306)
 +--172.31.0.38(172.31.0.38:3306)

Sat May 22 18:00:14 2021 - [info] Checking replication health on 172.31.0.48..
Sat May 22 18:00:14 2021 - [info]  ok.
Sat May 22 18:00:14 2021 - [info] Checking replication health on 172.31.0.38..
Sat May 22 18:00:14 2021 - [info]  ok.
Sat May 22 18:00:14 2021 - [info] Checking master_ip_failover_script status:
Sat May 22 18:00:14 2021 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.31.0.28 --orig_master_ip=172.31.0.28 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.31.0.100/16;/sbin/arping -I eth0 -c 3 -s 172.31.0.100/16 172.31.0.254 >/dev/null 2>&1===

Checking the Status of the script.. OK 
Sat May 22 18:00:14 2021 - [info]  OK.
Sat May 22 18:00:14 2021 - [warning] shutdown_script is not defined.
Sat May 22 18:00:14 2021 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

# Check the status
[root@centos8 ~]# masterha_check_status --conf=/etc/mastermha/app1.cnf 
app1 is stopped(2:NOT_RUNNING).

# Start the manager
[root@centos8 ~]# masterha_manager --conf=/etc/mastermha/app1.cnf &> /dev/null
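
The command above runs in the foreground and occupies the terminal. One possible way to start the manager detached is sketched below. The wrapper name mha_start is hypothetical, while --remove_dead_master_conf and --ignore_last_failover are masterha_manager options; whether they suit a given deployment is a judgment call.

```shell
# Start the MHA manager detached from the terminal.
# mha_start is a hypothetical wrapper around masterha_manager.
mha_start() {
  local conf="${1:-/etc/mastermha/app1.cnf}"
  # --remove_dead_master_conf drops the failed master from the config after failover;
  # --ignore_last_failover allows a new failover soon after a previous one.
  nohup masterha_manager --conf="$conf" \
    --remove_dead_master_conf \
    --ignore_last_failover \
    < /dev/null > /dev/null 2>&1 &
}
```

Without --ignore_last_failover, MHA refuses to fail over again while a recent app1.failover.complete file remains in manager_workdir (by default for 8 hours), so that file has to be removed before the manager can act on the next failure.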

# The health-check queries are visible in the master's general log
[root@sz-kx-centos8 ~]# tail -f /var/lib/mysql/centos8.log
2021-05-22T18:05:00.408005Z	   24 Query	SELECT 1 As Value
2021-05-22T18:05:01.408492Z	   24 Query	SELECT 1 As Value
2021-05-22T18:05:02.409002Z	   24 Query	SELECT 1 As Value
2021-05-22T18:05:03.409469Z	   24 Query	SELECT 1 As Value
2021-05-22T18:05:04.410620Z	   24 Query	SELECT 1 As Value
2021-05-22T18:05:05.411095Z	   24 Query	SELECT 1 As Value

# Check the status
[root@localhost ~]# masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 (pid:27237) is running(0:PING_OK), master:172.31.0.28

Simulate a failure

# When the master goes down, the MHA manager performs the failover and then exits automatically

# Follow the log
[root@localhost ~]# tail /data/mastermha/app1/manager.log -f
Sat May 22 18:08:32 2021 - [warning] Got error on MySQL select ping: 1053 (Server shutdown in progress)
Sat May 22 18:08:32 2021 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/ --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
Sat May 22 18:08:32 2021 - [info] HealthCheck: SSH to 172.31.0.28 is reachable.
Sat May 22 18:08:33 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.0.28' (111))
Sat May 22 18:08:33 2021 - [warning] Connection failed 2 time(s)..
Sat May 22 18:08:34 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.0.28' (111))
Sat May 22 18:08:34 2021 - [warning] Connection failed 3 time(s)..
Sat May 22 18:08:35 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.0.28' (111))
Sat May 22 18:08:35 2021 - [warning] Connection failed 4 time(s)..
Sat May 22 18:08:35 2021 - [warning] Master is not reachable from health checker!
Sat May 22 18:08:35 2021 - [warning] Master 172.31.0.28(172.31.0.28:3306) is not reachable!
Sat May 22 18:08:35 2021 - [warning] SSH is reachable.
Sat May 22 18:08:35 2021 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mastermha/app1.cnf again, and trying to connect to all servers to check server status..
Sat May 22 18:08:35 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 18:08:35 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:08:35 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 18:08:36 2021 - [info] GTID failover mode = 0
Sat May 22 18:08:36 2021 - [info] Dead Servers:
Sat May 22 18:08:36 2021 - [info]   172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:36 2021 - [info] Alive Servers:
Sat May 22 18:08:36 2021 - [info]   172.31.0.48(172.31.0.48:3306)
Sat May 22 18:08:36 2021 - [info]   172.31.0.38(172.31.0.38:3306)
Sat May 22 18:08:36 2021 - [info] Alive Slaves:
Sat May 22 18:08:36 2021 - [info]   172.31.0.48(172.31.0.48:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:36 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:36 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:36 2021 - [info]   172.31.0.38(172.31.0.38:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:36 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:36 2021 - [info] Checking slave configurations..
Sat May 22 18:08:36 2021 - [info] Checking replication filtering settings..
Sat May 22 18:08:36 2021 - [info]  Replication filtering check ok.
Sat May 22 18:08:36 2021 - [info] Master is down!
Sat May 22 18:08:36 2021 - [info] Terminating monitoring script.
Sat May 22 18:08:36 2021 - [info] Got exit code 20 (Master dead).
Sat May 22 18:08:36 2021 - [info] MHA::MasterFailover version 0.58.
Sat May 22 18:08:36 2021 - [info] Starting master failover.
Sat May 22 18:08:36 2021 - [info] 
Sat May 22 18:08:36 2021 - [info] * Phase 1: Configuration Check Phase..
Sat May 22 18:08:36 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] GTID failover mode = 0
Sat May 22 18:08:37 2021 - [info] Dead Servers:
Sat May 22 18:08:37 2021 - [info]   172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] Checking master reachability via MySQL(double check)...
Sat May 22 18:08:37 2021 - [info]  ok.
Sat May 22 18:08:37 2021 - [info] Alive Servers:
Sat May 22 18:08:37 2021 - [info]   172.31.0.48(172.31.0.48:3306)
Sat May 22 18:08:37 2021 - [info]   172.31.0.38(172.31.0.38:3306)
Sat May 22 18:08:37 2021 - [info] Alive Slaves:
Sat May 22 18:08:37 2021 - [info]   172.31.0.48(172.31.0.48:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:37 2021 - [info]   172.31.0.38(172.31.0.38:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] Starting Non-GTID based failover.
Sat May 22 18:08:37 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] ** Phase 1: Configuration Check Phase completed.
Sat May 22 18:08:37 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat May 22 18:08:37 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] Forcing shutdown so that applications never connect to the current master..
Sat May 22 18:08:37 2021 - [info] Executing master IP deactivation script:
Sat May 22 18:08:37 2021 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=172.31.0.28 --orig_master_ip=172.31.0.28 --orig_master_port=3306 --command=stopssh --ssh_user=root  

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.31.0.100/16;/sbin/arping -I eth0 -c 3 -s 172.31.0.100/16 172.31.0.254 >/dev/null 2>&1===

Disabling the VIP on old master: 172.31.0.28 
Sat May 22 18:08:37 2021 - [info]  done.
Sat May 22 18:08:37 2021 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat May 22 18:08:37 2021 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat May 22 18:08:37 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] * Phase 3: Master Recovery Phase..
Sat May 22 18:08:37 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat May 22 18:08:37 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:1391
Sat May 22 18:08:37 2021 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat May 22 18:08:37 2021 - [info]   172.31.0.48(172.31.0.48:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:37 2021 - [info]   172.31.0.38(172.31.0.38:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:1391
Sat May 22 18:08:37 2021 - [info] Oldest slaves:
Sat May 22 18:08:37 2021 - [info]   172.31.0.48(172.31.0.48:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:37 2021 - [info]   172.31.0.38(172.31.0.38:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:37 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:37 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sat May 22 18:08:37 2021 - [info] 
Sat May 22 18:08:37 2021 - [info] Fetching dead master's binary logs..
Sat May 22 18:08:37 2021 - [info] Executing command on the dead master 172.31.0.28(172.31.0.28:3306): save_binary_logs --command=save --start_file=mysql-bin.000002  --start_pos=1391 --binlog_dir=/data/mysql/ --output_file=/data/mastermha/app1//saved_master_binlog_from_172.31.0.28_3306_20210522180836.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
  Creating /data/mastermha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000002 pos 1391 to mysql-bin.000002 EOF into /data/mastermha/app1//saved_master_binlog_from_172.31.0.28_3306_20210522180836.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 156.. ok.
  No need to dump effective binlog data from /data/mysql//mysql-bin.000002 (pos starts 1391, filesize 1391). Skipping.
 Binlog Checksum enabled
 /data/mastermha/app1//saved_master_binlog_from_172.31.0.28_3306_20210522180836.binlog has no effective data events.
Event not exists.
Sat May 22 18:08:38 2021 - [info] Additional events were not found from the orig master. No need to save.
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info] * Phase 3.3: Determining New Master Phase..
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sat May 22 18:08:38 2021 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sat May 22 18:08:38 2021 - [info] Searching new master from slaves..
Sat May 22 18:08:38 2021 - [info]  Candidate masters from the configuration file:
Sat May 22 18:08:38 2021 - [info]   172.31.0.48(172.31.0.48:3306)  Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 18:08:38 2021 - [info]     Replicating from 172.31.0.28(172.31.0.28:3306)
Sat May 22 18:08:38 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat May 22 18:08:38 2021 - [info]  Non-candidate masters:
Sat May 22 18:08:38 2021 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat May 22 18:08:38 2021 - [info] New master is 172.31.0.48(172.31.0.48:3306)
Sat May 22 18:08:38 2021 - [info] Starting master failover..
Sat May 22 18:08:38 2021 - [info] 
From:
172.31.0.28(172.31.0.28:3306) (current master)
 +--172.31.0.48(172.31.0.48:3306)
 +--172.31.0.38(172.31.0.38:3306)

To:
172.31.0.48(172.31.0.48:3306) (new master)
 +--172.31.0.38(172.31.0.38:3306)
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info] * Phase 3.5: Master Log Apply Phase..
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sat May 22 18:08:38 2021 - [info] Starting recovery on 172.31.0.48(172.31.0.48:3306)..
Sat May 22 18:08:38 2021 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Sat May 22 18:08:38 2021 - [info]   done.
Sat May 22 18:08:38 2021 - [info]  All relay logs were successfully applied.
Sat May 22 18:08:38 2021 - [info] Getting new master's binlog name and position..
Sat May 22 18:08:38 2021 - [info]  mysql-bin.000002:1426
Sat May 22 18:08:38 2021 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.31.0.48', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=1426, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Sat May 22 18:08:38 2021 - [info] Executing master IP activate script:
Sat May 22 18:08:38 2021 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=172.31.0.28 --orig_master_ip=172.31.0.28 --orig_master_port=3306 --new_master_host=172.31.0.48 --new_master_ip=172.31.0.48 --new_master_port=3306 --new_master_user='mhauser'   --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.31.0.100/16;/sbin/arping -I eth0 -c 3 -s 172.31.0.100/16 172.31.0.254 >/dev/null 2>&1===

Enabling the VIP - 172.31.0.100/16 on the new master - 172.31.0.48 
Sat May 22 18:08:38 2021 - [info]  OK.
Sat May 22 18:08:38 2021 - [info] Setting read_only=0 on 172.31.0.48(172.31.0.48:3306)..
Sat May 22 18:08:38 2021 - [info]  ok.
Sat May 22 18:08:38 2021 - [info] ** Finished master recovery successfully.
Sat May 22 18:08:38 2021 - [info] * Phase 3: Master Recovery Phase completed.
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info] * Phase 4: Slaves Recovery Phase..
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sat May 22 18:08:38 2021 - [info] 
Sat May 22 18:08:38 2021 - [info] -- Slave diff file generation on host 172.31.0.38(172.31.0.38:3306) started, pid: 27571. Check tmp log /data/mastermha/app1//172.31.0.38_3306_20210522180836.log if it takes time..
Sat May 22 18:08:39 2021 - [info] 
Sat May 22 18:08:39 2021 - [info] Log messages from 172.31.0.38 ...
Sat May 22 18:08:39 2021 - [info] 
Sat May 22 18:08:38 2021 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sat May 22 18:08:39 2021 - [info] End of log messages from 172.31.0.38.
Sat May 22 18:08:39 2021 - [info] -- 172.31.0.38(172.31.0.38:3306) has the latest relay log events.
Sat May 22 18:08:39 2021 - [info] Generating relay diff files from the latest slave succeeded.
Sat May 22 18:08:39 2021 - [info] 
Sat May 22 18:08:39 2021 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sat May 22 18:08:39 2021 - [info] 
Sat May 22 18:08:39 2021 - [info] -- Slave recovery on host 172.31.0.38(172.31.0.38:3306) started, pid: 27573. Check tmp log /data/mastermha/app1//172.31.0.38_3306_20210522180836.log if it takes time..
Sat May 22 18:08:40 2021 - [info] 
Sat May 22 18:08:40 2021 - [info] Log messages from 172.31.0.38 ...
Sat May 22 18:08:40 2021 - [info] 
Sat May 22 18:08:39 2021 - [info] Starting recovery on 172.31.0.38(172.31.0.38:3306)..
Sat May 22 18:08:39 2021 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Sat May 22 18:08:39 2021 - [info]   done.
Sat May 22 18:08:39 2021 - [info]  All relay logs were successfully applied.
Sat May 22 18:08:39 2021 - [info]  Resetting slave 172.31.0.38(172.31.0.38:3306) and starting replication from the new master 172.31.0.48(172.31.0.48:3306)..
Sat May 22 18:08:39 2021 - [info]  Executed CHANGE MASTER.
Sat May 22 18:08:39 2021 - [info]  Slave started.
Sat May 22 18:08:40 2021 - [info] End of log messages from 172.31.0.38.
Sat May 22 18:08:40 2021 - [info] -- Slave recovery on host 172.31.0.38(172.31.0.38:3306) succeeded.
Sat May 22 18:08:40 2021 - [info] All new slave servers recovered successfully.
Sat May 22 18:08:40 2021 - [info] 
Sat May 22 18:08:40 2021 - [info] * Phase 5: New master cleanup phase..
Sat May 22 18:08:40 2021 - [info] 
Sat May 22 18:08:40 2021 - [info] Resetting slave info on the new master..
Sat May 22 18:08:40 2021 - [info]  172.31.0.48: Resetting slave info succeeded.
Sat May 22 18:08:40 2021 - [info] Master failover to 172.31.0.48(172.31.0.48:3306) completed successfully.
Sat May 22 18:08:40 2021 - [info] 

----- Failover Report -----

app1: MySQL Master failover 172.31.0.28(172.31.0.28:3306) to 172.31.0.48(172.31.0.48:3306) succeeded

Master 172.31.0.28(172.31.0.28:3306) is down!

Check MHA Manager logs at localhost.localdomain:/data/mastermha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.31.0.28(172.31.0.28:3306)
The latest slave 172.31.0.48(172.31.0.48:3306) has all relay logs for recovery.
Selected 172.31.0.48(172.31.0.48:3306) as a new master.
172.31.0.48(172.31.0.48:3306): OK: Applying all logs succeeded.
172.31.0.48(172.31.0.48:3306): OK: Activated master IP address.
172.31.0.38(172.31.0.38:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.31.0.38(172.31.0.38:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.31.0.48(172.31.0.48:3306)
172.31.0.48(172.31.0.48:3306): Resetting slave info succeeded.
Master failover to 172.31.0.48(172.31.0.48:3306) completed successfully.
Sat May 22 18:08:40 2021 - [info] Sending mail..
sh: /usr/local/bin/sendmail.sh: No such file or directory
Sat May 22 18:08:40 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2089] Failed to send mail with return code 127:0

Check the status again

[root@localhost ~]# masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
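
The stopped state is expected here: the MHA manager process exits once a failover has completed. When scripting around this check, the state name can be pulled out of the status line. The following is a minimal sketch; the helper name mha_state is hypothetical, and it only assumes the "(code:STATE)" form seen in the output above.

```shell
# Hypothetical helper: extract the state name from a masterha_check_status
# line such as "app1 is stopped(2:NOT_RUNNING)."
mha_state() {
  printf '%s\n' "$1" | sed -n 's/.*(\([0-9]*\):\([A-Z_]*\)).*/\2/p'
}

# Sample line copied from the output above.
status_line='app1 is stopped(2:NOT_RUNNING).'
if [ "$(mha_state "$status_line")" = "NOT_RUNNING" ]; then
  echo "manager is not running; restart monitoring after the topology is repaired"
fi
```

On a live system the status line would come from masterha_check_status --conf=/etc/mastermha/app1.cnf instead of the hard-coded sample.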

The health-check entries in the log being tailed on the original master also stop

[root@centos8 ~]# tail -f /var/lib/mysql/centos8.log

Verify that the VIP has moved to the new master

[root@centos8 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:16:9a:81 brd ff:ff:ff:ff:ff:ff
    inet 172.31.0.48/16 brd 172.31.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 172.31.0.100/16 brd 172.31.255.255 scope global secondary eth0:1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe16:9a81/64 scope link
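
The same check can be done without reading the output by eye. This is a small sketch, assuming the VIP 172.31.0.100 and interface eth0 from this setup; the helper name has_ip is hypothetical.

```shell
# Hypothetical helper: succeed if the given IPv4 address is present in
# `ip a`-style output read from stdin. The fixed-string match on
# "inet <addr>/" avoids matching longer addresses or inet6 lines.
has_ip() {
  grep -Fq "inet $1/"
}

# Sample lines copied from the output above.
sample='    inet 172.31.0.48/16 brd 172.31.255.255 scope global noprefixroute eth0
    inet 172.31.0.100/16 brd 172.31.255.255 scope global secondary eth0:1'

if printf '%s\n' "$sample" | has_ip 172.31.0.100; then
  echo "VIP is bound on this host"
else
  echo "VIP is NOT bound on this host"
fi
```

On the new master, the live equivalent would be: ip -4 addr show dev eth0 | has_ip 172.31.0.100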

Errors:

# masterha_check_repl reports an error when checking master-slave replication
[root@centos8 ~]# masterha_check_repl --conf=/etc/mastermha/app1.cnf
Sat May 22 19:11:34 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 19:11:34 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Sat May 22 19:11:34 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Sat May 22 19:11:34 2021 - [info] MHA::MasterMonitor version 0.58.
Sat May 22 19:11:36 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Redundant argument in sprintf at /usr/share/perl5/vendor_perl/MHA/NodeUtil.pm line 201.
Sat May 22 19:11:36 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sat May 22 19:11:36 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

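Approach:

In most cases this means master-slave replication was not set up successfully. First make sure the data on the master is consistent with the data on the slaves. The configuration files of the master and the candidate master must be correct, and both must have semi-synchronous replication enabled. The master must grant replication privileges to the user the slaves use to synchronize data, and the slaves must be configured accordingly.

mha4mysql does not yet support MySQL 8.0, so the source code has to be patched (the patch was not actually applied this time, because CentOS 8 was in use, and MHA seems quite unfriendly to CentOS 8)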
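
A quick way to confirm that replication is actually running on a slave is to look at the two thread fields in the replication status output. The sketch below assumes the field names printed by SHOW SLAVE STATUS\G (and the Replica_ spelling used by SHOW REPLICA STATUS\G); the helper name repl_threads_ok is hypothetical.

```shell
# Hypothetical helper: read `SHOW SLAVE STATUS\G` output on stdin and succeed
# only if both the IO thread and the SQL thread report "Yes".
repl_threads_ok() {
  awk -F': *' '
    /(Slave|Replica)_IO_Running:/  { io  = $2 }
    /(Slave|Replica)_SQL_Running:/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}

# Illustrative input, not captured from the servers in this article.
printf '  Slave_IO_Running: Yes\n  Slave_SQL_Running: Yes\n' | repl_threads_ok \
  && echo "replication threads are running"
```

On a slave this would be fed from the client, for example: mysql -e 'SHOW SLAVE STATUS\G' | repl_threads_ok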

[root@centos8 ~]# grep -rn 'sub parse_mysql_major_version($)' /usr/share/perl5/vendor_perl/MHA/
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm:199:sub parse_mysql_major_version($)
# Original code
#sub parse_mysql_major_version($) {
#  my $str = shift;
#  my $result = sprintf( '%03d%03d', $str =~ m/(\d+)/g );
#  return $result;
#}

# Modified code
sub parse_mysql_major_version($) {
  my $str = shift;
  $str =~ /(\d+)\.(\d+)/;
  my $strmajor = "$1.$2";
  my $result = sprintf( '%03d%03d', $strmajor =~ m/(\d+)/g );
  return $result;
}
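
The "Redundant argument in sprintf" message is what newer Perl releases emit when sprintf receives more arguments than the format consumes: a version string such as 8.0.19 contains three numeric groups, but '%03d%03d' has only two fields. The patched function therefore keeps just the major.minor pair. The shell sketch below mirrors that logic for illustration only (it is not part of MHA) and assumes the input contains at least "major.minor".

```shell
# Illustrative shell equivalent of the patched Perl function: keep only the
# leading "major.minor" pair and zero-pad each part to three digits.
parse_mysql_major_version() {
  major=${1%%.*}     # text before the first dot
  rest=${1#*.}       # text after the first dot
  minor=${rest%%.*}  # text before the next dot
  printf '%03d%03d\n' "$major" "$minor"
}

parse_mysql_major_version 8.0.19    # prints 008000
parse_mysql_major_version 5.7.30    # prints 005007
```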

Installing mha4mysql fails on CentOS 7

--> Finished Dependency Resolution
Error: Package: mha4mysql-manager-0.58-0.el7.centos.noarch (/mha4mysql-manager-0.58-0.el7.centos.
           Requires: perl(Log::Dispatch)
Error: Package: mha4mysql-manager-0.58-0.el7.centos.noarch (/mha4mysql-manager-0.58-0.el7.centos.
           Requires: perl(Parallel::ForkManager)
Error: Package: mha4mysql-manager-0.58-0.el7.centos.noarch (/mha4mysql-manager-0.58-0.el7.centos.
           Requires: perl(Log::Dispatch::File)
Error: Package: mha4mysql-manager-0.58-0.el7.centos.noarch (/mha4mysql-manager-0.58-0.el7.centos.
           Requires: perl(Log::Dispatch::Screen)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

Solution:

[root@localhost ~]# yum install epel-release -y
# Then reinstall
[root@localhost ~]# yum install mha4mysql-*.rpm -y
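
To see at a glance which Perl modules yum could not resolve, the "Requires: perl(...)" lines can be extracted from the error output. This is a small sketch; the helper name missing_perl_modules is hypothetical, and it only assumes the error format shown above.

```shell
# Hypothetical helper: list the unique Perl modules named in
# "Requires: perl(...)" lines read from stdin.
missing_perl_modules() {
  sed -n 's/.*Requires: perl(\(.*\)).*/\1/p' | sort -u
}

# Sample lines copied from the yum error above.
printf '%s\n' \
  '           Requires: perl(Log::Dispatch)' \
  '           Requires: perl(Parallel::ForkManager)' \
  '           Requires: perl(Log::Dispatch::File)' \
  '           Requires: perl(Log::Dispatch::Screen)' | missing_perl_modules
```

Once the EPEL repository is enabled, yum can resolve these modules as dependencies when the RPMs are installed again.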

Original article: https://www.cnblogs.com/xuanlv-0413/p/14799801.html