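MHA provides three ways to fail over: automatic failover, which requires MHA monitoring (masterha_manager) to be running; manual failover when monitoring is not running; and online manual switchover.
Between them, the three methods can handle any MySQL master/slave failure scenario. This article describes how to perform a manual failover when monitoring is not running, for your reference.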
For the other two MHA switchover methods, see:
MHA online switchover process
MHA automatic failover: steps and process analysis
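1. Characteristics of manual failover
a. masterha_manager is not running on the monitoring node
b. The master has already gone down, or is being moved to a higher-performance server
c. Manual failover supports both interactive and non-interactive modes
d. Example: $ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host1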
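2. Key parameters of masterha_master_switch
--master_state=dead
Mandatory; the value is either "dead" or "alive". dead performs a manual failover, alive performs an online switchover.
--dead_master_host=(hostname)
Mandatory; the hostname of the dead master. The two companion options, --dead_master_ip and --dead_master_port (default 3306), are optional.
--new_master_host=(hostname)
Optional; specifies the new master. If omitted, the new master is chosen according to the candidate_master settings.
--interactive=(0|1)
Optional; whether to run interactively. Defaults to 1 (interactive); 0 runs non-interactively.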
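The parameter rules above can be captured in a small POSIX shell sketch that only assembles and prints a manual-failover command line, without running it. The function name build_dead_failover_cmd is hypothetical and not part of MHA; the sketch assumes the defaults listed above (--dead_master_port=3306, --interactive=1), adds --new_master_host only when one is given, and leaves out other options such as --new_master_port.

```shell
# Hypothetical helper (not part of MHA): print, but do not run, a
# masterha_master_switch command line for failing over a dead master.
# Usage: build_dead_failover_cmd CONF DEAD_HOST [DEAD_PORT] [NEW_HOST] [INTERACTIVE]
build_dead_failover_cmd() {
  local conf="${1:-}"
  local dead_host="${2:-}"
  local dead_port="${3:-3306}"   # --dead_master_port defaults to 3306
  local new_host="${4:-}"        # empty: leave the choice to candidate_master
  local interactive="${5:-1}"    # --interactive defaults to 1
  local cmd

  # --conf and --dead_master_host are required for a dead-master failover.
  if [ -z "$conf" ] || [ -z "$dead_host" ]; then
    echo "error: --conf and --dead_master_host are required" >&2
    return 1
  fi

  # --interactive only accepts 0 or 1.
  case "$interactive" in
    0|1) ;;
    *) echo "error: --interactive must be 0 or 1" >&2; return 1 ;;
  esac

  cmd="masterha_master_switch --master_state=dead --conf=$conf"
  cmd="$cmd --dead_master_host=$dead_host --dead_master_port=$dead_port"
  if [ -n "$new_host" ]; then
    cmd="$cmd --new_master_host=$new_host"
  fi
  printf '%s\n' "$cmd --interactive=$interactive"
}
```

For example, build_dead_failover_cmd /etc/app1.cnf host1 "" "" 0 prints the sample command from item d with --dead_master_port=3306 and --interactive=0 appended, i.e. a non-interactive run.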
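1. On server1, stop MySQL to simulate a master failure:
service mysql.server stop
2. On monitor, run the manual failover. In interactive mode MHA asks for confirmation twice (both answered "yes" below):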
[root@monitor tmp]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.conf --dead_master_host=server1 --dead_master_port=3306 --new_master_host=slave1 --new_master_port=3306
--dead_master_ip=<dead_master_ip> is not set. Using 10.24.220.232.
Mon May 16 09:19:38 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon May 16 09:19:38 2016 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Mon May 16 09:19:38 2016 - [info] Reading server configuration from /etc/masterha/app1.conf..
Mon May 16 09:19:38 2016 - [info] MHA::MasterFailover version 0.56.
Mon May 16 09:19:38 2016 - [info] Starting master failover.
Mon May 16 09:19:38 2016 - [info]
Mon May 16 09:19:38 2016 - [info] * Phase 1: Configuration Check Phase..
Mon May 16 09:19:38 2016 - [info]
Mon May 16 09:19:38 2016 - [debug] Connecting to servers..
Mon May 16 09:19:39 2016 - [debug] Connected to: slave1(10.24.220.70:3306), user=root
Mon May 16 09:19:39 2016 - [debug] Number of slave worker threads on host slave1(10.24.220.70:3306): 0
Mon May 16 09:19:39 2016 - [debug] Connected to: slave2(10.169.214.33:3306), user=root
Mon May 16 09:19:39 2016 - [debug] Number of slave worker threads on host slave2(10.169.214.33:3306): 0
Mon May 16 09:19:39 2016 - [debug] Comparing MySQL versions..
Mon May 16 09:19:39 2016 - [debug] Comparing MySQL versions done.
Mon May 16 09:19:39 2016 - [debug] Connecting to servers done.
Mon May 16 09:19:39 2016 - [info] GTID failover mode = 1
Mon May 16 09:19:39 2016 - [info] Dead Servers:
Mon May 16 09:19:39 2016 - [info] server1(10.24.220.232:3306)
Mon May 16 09:19:39 2016 - [info] Checking master reachability via MySQL(double check)...
Mon May 16 09:19:39 2016 - [info] ok.
Mon May 16 09:19:39 2016 - [info] Alive Servers:
Mon May 16 09:19:39 2016 - [info] slave1(10.24.220.70:3306)
Mon May 16 09:19:39 2016 - [info] slave2(10.169.214.33:3306)
Mon May 16 09:19:39 2016 - [info] Alive Slaves:
Mon May 16 09:19:39 2016 - [info] slave1(10.24.220.70:3306) Version=5.7.11-log (oldest major version between slaves) log-bin:enabled
Mon May 16 09:19:39 2016 - [info] GTID ON
Mon May 16 09:19:39 2016 - [debug] Relay log info repository: FILE
Mon May 16 09:19:39 2016 - [info] Replicating from 10.24.220.232(10.24.220.232:3306)
Mon May 16 09:19:39 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Mon May 16 09:19:39 2016 - [info] slave2(10.169.214.33:3306) Version=5.7.11-log (oldest major version between slaves) log-bin:enabled
Mon May 16 09:19:39 2016 - [info] GTID ON
Mon May 16 09:19:39 2016 - [debug] Relay log info repository: FILE
Mon May 16 09:19:39 2016 - [info] Replicating from 10.24.220.232(10.24.220.232:3306)
Mon May 16 09:19:39 2016 - [info] Not candidate for the new Master (no_master is set)
Master server1(10.24.220.232:3306) is dead. Proceed? (yes/NO): yes
Mon May 16 09:19:47 2016 - [info] Starting GTID based failover.
Mon May 16 09:19:47 2016 - [info]
Mon May 16 09:19:47 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Mon May 16 09:19:47 2016 - [info]
Mon May 16 09:19:47 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon May 16 09:19:47 2016 - [info]
Mon May 16 09:19:47 2016 - [debug] SSH connection test to server1, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Mon May 16 09:19:47 2016 - [debug] Stopping IO thread on slave2(10.169.214.33:3306)..
Mon May 16 09:19:47 2016 - [debug] Stopping IO thread on slave1(10.24.220.70:3306)..
Mon May 16 09:19:47 2016 - [debug] Stop IO thread on slave2(10.169.214.33:3306) done.
Mon May 16 09:19:47 2016 - [debug] Stop IO thread on slave1(10.24.220.70:3306) done.
Mon May 16 09:19:48 2016 - [info] HealthCheck: SSH to server1 is reachable.
Mon May 16 09:19:49 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Mon May 16 09:19:49 2016 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Mon May 16 09:19:49 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon May 16 09:19:49 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon May 16 09:19:49 2016 - [info]
Mon May 16 09:19:49 2016 - [info] * Phase 3: Master Recovery Phase..
Mon May 16 09:19:49 2016 - [info]
Mon May 16 09:19:49 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon May 16 09:19:49 2016 - [info]
Mon May 16 09:19:49 2016 - [debug] Fetching current slave status..
Mon May 16 09:19:49 2016 - [debug] Fetching current slave status done.
Mon May 16 09:19:49 2016 - [info] The latest binary log file/position on all slaves is log.000005:528
Mon May 16 09:19:49 2016 - [info] Retrieved Gtid Set: 191f7a9f-ffa2-11e5-a825-00163e00242a:1-4
Mon May 16 09:19:49 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon May 16 09:19:49 2016 - [info] slave1(10.24.220.70:3306) Version=5.7.11-log (oldest major version between slaves) log-bin:enabled
Mon May 16 09:19:49 2016 - [info] GTID ON
Mon May 16 09:19:49 2016 - [debug] Relay log info repository: FILE
Mon May 16 09:19:49 2016 - [info] Replicating from 10.24.220.232(10.24.220.232:3306)
Mon May 16 09:19:49 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Mon May 16 09:19:49 2016 - [info] slave2(10.169.214.33:3306) Version=5.7.11-log (oldest major version between slaves) log-bin:enabled
Mon May 16 09:19:49 2016 - [info] GTID ON
Mon May 16 09:19:49 2016 - [debug] Relay log info repository: FILE
Mon May 16 09:19:49 2016 - [info] Replicating from 10.24.220.232(10.24.220.232:3306)
Mon May 16 09:19:49 2016 - [info] Not candidate for the new Master (no_master is set)
Mon May 16 09:19:49 2016 - [info] The oldest binary log file/position on all slaves is log.000005:528
Mon May 16 09:19:49 2016 - [info] Retrieved Gtid Set: 191f7a9f-ffa2-11e5-a825-00163e00242a:1-4
Mon May 16 09:19:49 2016 - [info] Oldest slaves:
Mon May 16 09:19:49 2016 - [info] slave1(10.24.220.70:3306) Version=5.7.11-log (oldest major version between slaves) log-bin:enabled
Mon May 16 09:19:49 2016 - [info] GTID ON
Mon May 16 09:19:49 2016 - [debug] Relay log info repository: FILE
Mon May 16 09:19:49 2016 - [info] Replicating from 10.24.220.232(10.24.220.232:3306)
Mon May 16 09:19:49 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Mon May 16 09:19:49 2016 - [info] slave2(10.169.214.33:3306) Version=5.7.11-log (oldest major version between slaves) log-bin:enabled
Mon May 16 09:19:49 2016 - [info] GTID ON
Mon May 16 09:19:49 2016 - [debug] Relay log info repository: FILE
Mon May 16 09:19:49 2016 - [info] Replicating from 10.24.220.232(10.24.220.232:3306)
Mon May 16 09:19:49 2016 - [info] Not candidate for the new Master (no_master is set)
Mon May 16 09:19:49 2016 - [info]
Mon May 16 09:19:49 2016 - [info] * Phase 3.3: Determining New Master Phase..
Mon May 16 09:19:49 2016 - [info]
Mon May 16 09:19:49 2016 - [info] slave1 can be new master.
Mon May 16 09:19:49 2016 - [info] New master is slave1(10.24.220.70:3306)
Mon May 16 09:19:49 2016 - [info] Starting master failover..
Mon May 16 09:19:49 2016 - [info]
From:
server1(10.24.220.232:3306) (current master)
 +--slave1(10.24.220.70:3306)
 +--slave2(10.169.214.33:3306)
To:
slave1(10.24.220.70:3306) (new master)
 +--slave2(10.169.214.33:3306)
Starting master switch from server1(10.24.220.232:3306) to slave1(10.24.220.70:3306)? (yes/NO): yes
Mon May 16 09:20:43 2016 - [info] New master decided manually is slave1(10.24.220.70:3306)
Mon May 16 09:20:43 2016 - [info]
Mon May 16 09:20:43 2016 - [info] * Phase 3.3: New Master Recovery Phase..
Mon May 16 09:20:43 2016 - [info]
Mon May 16 09:20:43 2016 - [info] Waiting all logs to be applied..
Mon May 16 09:20:43 2016 - [info] done.
Mon May 16 09:20:43 2016 - [debug] Stopping slave IO/SQL thread on slave1(10.24.220.70:3306)..
Mon May 16 09:20:43 2016 - [debug] done.
Mon May 16 09:20:43 2016 - [info] Getting new master's binlog name and position..
Mon May 16 09:20:43 2016 - [info] log.000001:818
Mon May 16 09:20:43 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='slave1 or 10.24.220.70', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon May 16 09:20:43 2016 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: log.000001, 818, 191f7a9f-ffa2-11e5-a825-00163e00242a:1-4
Mon May 16 09:20:43 2016 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Mon May 16 09:20:43 2016 - [info] ** Finished master recovery successfully.
Mon May 16 09:20:43 2016 - [info] * Phase 3: Master Recovery Phase completed.
Mon May 16 09:20:43 2016 - [info]
Mon May 16 09:20:43 2016 - [info] * Phase 4: Slaves Recovery Phase..
Mon May 16 09:20:43 2016 - [info]
Mon May 16 09:20:43 2016 - [info]
Mon May 16 09:20:43 2016 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon May 16 09:20:43 2016 - [info]
Mon May 16 09:20:43 2016 - [info] -- Slave recovery on host slave2(10.169.214.33:3306) started, pid: 7774. Check tmp log /var/log/masterha/app1/slave2_3306_20160516091938.log if it takes time..
Mon May 16 09:20:44 2016 - [info]
Mon May 16 09:20:44 2016 - [info] Log messages from slave2 ...
Mon May 16 09:20:44 2016 - [info]
Mon May 16 09:20:43 2016 - [info] Resetting slave slave2(10.169.214.33:3306) and starting replication from the new master slave1(10.24.220.70:3306)..
Mon May 16 09:20:43 2016 - [debug] Stopping slave IO/SQL thread on slave2(10.169.214.33:3306)..
Mon May 16 09:20:43 2016 - [debug] done.
Mon May 16 09:20:43 2016 - [info] Executed CHANGE MASTER.
Mon May 16 09:20:43 2016 - [debug] Starting slave IO/SQL thread on slave2(10.169.214.33:3306)..
Mon May 16 09:20:44 2016 - [debug] done.
Mon May 16 09:20:44 2016 - [info] Slave started.
Mon May 16 09:20:44 2016 - [info] gtid_wait(191f7a9f-ffa2-11e5-a825-00163e00242a:1-4) completed on slave2(10.169.214.33:3306). Executed 0 events.
Mon May 16 09:20:44 2016 - [info] End of log messages from slave2.
Mon May 16 09:20:44 2016 - [info] -- Slave on host slave2(10.169.214.33:3306) started.
Mon May 16 09:20:44 2016 - [info] All new slave servers recovered successfully.
Mon May 16 09:20:44 2016 - [info]
Mon May 16 09:20:44 2016 - [info] * Phase 5: New master cleanup phase..
Mon May 16 09:20:44 2016 - [info]
Mon May 16 09:20:44 2016 - [info] Resetting slave info on the new master..
Mon May 16 09:20:44 2016 - [debug] Clearing slave info..
Mon May 16 09:20:44 2016 - [debug] Stopping slave IO/SQL thread on slave1(10.24.220.70:3306)..
Mon May 16 09:20:44 2016 - [debug] done.
Mon May 16 09:20:44 2016 - [debug] SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Mon May 16 09:20:44 2016 - [info] slave1: Resetting slave info succeeded.
Mon May 16 09:20:44 2016 - [info] Master failover to slave1(10.24.220.70:3306) completed successfully.
Mon May 16 09:20:44 2016 - [debug] Disconnected from slave1(10.24.220.70:3306)
Mon May 16 09:20:44 2016 - [debug] Disconnected from slave2(10.169.214.33:3306)
Mon May 16 09:20:44 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover server1(10.24.220.232:3306) to slave1(10.24.220.70:3306) succeeded
Master server1(10.24.220.232:3306) is down!
Check MHA Manager logs at monitor for details.
Started manual(interactive) failover.
Selected slave1(10.24.220.70:3306) as a new master.
slave1(10.24.220.70:3306): OK: Applying all logs succeeded.
slave2(10.169.214.33:3306): OK: Slave started, replicating from slave1(10.24.220.70:3306)
slave1(10.24.220.70:3306): Resetting slave info succeeded.
Master failover to slave1(10.24.220.70:3306) completed successfully.
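In interactive mode the console output is the main record of the run, so it can help to check a saved copy of it afterwards. The POSIX shell sketch below assumes that output has been saved to a file; failover_succeeded and change_master_stmt are hypothetical helpers, not MHA commands. The first looks for the final "Master failover to ... completed successfully." line, and the second extracts the CHANGE MASTER TO statement that MHA reports for pointing other slaves at the new master (with the password masked as 'xxx', as in the log).

```shell
# Hypothetical helpers (not part of MHA) for a saved copy of the
# masterha_master_switch console output; $1 is the path to that file.

# Exit status 0 if the final failover success line is present, 1 otherwise.
failover_succeeded() {
  grep -q 'Master failover to .* completed successfully\.' "$1"
}

# Print the first "CHANGE MASTER TO ...;" statement in the output,
# or nothing if none is present.
change_master_stmt() {
  grep -o "CHANGE MASTER TO[^;]*;" "$1" | head -n 1
}
```

For example, failover_succeeded failover.log && change_master_stmt failover.log (failover.log being a hypothetical file name) prints the statement only when the run reported success.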