• MHA ssh检查,repl复制检查和在线切换日志分析


    一、SSh 检查日志分析

    执行过程及对应的日志:

    1、读取MHA manger 节点上的配置文件

    2、根据配置文件,得到各个主机的信息,逐一进行SSH检查

    3、每个主机都通过SSH连接除了自己以外的其他所有主机

    4、当所有主机相互之间都能通过SSH免密登录,SSH检查就通过。

    [root@A2 app1]# masterha_check_ssh --conf=/etc/masterha/app1.conf 
    Sun Jun 23 18:36:45 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sun Jun 23 18:36:45 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
    Sun Jun 23 18:36:45 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
    Sun Jun 23 18:36:45 2019 - [info] Starting SSH connection tests..
    Sun Jun 23 18:36:45 2019 - [debug] 
    Sun Jun 23 18:36:45 2019 - [debug]  Connecting via SSH from root@172.16.13.15(172.16.13.15:22) to root@172.16.15.3(172.16.15.3:22)..
    Sun Jun 23 18:36:45 2019 - [debug]   ok.
    Sun Jun 23 18:36:45 2019 - [debug]  Connecting via SSH from root@172.16.13.15(172.16.13.15:22) to root@172.16.15.2(172.16.15.2:22)..
    Sun Jun 23 18:36:45 2019 - [debug]   ok.
    Sun Jun 23 18:36:46 2019 - [debug] 
    Sun Jun 23 18:36:45 2019 - [debug]  Connecting via SSH from root@172.16.15.3(172.16.15.3:22) to root@172.16.13.15(172.16.13.15:22)..
    Sun Jun 23 18:36:45 2019 - [debug]   ok.
    Sun Jun 23 18:36:45 2019 - [debug]  Connecting via SSH from root@172.16.15.3(172.16.15.3:22) to root@172.16.15.2(172.16.15.2:22)..
    Sun Jun 23 18:36:46 2019 - [debug]   ok.
    Sun Jun 23 18:36:46 2019 - [debug] 
    Sun Jun 23 18:36:46 2019 - [debug]  Connecting via SSH from root@172.16.15.2(172.16.15.2:22) to root@172.16.13.15(172.16.13.15:22)..
    Sun Jun 23 18:36:46 2019 - [debug]   ok.
    Sun Jun 23 18:36:46 2019 - [debug]  Connecting via SSH from root@172.16.15.2(172.16.15.2:22) to root@172.16.15.3(172.16.15.3:22)..
    Sun Jun 23 18:36:46 2019 - [debug]   ok.
    Sun Jun 23 18:36:46 2019 - [info] All SSH connection tests passed successfully.

    二、主从复制检查日志分析

    1、读取配置文件,根据配置文件,检查当前的所有主机状态,MHA Node版本,是否支持GTID主从复制,得到当前的主从复制架构

    [root@A2 app1]# masterha_check_repl --conf=/etc/masterha/app1.conf 
    Sun Jun 23 18:36:28 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sun Jun 23 18:36:28 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
    Sun Jun 23 18:36:28 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
    Sun Jun 23 18:36:28 2019 - [info] MHA::MasterMonitor version 0.56.
    Sun Jun 23 18:36:29 2019 - [info] GTID failover mode = 0
    Sun Jun 23 18:36:29 2019 - [info] Dead Servers:
    Sun Jun 23 18:36:29 2019 - [info] Alive Servers:
    Sun Jun 23 18:36:29 2019 - [info]   172.16.13.15(172.16.13.15:3306)
    Sun Jun 23 18:36:29 2019 - [info]   172.16.15.3(172.16.15.3:3306)
    Sun Jun 23 18:36:29 2019 - [info]   172.16.15.2(172.16.15.2:3306)
    Sun Jun 23 18:36:29 2019 - [info] Alive Slaves:
    Sun Jun 23 18:36:29 2019 - [info]   172.16.13.15(172.16.13.15:3306)  Version=5.7.16-log (oldest major version between slaves) log-bin:enabled
    Sun Jun 23 18:36:29 2019 - [info]     Replicating from 172.16.15.3(172.16.15.3:3306)
    Sun Jun 23 18:36:29 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sun Jun 23 18:36:29 2019 - [info]   172.16.15.2(172.16.15.2:3306)  Version=5.7.16-log (oldest major version between slaves) log-bin:enabled
    Sun Jun 23 18:36:29 2019 - [info]     Replicating from 172.16.15.3(172.16.15.3:3306)
    Sun Jun 23 18:36:29 2019 - [info] Current Alive Master: 172.16.15.3(172.16.15.3:3306)
    2、检查从库配置,检查主从复制是否过滤,是否支持GTID复制
    Sun Jun 23 18:36:29 2019 - [info] Checking slave configurations..
    Sun Jun 23 18:36:29 2019 - [info] Checking replication filtering settings..
    Sun Jun 23 18:36:29 2019 - [info]  binlog_do_db= , binlog_ignore_db= 
    Sun Jun 23 18:36:29 2019 - [info]  Replication filtering check ok.
    Sun Jun 23 18:36:29 2019 - [info] GTID (with auto-pos) is not supported
    3、进行SSH连接测试,MHA版本检查

    Sun Jun 23 18:36:29 2019 - [info] Starting SSH connection tests..
    Sun Jun 23 18:36:31 2019 - [info] All SSH connection tests passed successfully.
    Sun Jun 23 18:36:31 2019 - [info] Checking MHA Node version..
    Sun Jun 23 18:36:31 2019 - [info]  Version check ok.
    4、检查主库上SSH 配置,测试恢复脚本(save_binary_logs)的可用性,对binlog设置进行检查
    Sun Jun 23 18:36:31 2019 - [info] Checking SSH publickey authentication settings on the current master..
    Sun Jun 23 18:36:31 2019 - [info] HealthCheck: SSH to 172.16.15.3 is reachable.
    Sun Jun 23 18:36:32 2019 - [info] Master MHA Node version is 0.56.
    Sun Jun 23 18:36:32 2019 - [info] Checking recovery script configurations on 172.16.15.3(172.16.15.3:3306)..
    Sun Jun 23 18:36:32 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql_bin.000025 
    Sun Jun 23 18:36:32 2019 - [info]   Connecting to root@172.16.15.3(172.16.15.3:22).. 
      Creating /tmp if not exists..    ok.
      Checking output directory is accessible or not..
       ok.
      Binlog found at /usr/local/mysql/data, up to mysql_bin.000025
    Sun Jun 23 18:36:32 2019 - [info] Binlog setting check done.
    5、检查从库SSH配置,测试应用差异日志脚本(apply_diff_relay_logs)的可用性,检查从库恢复环境和 relay log 的情况,检查MySQL的连接和权限,清理测试文件
    Sun Jun 23 18:36:32 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Sun Jun 23 18:36:32 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.13.15 --slave_ip=172.16.13.15 --slave_port=3306 --workdir=/tmp --target_version=5.7.16-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
    Sun Jun 23 18:36:32 2019 - [info]   Connecting to root@172.16.13.15(172.16.13.15:22).. 
      Checking slave recovery environment settings..
        Opening /usr/local/mysql/data/relay-log.info ... ok.
        Relay log found at /usr/local/mysql/data, up to mysqlserver-relay-bin.000002
        Temporary relay log file is /usr/local/mysql/data/mysqlserver-relay-bin.000002
        Testing mysql connection and privileges.. done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
        
    Sun Jun 23 18:36:32 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.15.2 --slave_ip=172.16.15.2 --slave_port=3306 --workdir=/tmp --target_version=5.7.16-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
    Sun Jun 23 18:36:32 2019 - [info]   Connecting to root@172.16.15.2(172.16.15.2:22).. 
      Checking slave recovery environment settings..
        Opening /usr/local/mysql/data/relay-log.info ... ok.
        Relay log found at /usr/local/mysql/data, up to A2-relay-bin.000002
        Temporary relay log file is /usr/local/mysql/data/A2-relay-bin.000002
        Testing mysql connection and privileges.. done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Sun Jun 23 18:36:32 2019 - [info] Slaves settings check done.
    6、得到当前的主从结构,检查每个从库的复制状态。检查故障切换等脚本的状态,完成主从复制检查
    Sun Jun 23 18:36:32 2019 - [info] 
    172.16.15.3(172.16.15.3:3306) (current master)
     +--172.16.13.15(172.16.13.15:3306)
     +--172.16.15.2(172.16.15.2:3306)
    
    Sun Jun 23 18:36:32 2019 - [info] Checking replication health on 172.16.13.15..
    Sun Jun 23 18:36:32 2019 - [info]  ok.
    Sun Jun 23 18:36:32 2019 - [info] Checking replication health on 172.16.15.2..
    Sun Jun 23 18:36:32 2019 - [info]  ok.
    Sun Jun 23 18:36:32 2019 - [info] Checking master_ip_failover_script status:
    Sun Jun 23 18:36:32 2019 - [info]   /var/log/masterha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port=3306 
    Checking the Status of the script.. OK 
    Sun Jun 23 18:36:32 2019 - [info]  OK.
    Sun Jun 23 18:36:32 2019 - [warning] shutdown_script is not defined.
    Sun Jun 23 18:36:32 2019 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.

    三、在线日志切换分析

    1、读取配置文件,检查是否支持GTID复制,得到当前的主从结构
    [root@A2 app1]# masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=172.16.13.15 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1000
    Sun Jun 23 19:29:06 2019 - [info] MHA::MasterRotate version 0.56.
    Sun Jun 23 19:29:06 2019 - [info] Starting online master switch..
    Sun Jun 23 19:29:06 2019 - [info] 
    Sun Jun 23 19:29:06 2019 - [info] * Phase 1: Configuration Check Phase..
    Sun Jun 23 19:29:06 2019 - [info] 
    Sun Jun 23 19:29:06 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sun Jun 23 19:29:06 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
    Sun Jun 23 19:29:06 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
    Sun Jun 23 19:29:08 2019 - [info] GTID failover mode = 0
    Sun Jun 23 19:29:08 2019 - [info] Current Alive Master: 172.16.15.3(172.16.15.3:3306)
    Sun Jun 23 19:29:08 2019 - [info] Alive Slaves:
    Sun Jun 23 19:29:08 2019 - [info]   172.16.13.15(172.16.13.15:3306)  Version=5.7.16-log (oldest major version between slaves) log-bin:enabled
    Sun Jun 23 19:29:08 2019 - [info]     Replicating from 172.16.15.3(172.16.15.3:3306)
    Sun Jun 23 19:29:08 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
    Sun Jun 23 19:29:08 2019 - [info]   172.16.15.2(172.16.15.2:3306)  Version=5.7.16-log (oldest major version between slaves) log-bin:enabled
    Sun Jun 23 19:29:08 2019 - [info]     Replicating from 172.16.15.3(172.16.15.3:3306)
    2、在主库上确认执行 FLUSH NO_WRITE_TO_BINLOG TABLES,关闭已经打开的表,不再记录binlog,进行主从复制检查,得到新主库的信息
    It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 172.16.15.3(172.16.15.3:3306)? (YES/no): yes
    Sun Jun 23 19:29:11 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
    Sun Jun 23 19:29:11 2019 - [info]  ok.
    Sun Jun 23 19:29:11 2019 - [info] Checking MHA is not monitoring or doing failover..
    Sun Jun 23 19:29:11 2019 - [info] Checking replication health on 172.16.13.15..
    Sun Jun 23 19:29:11 2019 - [info]  ok.
    Sun Jun 23 19:29:11 2019 - [info] Checking replication health on 172.16.15.2..
    Sun Jun 23 19:29:11 2019 - [info]  ok.
    Sun Jun 23 19:29:11 2019 - [info] 172.16.13.15 can be new master.
    Sun Jun 23 19:29:11 2019 - [info] 
    From:
    172.16.15.3(172.16.15.3:3306) (current master)
     +--172.16.13.15(172.16.13.15:3306)
     +--172.16.15.2(172.16.15.2:3306)
    
    To:
    172.16.13.15(172.16.13.15:3306) (new master)
     +--172.16.15.2(172.16.15.2:3306)
     +--172.16.15.3(172.16.15.3:3306)
    3、开始从旧主切换到新主,检查新主能否成为主库。检查复制过滤,临时将旧主change master to到一个dummy地址
    Starting master switch from 172.16.15.3(172.16.15.3:3306) to 172.16.13.15(172.16.13.15:3306)? (yes/NO): yes
    Sun Jun 23 19:29:13 2019 - [info] Checking whether 172.16.13.15(172.16.13.15:3306) is ok for the new master..
    Sun Jun 23 19:29:13 2019 - [info]  ok.
    Sun Jun 23 19:29:13 2019 - [info] 172.16.15.3(172.16.15.3:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
    Sun Jun 23 19:29:13 2019 - [info] 172.16.15.3(172.16.15.3:3306): Resetting slave pointing to the dummy host.
    Sun Jun 23 19:29:14 2019 - [info] ** Phase 1: Configuration Check Phase completed.
    Sun Jun 23 19:29:14 2019 - [info] 
    4、旧主库上执行master_ip_online_change,停止虚拟IP。在旧主上执行 FLUSH TABLES WITH READ LOCK..,实现全局读锁。
    Sun Jun 23 19:29:14 2019 - [info] * Phase 2: Rejecting updates Phase..
    Sun Jun 23 19:29:14 2019 - [info] 
    Sun Jun 23 19:29:14 2019 - [info] Executing master ip online change script to disable write on the current master:
    Sun Jun 23 19:29:14 2019 - [info]   /var/log/masterha/scripts/master_ip_online_change --command=stop --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='115433' --new_master_host=172.16.13.15 --new_master_ip=172.16.13.15 --new_master_port=3306 --new_master_user='root' --new_master_password='115433' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
    
    ***************************************************************
    
    Disabling the VIP - 172.16.13.141/16 on old master: 172.16.15.3
    
    Disabled the VIP successfully
    ***************************************************************
    
    
    Sun Jun 23 19:29:14 2019 - [info]  ok.
    Sun Jun 23 19:29:14 2019 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
    Sun Jun 23 19:29:14 2019 - [info] Executing FLUSH TABLES WITH READ LOCK..
    Sun Jun 23 19:29:15 2019 - [info]  ok.
    5、获取旧主的binlog位置,在新主上面应用中继日志。得到新主的binlog 位置,用于后期在其他从库上执行change master  to,在新主库上面开启虚拟IP,set read_only =0
    Sun Jun 23 19:29:15 2019 - [info] Orig master binlog:pos is mysql_bin.000025:150734601.
    Sun Jun 23 19:29:15 2019 - [info]  Waiting to execute all relay logs on 172.16.13.15(172.16.13.15:3306)..
    Sun Jun 23 19:30:48 2019 - [info]  master_pos_wait(mysql_bin.000025:150734601) completed on 172.16.13.15(172.16.13.15:3306). Executed 137 events.
    Sun Jun 23 19:30:48 2019 - [info]   done.
    Sun Jun 23 19:30:48 2019 - [info] Getting new master's binlog name and position..
    Sun Jun 23 19:30:48 2019 - [info]  mysql_bin.000058:91914599
    Sun Jun 23 19:30:48 2019 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.13.15', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000058', MASTER_LOG_POS=91914599, MASTER_USER='root', MASTER_PASSWORD='xxx';
    Sun Jun 23 19:30:48 2019 - [info] Executing master ip online change script to allow write on the new master:
    Sun Jun 23 19:30:48 2019 - [info]   /var/log/masterha/scripts/master_ip_online_change --command=start --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='115433' --new_master_host=172.16.13.15 --new_master_ip=172.16.13.15 --new_master_port=3306 --new_master_user='root' --new_master_password='115433' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
    
    
    ***************************************************************
    
    Enabling the VIP - 172.16.13.141/16 on new master: 172.16.13.15 
    
    Enabled the VIP successfully
    ***************************************************************
    
    Sun Jun 23 19:30:48 2019 - [info]  ok.
    Sun Jun 23 19:30:48 2019 - [info] Setting read_only=0 on 172.16.13.15(172.16.13.15:3306)..
    Sun Jun 23 19:30:48 2019 - [info]  ok.
    6、并行切换从库,应用中继日志到 旧主binlog位置,执行change master  to;旧主也同时执行,而且执行 UNLOCK  TABLES ,解锁。至此,从库切换完成
    Sun Jun 23 19:30:48 2019 - [info] 
    Sun Jun 23 19:30:48 2019 - [info] * Switching slaves in parallel..
    Sun Jun 23 19:30:48 2019 - [info] 
    Sun Jun 23 19:30:48 2019 - [info] -- Slave switch on host 172.16.15.2(172.16.15.2:3306) started, pid: 6048
    Sun Jun 23 19:30:48 2019 - [info] 
    Sun Jun 23 19:30:49 2019 - [info] Log messages from 172.16.15.2 ...
    Sun Jun 23 19:30:49 2019 - [info] 
    Sun Jun 23 19:30:48 2019 - [info]  Waiting to execute all relay logs on 172.16.15.2(172.16.15.2:3306)..
    Sun Jun 23 19:30:48 2019 - [info]  master_pos_wait(mysql_bin.000025:150734601) completed on 172.16.15.2(172.16.15.2:3306). Executed 0 events.
    Sun Jun 23 19:30:48 2019 - [info]   done.
    Sun Jun 23 19:30:48 2019 - [info]  Resetting slave 172.16.15.2(172.16.15.2:3306) and starting replication from the new master 172.16.13.15(172.16.13.15:3306)..
    Sun Jun 23 19:30:48 2019 - [info]  Executed CHANGE MASTER.
    Sun Jun 23 19:30:48 2019 - [info]  Slave started.
    Sun Jun 23 19:30:49 2019 - [info] End of log messages from 172.16.15.2 ...
    Sun Jun 23 19:30:49 2019 - [info] 
    Sun Jun 23 19:30:49 2019 - [info] -- Slave switch on host 172.16.15.2(172.16.15.2:3306) succeeded.
    Sun Jun 23 19:30:49 2019 - [info] Unlocking all tables on the orig master:
    Sun Jun 23 19:30:49 2019 - [info] Executing UNLOCK TABLES..
    Sun Jun 23 19:30:49 2019 - [info]  ok.
    Sun Jun 23 19:30:49 2019 - [info] Starting orig master as a new slave..
    Sun Jun 23 19:30:49 2019 - [info]  Resetting slave 172.16.15.3(172.16.15.3:3306) and starting replication from the new master 172.16.13.15(172.16.13.15:3306)..
    Sun Jun 23 19:30:49 2019 - [info]  Executed CHANGE MASTER.
    Sun Jun 23 19:30:49 2019 - [info]  Slave started.
    Sun Jun 23 19:30:49 2019 - [info] All new slave servers switched successfully.
    Sun Jun 23 19:30:49 2019 - [info] 

    7、对新主清理,更改从库信息
    Sun Jun 23 19:30:49 2019 - [info] * Phase 5: New master cleanup phase..
    Sun Jun 23 19:30:49 2019 - [info] 
    Sun Jun 23 19:30:49 2019 - [info]  172.16.13.15: Resetting slave info succeeded.
    Sun Jun 23 19:30:49 2019 - [info] Switching master to 172.16.13.15(172.16.13.15:3306) completed successfully.
  • 相关阅读:
    hbase安装(zookeeper等)
    虚函数表以及RTTI阅读(阅后焚)
    void及void指针含义的深刻解析
    泛型和模版的区别
    C++析构函数为什么要为虚函数
    C++对象模型
    在C++中子类继承和调用父类的构造函数方法
    什么是复合类型
    C++ read
    gdb 调试c/c++的一些小技巧
  • 原文地址:https://www.cnblogs.com/cmgg/p/11089045.html
Copyright © 2020-2023  润新知