• 基于 MHA 的 MySQL 高可用方案


    一、工作流程
    (1)从宕机崩溃的 master 上尝试保存二进制日志事件(binlog events);
    (2)识别含有最新更新的 slave 服务器;
    (3)应用差异的中继日志(relay log)到其他的 slave;
    (4)应用从 master 保存的二进制日志事件(binlog events);
    (5)提升一个 slave 为新的 master 服务器;
    (6)将其他的 slave 连接指向新的 master 进行主从复制;

    二、实验环境

    1、注意安装环境,安装工具软件包net-tools

    2、

    主机名 IP地址 角色 serverID 数据库类型
    server01 192.168.200.111 primary master1 1 写入
    server02 192.168.200.112 secondary master2 2 写入
    server03 192.168.200.113 slave1 3 读取
    server04 192.168.200.114 slave2 4 读取
    server05 192.168.200.115 manager   监控复制组

        其中primary master对外提供写服务,备选secondary master实际相当于slave,提供读取服务,salve1和slave2也提供相关读服务,一旦primary master宕机,将会把备选secondary master提升为新的primary master,slave1和slave2指向新的master。

    三、前期环境部署

    1、配置所有主机名称

    master1 主机:
    hostname server01
    bash
    
    master2 主机:
    hostname server02
    bash
    
    slave1 主机:
    hostname server03
    bash
    
    slave2 主机:
    hostname server04
    bash
    
    manager 主机:
    hostname server05
    bash

    2、配置所有主机的主机名与IP地址的映射关系

    vim /etc/hosts
    
    192.168.200.111 server01
    192.168.200.112 server02
    192.168.200.113 server03
    192.168.200.114 server04
    192.168.200.115 server05

    3、所有主机关闭防火墙与linux安全机制

    systemctl stop firewalld 
    iptables -F
    setenforce 0

    4、安装在线yum源与epel源(epel-release)

    [root@server01 yum.repos.d]# ls
    a  CentOS7-Base-163_(1).repo
    [root@server01 yum.repos.d]# yum -y install epel-release
    
    [root@server01 yum.repos.d]# ls
    a  CentOS7-Base-163_(1).repo  epel.repo  epel-testing.repo
    修改配置文件
    [root@server01 yum.repos.d]# vim epel.repo

    rpm -ivh epel-release-latest-7.noarch.rpm

    四、所有主机安装MHA node

    1、安装所支持的软件及相关的perl依赖包

    yum install -y perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
    
    ------------------------------------------------------------------------------
    查看是否安装成功
    rpm -q perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder
    安装成功反馈结果
    perl-DBD-MySQL-4.023-6.el7.x86_64
    perl-DBI-1.627-4.el7.x86_64
    perl-CPAN-1.9800-294.el7_6.noarch
    perl-ExtUtils-CBuilder-0.28.2.6-294.el7_6.noarch

    2、上传、解压、安装node

    tar xf mha4mysql-node-0.56.tar.gz
    cd mha4mysql-node-0.56/
    perl Makefile.PL
    make && make install

    3、安装成功后,在/usr/local/bin下生成四个脚本

    ls -l /usr/local/bin/
    总用量 40
    -r-xr-xr-x. 1 root root 16346 10月 22 14:42 apply_diff_relay_logs
    -r-xr-xr-x. 1 root root  4807 10月 22 14:42 filter_mysqlbinlog
    -r-xr-xr-x. 1 root root  7401 10月 22 14:42 purge_relay_logs
    -r-xr-xr-x. 1 root root  7395 10月 22 14:42 save_binary_logs

    五、在manager(192.168.200.115)主机上安装MHA Manager

    1、安装依赖软件

    yum install -y perl perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI perl-Time-HiRes
    
    --------------------------------------------------------
    yum -y install perl-Config-Tiny-2.14-7.el7.noarch.rpm
    
    --------------------------------------------------------
    检查是否安装成功
    rpm -q perl  perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI
    
    perl
    -5.16.3-294.el7_6.x86_64 perl-Log-Dispatch-2.41-1.el7.1.noarch perl-Parallel-ForkManager-1.18-2.el7.noarch perl-DBD-MySQL-4.023-6.el7.x86_64 perl-DBI-1.627-4.el7.x86_64

    2、上传、解压、安装MHA Manager软件包

    tar xf mha4mysql-manager-0.56.tar.gz
    cd mha4mysql-manager-0.56/
    perl Makefile.PL
    make && make install

    六、配置ssh双向密钥对

    服务器先生成一个密钥对,把自己的公钥传给对方

    1、server05(192.168.200.115)

    ssh-keygen -t rsa
    ssh-copy-id server01
    ssh-copy-id server02
    ssh-copy-id server03
    ssh-copy-id server04
    
    -----------------------------------------
    或
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
    
    
    注意:Server05需要连接每个主机测试,因为第一次连接的时候需要输入yes,影响后期故障切换时,对于每个主机的SSH控制。
    ssh server01
    ssh server02
    ssh server03
    ssh server04

    -------------------------------------

    ssh root@192.168.200.111
    ssh root@192.168.200.112
    ssh root@192.168.200.113
    ssh root@192.168.200.114

    2、server04(192.168.200.114)

    ssh-keygen -t rsa
    ssh-copy-id server01
    ssh-copy-id server02
    ssh-copy-id server03
    
    ------------------------------------------------------
    或
    ssh-keygen -t rsa
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113

    3、server03(192.168.200.113)

    ssh-keygen -t rsa
    ssh-copy-id server01
    ssh-copy-id server02
    ssh-copy-id server04
    
    ------------------------------------------------------------
    或
    ssh-keygen -t rsa
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114

    4、server02(192.168.200.112)

    ssh-keygen -t rsa
    ssh-copy-id server01
    ssh-copy-id server03
    ssh-copy-id server04
    
    -----------------------------------------------------------------------
    ssh-keygen -t rsa
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114

    5、server01(192.168.200.111)

    ssh-keygen -t rsa
    ssh-copy-id server02
    ssh-copy-id server03
    ssh-copy-id server04
    
    -----------------------------------------------------------------
    或
    ssh-keygen -t rsa
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114

    七、安装mysql

    1、server01-server04主机上

    yum -y install mariadb mariadb-server mariadb-devel
    //启动服务 systemctl start mariadb
    //查看端口 netstat
    -lnpt | grep :3306

    2、给数据库设置初始密码(后面会用到)

    mysqladmin -u root password 123456

    八、搭建主从复制环境

    1、修改mysql配置文件

    server01

    vim /etc/my.cnf
    [mysqld] server
    -id = 1 log-bin=master-bin log-slave-updates=true relay_log_purge=0

    systemctl restart mariadb

    server02

    vim /etc/my.cnf
    [mysqld]
    server-id=2
    log-bin=master-bin
    log-slave-updates=true
    relay_log_purge=0

    systemctl restart mariadb

    server03

    vim /etc/my.cnf
    [mysqld]
    server-id=3
    log-bin=mysql-bin
    relay-log=slave-relay-bin
    log-slave-updates=true
    relay_log_purge=0
    
    systemctl restart mariadb

    server04

    vim /etc/my.cnf
    [mysqld]
    server-id=4
    log-bin=mysql-bin
    relay-log=slave-relay-bin
    log-slave-updates=true
    relay_log_purge=0
    
    systemctl restart mariadb

    2、所有mysql服务器创建复制授权用户

    grant replication slave on *.* to 'repl'@'192.168.200.%' identified by '123456';
    
    flush privileges;

    3、查看主库备份时的binlog名称和文件(192.168.200.111  server01)

    show master status;
    
    +-------------------+----------+--------------+------------------+
    | File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
    +-------------------+----------+--------------+------------------+
    | master-bin.000001 |      474 |              |                  |
    +-------------------+----------+--------------+------------------+
    1 row in set (0.00 sec)

    4、从数据库启动slave(192.168.200.112-114   server02-server04)

    stop slave;
    
    CHANGE MASTER TO
    MASTER_HOST='192.168.200.111',
    MASTER_USER='repl',
    MASTER_PASSWORD='123456',
    MASTER_LOG_FILE='master-bin.000001',
    MASTER_LOG_POS=474;
    
    start slave;
    
    show slave statusG
                Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

    5、三台 slave 服务器设置 read_only 状态(server02-server04)

    从库对外只提供读服务,只所以没有写进 mysql 配置文件,是因为随时 server02 会提升为 master。

    [root@server02 ~]# mysql -uroot -p123456 -e 'set global read_only=1'
    [root@server03 ~]# mysql -uroot -p123456 -e 'set global read_only=1'
    [root@server04 ~]# mysql -uroot -p123456 -e 'set global read_only=1'

    6、创建监控用户(server01-server04)

    grant all privileges on *.* to 'root'@'192.168.200.%' identified by '123456';
    flush privileges;

    7、为自己的主机名授权

    server01

    grant all privileges on *.* to 'root'@'server01' identified by '123456';
    flush privileges;

    server02

    grant all privileges on *.* to 'root'@'server02' identified by '123456';
    flush privileges;

    server03

    grant all privileges on *.* to 'root'@'server03' identified by '123456';
    flush privileges;

    server04

    grant all privileges on *.* to 'root'@'server04' identified by '123456';
    flush privileges;

    九、配置MHA环境

    server05(192.168.200.115)在软件包解压后的目录里面有样配置文件

    1、创建MHA的工作目录及相关配置文件

    mkdir /etc/masterha
    cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha

    2、修改app1.cnf配置文件

    vim /etc/masterha/app1.cnf
    
    [server default]
    manager_workdir=/var/log/masterha/app1   #设置manager工作日志
    manager_log=/var/log/masterha/app1/manager.log   #设置manager的日志,这两条都是默认存在的
    
      
    master_binlog_dir=/var/lib/mysql   #设置master默认保存binlog的位置,以便MHA可以找到master日志
    
    master_ip_failover_script= /usr/local/bin/master_ip_failover  #设置自动failover时候的切换脚本
    
    password=123456   #设置mysql中root用户的密码
    user=root
    
    ping_interval=1    #ping包的时间间隔
    
    remote_workdir=/tmp   #设置远端mysql在发生切换时保存binlog的具体位置
    
    repl_password=123456   #设置复制用户的密码和用户名
    repl_user=repl
    
    
    
    [server1]
    hostname=server01
    port=3306
    
    [server2]
    hostname=server02
    candidate_master=1   #server02有可能会被提升为新的主
    port=3306
    check_repl_delay=0
    
    [server3]
    hostname=server03
    port=3306
    
    [server4]
    hostname=server04
    port=3306

    3、配置故障转移脚本

    vim /usr/local/bin/master_ip_failover
    
    #!/usr/bin/env perl
    use strict;
    use warnings FATAL => 'all';
    use Getopt::Long;
    my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port,
    );
    my $vip = '192.168.200.100';              # 写入VIP
    my $key = "1";   #非keepalived方式切换脚本使用的
    my $ssh_start_vip = "/sbin/ifconfig eno16777728:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig eno16777728:$key down";  #那么这里写服务的开关命令
    $ssh_user = "root";
    GetOptions(
    'command=s' => $command,
    'ssh_user=s' => $ssh_user,
    'orig_master_host=s' => $orig_master_host,
    'orig_master_ip=s' => $orig_master_ip,
    'orig_master_port=i' => $orig_master_port,
    'new_master_host=s' => $new_master_host,
    'new_master_ip=s' => $new_master_ip,
    'new_master_port=i' => $new_master_port,
    );
    exit &main();
    sub main {
    print "
    
    IN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===
    
    ";
    if ( $command eq "stop" || $command eq "stopssh" ) {
    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    #eval {
    # print "Disabling the VIP on old master: $orig_master_host 
    ";
    # &stop_vip();
    # $exit_code = 0;
    #};
    eval {
    print "Disabling the VIP on old master: $orig_master_host 
    ";
    #my $ping=`ping -c 1 10.0.0.13 | grep "packet loss" | awk -F',' '{print $3}' | awk '{print $1}'`;
    #if ( $ping le "90.0%"&& $ping gt "0.0%" ){
    #$exit_code = 0;
    #}
    #else {
    &stop_vip();
    # updating global catalog, etc
    $exit_code = 0;
    #}
    };
    
    if ($@) {
    warn "Got Error: $@
    ";
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "start" ) {
    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
    print "Enabling the VIP - $vip on the new master - $new_master_host 
    ";
    &start_vip();
    $exit_code = 0;
    };
    if ($@) {
    warn $@;
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "status" ) {
    print "Checking the Status of the script.. OK 
    ";
    `ssh $ssh_user@$orig_master_ip " $ssh_start_vip "`;
    exit 0;
    }
    else {
    &usage();
    exit 1;
    }
    }
    # A simple system call that enable the VIP on the new master
    sub start_vip() {
    `ssh $ssh_user@$new_master_host " $ssh_start_vip "`;
    }
    # A simple system call that disable the VIP on the old_master
    sub stop_vip() {
    `ssh $ssh_user@$orig_master_host " $ssh_stop_vip "`;
    }
    sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --
    new_master_host=host --new_master_ip=ip --new_master_port=port
    "; }
    
    #给脚本加执行权限
    chmod +x /usr/local/bin/master_ip_failover

    4、设置从库relay log的清除方式(server02-server04)

    mysql -uroot -p123456 -e 'set global relay_log_purge=0;'

    5、检查MHA ssh通信状态

    masterha_check_ssh --conf=/etc/masterha/app1.cnf
    
    Wed Oct 23 07:05:41 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Wed Oct 23 07:05:41 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
    Wed Oct 23 07:05:41 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..
    Wed Oct 23 07:05:41 2019 - [info] Starting SSH connection tests..
    Wed Oct 23 07:05:44 2019 - [debug] 
    Wed Oct 23 07:05:41 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server02(192.168.200.112:22)..
    Wed Oct 23 07:05:42 2019 - [debug]   ok.
    Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server03(192.168.200.113:22)..
    Wed Oct 23 07:05:43 2019 - [debug]   ok.
    Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server04(192.168.200.114:22)..
    Wed Oct 23 07:05:44 2019 - [debug]   ok.
    Wed Oct 23 07:05:45 2019 - [debug] 
    Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server01(192.168.200.111:22)..
    Wed Oct 23 07:05:43 2019 - [debug]   ok.
    Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server02(192.168.200.112:22)..
    Wed Oct 23 07:05:44 2019 - [debug]   ok.
    Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server04(192.168.200.114:22)..
    Wed Oct 23 07:05:45 2019 - [debug]   ok.
    Wed Oct 23 07:05:45 2019 - [debug] 
    Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server01(192.168.200.111:22)..
    Wed Oct 23 07:05:43 2019 - [debug]   ok.
    Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server03(192.168.200.113:22)..
    Wed Oct 23 07:05:44 2019 - [debug]   ok.
    Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server04(192.168.200.114:22)..
    Wed Oct 23 07:05:45 2019 - [debug]   ok.
    Wed Oct 23 07:05:46 2019 - [debug] 
    Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server01(192.168.200.111:22)..
    Wed Oct 23 07:05:44 2019 - [debug]   ok.
    Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server02(192.168.200.112:22)..
    Wed Oct 23 07:05:44 2019 - [debug]   ok.
    Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server03(192.168.200.113:22)..
    Wed Oct 23 07:05:45 2019 - [debug]   ok.
    Wed Oct 23 07:05:46 2019 - [info] All SSH connection tests passed successfully.
    出现这个结果说明执行成功

    6、检查整个集群的状态

    [root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
    
    Wed Oct 23 07:10:22 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Wed Oct 23 07:10:22 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
    Wed Oct 23 07:10:22 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..
    Wed Oct 23 07:10:22 2019 - [info] MHA::MasterMonitor version 0.56.
    Wed Oct 23 07:10:23 2019 - [info] Dead Servers:
    Wed Oct 23 07:10:23 2019 - [info] Alive Servers:
    Wed Oct 23 07:10:23 2019 - [info]   server01(192.168.200.111:3306)
    Wed Oct 23 07:10:23 2019 - [info]   server02(192.168.200.112:3306)
    Wed Oct 23 07:10:23 2019 - [info]   server03(192.168.200.113:3306)
    Wed Oct 23 07:10:23 2019 - [info]   server04(192.168.200.114:3306)
    Wed Oct 23 07:10:23 2019 - [info] Alive Slaves:
    Wed Oct 23 07:10:23 2019 - [info]   server02(192.168.200.112:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
    Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
    Wed Oct 23 07:10:23 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
    Wed Oct 23 07:10:23 2019 - [info]   server03(192.168.200.113:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
    Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
    Wed Oct 23 07:10:23 2019 - [info]   server04(192.168.200.114:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
    Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
    Wed Oct 23 07:10:23 2019 - [info] Current Alive Master: server01(192.168.200.111:3306)
    Wed Oct 23 07:10:23 2019 - [info] Checking slave configurations..
    Wed Oct 23 07:10:23 2019 - [info] Checking replication filtering settings..
    Wed Oct 23 07:10:23 2019 - [info]  binlog_do_db= , binlog_ignore_db= 
    Wed Oct 23 07:10:23 2019 - [info]  Replication filtering check ok.
    Wed Oct 23 07:10:23 2019 - [info] Starting SSH connection tests..
    Wed Oct 23 07:10:28 2019 - [info] All SSH connection tests passed successfully.
    Wed Oct 23 07:10:28 2019 - [info] Checking MHA Node version..
    Wed Oct 23 07:10:29 2019 - [info]  Version check ok.
    Wed Oct 23 07:10:29 2019 - [info] Checking SSH publickey authentication settings on the current master..
    Wed Oct 23 07:10:29 2019 - [info] HealthCheck: SSH to server01 is reachable.
    Wed Oct 23 07:10:30 2019 - [info] Master MHA Node version is 0.56.
    Wed Oct 23 07:10:30 2019 - [info] Checking recovery script configurations on the current master..
    Wed Oct 23 07:10:30 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000001 
    Wed Oct 23 07:10:30 2019 - [info]   Connecting to root@server01(server01).. 
      Creating /tmp if not exists..    ok.
      Checking output directory is accessible or not..
       ok.
      Binlog found at /var/lib/mysql, up to master-bin.000001
    Wed Oct 23 07:10:30 2019 - [info] Master setting check done.
    Wed Oct 23 07:10:30 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Wed Oct 23 07:10:30 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server02 --slave_ip=192.168.200.112 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
    Wed Oct 23 07:10:30 2019 - [info]   Connecting to root@192.168.200.112(server02:22).. 
      Checking slave recovery environment settings..
        Opening /var/lib/mysql/relay-log.info ... ok.
        Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
        Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
        Testing mysql connection and privileges.. done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Wed Oct 23 07:10:31 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server03 --slave_ip=192.168.200.113 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
    Wed Oct 23 07:10:31 2019 - [info]   Connecting to root@192.168.200.113(server03:22).. 
      Checking slave recovery environment settings..
        Opening /var/lib/mysql/relay-log.info ... ok.
        Relay log found at /var/lib/mysql, up to slave-relay-bin.000002
        Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002
        Testing mysql connection and privileges.. done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Wed Oct 23 07:10:31 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server04 --slave_ip=192.168.200.114 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
    Wed Oct 23 07:10:31 2019 - [info]   Connecting to root@192.168.200.114(server04:22).. 
      Checking slave recovery environment settings..
        Opening /var/lib/mysql/relay-log.info ... ok.
        Relay log found at /var/lib/mysql, up to slave-relay-bin.000002
        Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002
        Testing mysql connection and privileges.. done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Wed Oct 23 07:10:31 2019 - [info] Slaves settings check done.
    Wed Oct 23 07:10:31 2019 - [info] 
    server01 (current master)
     +--server02
     +--server03
     +--server04
    
    Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server02..
    Wed Oct 23 07:10:31 2019 - [info]  ok.
    Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server03..
    Wed Oct 23 07:10:31 2019 - [info]  ok.
    Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server04..
    Wed Oct 23 07:10:31 2019 - [info]  ok.
    Wed Oct 23 07:10:31 2019 - [info] Checking master_ip_failover_script status:
    Wed Oct 23 07:10:31 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server01 --orig_master_ip=192.168.200.111 --orig_master_port=3306 
    
    
    IN SCRIPT TEST====/sbin/ifconfig eno16777728:1 down==/sbin/ifconfig eno16777728:1 192.168.200.100===
    
    Checking the Status of the script.. OK 
    bash: /sbin/ifconfig: 没有那个文件或目录
    Wed Oct 23 07:10:32 2019 - [info]  OK.
    Wed Oct 23 07:10:32 2019 - [warning] shutdown_script is not defined.
    Wed Oct 23 07:10:32 2019 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.

    十、VIP配置管理

    1、检查/etc/masterha/app1.cnf文件,检查如下行是否正确

    [root@server05 ~]# grep -n 'master_ip_failover_script' /etc/masterha/app1.cnf
    7:master_ip_failover_script= /usr/local/bin/master_ip_failover

    2、查看故障转移脚本,确定网卡配置正确

    [root@server05 ~]# head -13 /usr/local/bin/master_ip_failover
    #!/usr/bin/env perl
    use strict;
    use warnings FATAL => 'all';
    use Getopt::Long;
    my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port,
    );
    my $vip = '192.168.200.100';              
    my $key = "1";     
    my $ssh_start_vip = "/sbin/ifconfig eno16777728:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig eno16777728:$key down"; 
    $ssh_user = "root";

    3、检查manager状态

    [root@server05 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
    app1 is stopped(2:NOT_RUNNING).     //未开启

    4、开启manager监控

    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dea d_master_conf --ignore_last_failover< /dev/null >/var/log/masterha/app1/manager.log 2>&1 &
    
    再次检查
    [root@server05 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
    app1 (pid:6588) is running(0:PING_OK), master:server01
    可以看见已经在监控了

    5、查看VIP server01(192.168.200.111)

     在查看VIP时,第一台主机上没有查看到VIP,可能是因为ifconfig命令没有,安装工具软件包net-tools解决问题

    [root@server01 ~]# yum -y install net-tools

    [root@server01 ~]# ip a | grep eno16777728 2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728 inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1

    十一、primary master(192.168.200.111)模拟主库故障

    server01(192.168.200.111)

    [root@server01 ~]# systemctl stop mariadb
    [root@server01 ~]# netstat -lnpt | grep :3306
    [root@server01 ~]# ip a | grep eno16777728
    2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728

    server03(192.168.200.113)状态

    MariaDB [(none)]> show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.200.112
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: master-bin.000001
              Read_Master_Log_Pos: 1372
                   Relay_Log_File: slave-relay-bin.000002
                    Relay_Log_Pos: 530
            Relay_Master_Log_File: master-bin.000001
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

    server04(192.168.200.114)状态

    MariaDB [(none)]> show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.200.112
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: master-bin.000001
              Read_Master_Log_Pos: 1372
                   Relay_Log_File: slave-relay-bin.000002
                    Relay_Log_Pos: 530
            Relay_Master_Log_File: master-bin.000001
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

    Server05(192.168.200.115) 查看监控配置文件已经发生了变化(server01的配置已被删除),监控已自动关闭。

    [root@server05 ~]# cat /etc/masterha/app1.cnf 
    [server default]
    manager_log=/var/log/masterha/app1/manager.log
    manager_workdir=/var/log/masterha/app1
    master_binlog_dir=/var/lib/mysql
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=123456
    ping_interval=1
    remote_workdir=/tmp
    repl_password=123456
    repl_user=repl
    user=root
    
    [server2]
    candidate_master=1
    check_repl_delay=0
    hostname=server02
    port=3306
    
    [server3]
    hostname=server03
    port=3306
    
    [server4]
    hostname=server04
    port=3306

    #监控已关闭 [1]+ 完成 nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dea d_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1

    Server05(192.168.200.115) 故障切换过程中的日志文件内容如下:

    Selected server02 as a new master.
    server02: OK: Applying all logs succeeded.
    server02: OK: Activated master IP address.
    server04: This host has the latest relay log events.
    server03: This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    server04: OK: Applying all logs succeeded. Slave started, replicating from server02.
    server03: OK: Applying all logs succeeded. Slave started, replicating from server02.
    server02: Resetting slave info succeeded.
    Master failover to server02(192.168.200.112:3306) completed successfully.

    十二、故障主库修复及VIP切回测试

    1、修复server01,并提升server02为新的主库获取VIP

    server01(192.168.200.111)

    [root@server01 ~]# systemctl start mariadb
    [root@server01 ~]# netstat -lnpt | grep :3306
    tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      6131/mysqld
    [root@server01 ~]# mysql -u root -p123456
    stop slave;
    CHANGE MASTER TO MASTER_HOST
    ='192.168.200.112', MASTER_USER='repl', MASTER_PASSWORD='123456';
    start slave;
    show slave statusG
    *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.200.112 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000001 Read_Master_Log_Pos: 1372 Relay_Log_File: mariadb-relay-bin.000002 Relay_Log_Pos: 1208 Relay_Master_Log_File: master-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes

    server05(192.168.200.115)修改监控配置文件添加server01

    [root@server05 ~]# vim /etc/masterha/app1.cnf
    [server01]
    hostname=server01
    port=3306

    server05(192.168.200.115)检查集群状态

    [root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
    -----------------------------------忽略部分信息-----------------------------------
    Thu Aug 31 22:20:30 2017 - [info] Alive Servers:
    Thu Aug 31 22:20:30 2017 - [info]   server01(192.168.200.111:3306)
    Thu Aug 31 22:20:30 2017 - [info]   server02(192.168.200.112:3306)
    Thu Aug 31 22:20:30 2017 - [info]   server03(192.168.200.113:3306)
    Thu Aug 31 22:20:30 2017 - [info]   server04(192.168.200.114:3306)
    -----------------------------------忽略部分信息-----------------------------------
    server02 (current master)
     +--server01
     +--server03
     +--server04
    -----------------------------------忽略部分信息-----------------------------------
    MySQL Replication Health is OK.

    server05(192.168.200.115)开启监控

    [root@server05 ~]# nohup masterha_manager --conf=/etc/masterha/ap.cnf --remove_dead_master_conf --ignore_last_failover< /dev/null >/var/log/masterha/app1/manager.log 2>&1 &
    [1] 17031

    server02(192.168.200.112)获得VIP

    [root@server02 ~]# ip a 
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:ec:88:30 brd ff:ff:ff:ff:ff:ff
        inet 192.168.200.112/24 brd 192.168.200.255 scope global eno16777728
           valid_lft forever preferred_lft forever
        inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1
           valid_lft forever preferred_lft forever
        inet6 fe80::20c:29ff:feec:8830/64 scope link 
           valid_lft forever preferred_lft forever

    2、模拟server02故障,VIP切回server01

    server02(192.168.200.112)

    [root@server02 ~]# systemctl stop mariadb
    [root@server02 ~]# netstat -lnpt | grep :3306

    server01(192.168.200.111)

    [root@server01 ~]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:9b:5c:f0 brd ff:ff:ff:ff:ff:ff
        inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728
           valid_lft forever preferred_lft forever
        inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1
           valid_lft forever preferred_lft forever
        inet6 fe80::20c:29ff:fe9b:5cf0/64 scope link 
           valid_lft forever preferred_lft forever

    server03(192.168.200.113)状态

    MariaDB [(none)]> show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.200.111
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: master-bin.000002
              Read_Master_Log_Pos: 923
                   Relay_Log_File: slave-relay-bin.000002
                    Relay_Log_Pos: 530
            Relay_Master_Log_File: master-bin.000002
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

    server04(192.168.200.114)状态:

    MariaDB [(none)]> show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.200.111
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: master-bin.000002
              Read_Master_Log_Pos: 923
                   Relay_Log_File: slave-relay-bin.000002
                    Relay_Log_Pos: 530
            Relay_Master_Log_File: master-bin.000002
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

    Server05(192.168.200.115) 配置文件变化(已经移除故障机server2配置):

    [root@server05 ~]# cat /etc/masterha/app1.cnf 
    [server default]
    manager_log=/var/log/masterha/app1/manager.log
    manager_workdir=/var/log/masterha/app1
    master_binlog_dir=/var/lib/mysql
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=123456
    ping_interval=1
    remote_workdir=/tmp
    repl_password=123456
    repl_user=repl
    user=root
    
    [server1]
    hostname=server01
    port=3306
    
    [server3]
    hostname=server03
    port=3306
    
    [server4]
    hostname=server04
    port=3306

    Server05(192.168.200.115) 监控日志:

    -----------------------------------忽略部分信息-----------------------------------
    
    Selected server01 as a new master.
    server01: OK: Applying all logs succeeded.
    server01: OK: Activated master IP address.
    server03: This host has the latest relay log events.
    server04: This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    server04: OK: Applying all logs succeeded. Slave started, replicating from server01.
    server03: OK: Applying all logs succeeded. Slave started, replicating from server01.
    server01: Resetting slave info succeeded.
    Master failover to server01(192.168.200.111:3306) completed successfully.

    修复server02(192.168.200.112)主机,指向新的主库:

    [root@server02 ~]# systemctl start mariadb
    [root@server02 ~]# netstat -lnpt | grep :3306
    tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      5982/mysqld 
    [root@server02 ~]# mysql -u root -p123456
    
    stop slave;
    
    CHANGE MASTER TO
    MASTER_HOST='192.168.200.111',
    MASTER_USER='repl',
    MASTER_PASSWORD='123456';
    
    start slave;
    
    show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.200.111
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: master-bin.000002
              Read_Master_Log_Pos: 923
                   Relay_Log_File: mariadb-relay-bin.000003
                    Relay_Log_Pos: 530
            Relay_Master_Log_File: master-bin.000002
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

    Server05(192.168.200.115)修改监控配置文件添加server2配置:

    [root@server05 ~]# vim /etc/masterha/app1.cnf
    [server2]
    hostname=server02
    candidate_master=1
    port=3306
    check_repl_delay=0

    Server05(192.168.200.115)检查集群状态:

    [root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
    server01 (current master)
     +--server02
     +--server03
     +--server04
    
    -----------------------------------忽略部分信息-----------------------------------
    
    MySQL Replication Health is OK.

    server02修复好了,并且server01成为了新的zhu

    ssh server01

    ssh server02

    ssh server03

    ssh server04

    check_repl_delay=0

  • 相关阅读:
    JS学习笔记ZT
    一条经典的汇总的sql
    sql 日期的转换
    微软.net安装失败时出现的问题
    sql 换行
    js 代码
    学习笔记
    decimal 的使用
    功能最完善,代码最简洁的选项卡代码(div+css)
    sql字母排序
  • 原文地址:https://www.cnblogs.com/tanxiaojuncom/p/11722572.html
Copyright © 2020-2023  润新知