Recovering a Slave in a MySQL 5.7 (5.6) GTID Environment: Approaches and Tricks


     
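    Before discussing how to recover a slave, we need to understand two variables:
    GTID_EXECUTED: the set of GTIDs of all transactions that have been executed on this server and recorded in its binary log.
    GTID_PURGED: the set of GTIDs of transactions that have already been purged from the binary log. It is always a subset of GTID_EXECUTED.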
     
     
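    Before going further, let's look at how to build a new GTID-based slave.
    Given the two variables above, we only need to:
    1. Take a backup of the master and record the master's gtid_executed value at backup time.
    2. When restoring that backup on the new slave, set the slave's gtid_purged to the gtid_executed value recorded on the master.
     
    mysqldump does both for us: when GTIDs are enabled on the server being dumped, the dump it produces contains a SET @@GLOBAL.GTID_PURGED statement.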
     
     
    Current state of the master (3301):
    [zejin] 3301>show global variables like 'gtid_executed';
    +---------------+-------------------------------------------+
    | Variable_name | Value                                     |
    +---------------+-------------------------------------------+
    | gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
    +---------------+-------------------------------------------+
    1 row in set (0.00 sec)
     
    [zejin] 3301>show global variables like 'gtid_purged';
    +---------------+-------------------------------------------+
    | Variable_name | Value                                     |
    +---------------+-------------------------------------------+
    | gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-13 |
    +---------------+-------------------------------------------+
    1 row in set (0.00 sec)
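
The relationship between the two values can be checked with MySQL's built-in GTID_SUBTRACT() function. The following is a small sketch, not part of the original walkthrough: subtracting gtid_purged from gtid_executed leaves the transactions that are still present in the server's binary logs. The column alias is arbitrary.

```sql
-- Transactions that have been executed but not yet purged,
-- i.e. the ones a slave could still fetch from this server's binlogs.
SELECT GTID_SUBTRACT(@@GLOBAL.gtid_executed, @@GLOBAL.gtid_purged) AS still_in_binlog;

-- The same calculation using the literal values shown above:
SELECT GTID_SUBTRACT('a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15',
                     'a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-13') AS still_in_binlog;
-- returns a97983fc-5a29-11e6-9d28-000c29d4dc3f:14-15
```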
     
     
    Step 1: take a full backup with mysqldump
    mysqldump --all-databases --single-transaction --triggers --routines --events --host=127.0.0.1 --port=3301 --user=root --password=123 > dump3301.sql
     
    Opening dump3301.sql, we can see the following statement:
    SET @@GLOBAL.GTID_PURGED='a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15';
    This is exactly the gtid_executed value of master 3301.
     
    Step 2: start a brand-new instance, 3303. Make sure that enforce_gtid_consistency and gtid_mode=on are set in its configuration file.
    mysqld_safe --defaults-file=/home/mysql/my3303.cnf &
    At this point the state of the new instance 3303 should look like this:
    
    [(none)] 3303>show global variables like 'gtid_executed';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | gtid_executed |       |
    +---------------+-------+
    1 row in set (0.01 sec)
     
    [(none)] 3303>show global variables like 'gtid_purged';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | gtid_purged   |       |
    +---------------+-------+
    1 row in set (0.00 sec)
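
Before importing the dump, it may be worth confirming that the GTID settings from the configuration file actually took effect. This is a minimal check, assuming it is run on the new instance; the column aliases are arbitrary.

```sql
-- Run on the new instance (3303). Both settings should be enabled
-- before the dump, which sets GTID_PURGED, is imported.
SELECT @@GLOBAL.gtid_mode                AS gtid_mode,
       @@GLOBAL.enforce_gtid_consistency AS enforce_gtid_consistency;
```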
     
    Step 3: import the backup file and check the values:
    mysql -uroot -h127.0.0.1 -p123 -P3303 < dump3301.sql
    [(none)] 3303>show global variables like 'gtid_executed';
    +---------------+-------------------------------------------+
    | Variable_name | Value                                     |
    +---------------+-------------------------------------------+
    | gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
    +---------------+-------------------------------------------+
    1 row in set (0.02 sec)
     
    [(none)] 3303>show global variables like 'gtid_purged';
    +---------------+-------------------------------------------+
    | Variable_name | Value                                     |
    +---------------+-------------------------------------------+
    | gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
    +---------------+-------------------------------------------+
    1 row in set (0.00 sec)
     
     
    Step 4: set up replication with CHANGE MASTER TO
    [zejin] 3303>change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;
    Query OK, 0 rows affected, 2 warnings (0.01 sec)
     
    [zejin] 3303>start slave;
    Query OK, 0 rows affected (0.00 sec)
    
    [zejin] 3303>show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.240
                      Master_User: repl
                      Master_Port: 3301
                    Connect_Retry: 60
                  Master_Log_File: binlog57.000014
              Read_Master_Log_Pos: 194
                   Relay_Log_File: zejin240-relay-bin.000002
                    Relay_Log_Pos: 365
            Relay_Master_Log_File: binlog57.000014
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 194
                  Relay_Log_Space: 575
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 3301
                      Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f
                 Master_Info_File: /home/mysql/I3303/master.info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 
                Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
     
    At this point a new slave has been added to the GTID replication setup.
     
     
    Suppose we currently have one master with two slaves:
    master(3301)
    slave(3302)
    slave(3303)
     
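    Consider the following failure: for whatever reason, the master has purged some binlogs that a slave has not yet received (for example, the slave was stopped for a while, and in the meantime the master purged some of its binlogs).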
     
    Current state of the master:
    [zejin] 3301>show global variables like 'gtid_executed';
    +---------------+-------------------------------------------+
    | Variable_name | Value                                     |
    +---------------+-------------------------------------------+
    | gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21 |
    +---------------+-------------------------------------------+
    1 row in set (0.00 sec)
     
    [zejin] 3301>show global variables like 'gtid_purged';
    +---------------+-------------------------------------------+
    | Variable_name | Value                                     |
    +---------------+-------------------------------------------+
    | gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-20 |
    +---------------+-------------------------------------------+
    1 row in set (0.00 sec)
     
    [zejin] 3301>select * from t_users;
    +----+------+
    | id | name |
    +----+------+
    |  1 | chen |
    |  2 | ok   |
    |  3 | li   |
    +----+------+
    3 rows in set (0.00 sec)
    On slave 3303, we see the following error:
    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
    [zejin] 3303>show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: 
                      Master_Host: 192.168.1.240
                      Master_User: repl
                      Master_Port: 3301
                    Connect_Retry: 60
                  Master_Log_File: binlog57.000014
              Read_Master_Log_Pos: 457
                   Relay_Log_File: zejin240-relay-bin.000003
                    Relay_Log_Pos: 4
            Relay_Master_Log_File: binlog57.000014
                 Slave_IO_Running: No
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 457
                  Relay_Log_Space: 194
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 1236
                    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 3301
                      Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f
                 Master_Info_File: /home/mysql/I3303/master.info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 160809 17:25:39
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:16
                Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-16
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    
    [zejin] 3303>select * from t_users;
    +----+------+
    | id | name |
    +----+------+
    |  1 | li   |
    |  2 | zhou |
    +----+------+
    2 rows in set (0.00 sec)
    Replication is broken, and the data on the slave no longer matches the master.
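
The cause of error 1236 can be worked out with MySQL's built-in GTID functions. As a rough sketch (the master's real check is more involved), auto-positioning can only succeed if everything the master has purged is already contained in the slave's executed set. The literals below are the master's gtid_purged and the slave's Executed_Gtid_Set from the output above; the column aliases are arbitrary.

```sql
-- 1 means the slave already has everything the master purged; 0 means it does not.
SELECT GTID_SUBSET('a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-20',   -- master gtid_purged
                   'a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-16')   -- slave Executed_Gtid_Set
       AS slave_can_auto_position;
-- returns 0

-- The purged transactions that the slave is still missing:
SELECT GTID_SUBTRACT('a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-20',
                     'a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-16') AS missing_on_slave;
-- returns a97983fc-5a29-11e6-9d28-000c29d4dc3f:17-20
```

Transactions 17-20 exist only in binlogs that the master has already deleted, which is why the I/O thread cannot continue.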
     
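    Now let's see how to recover.
    Because GTIDs are globally unique, and the transactions that 3303 is missing have already been replicated to slave 3302, we can point 3303 at 3302, let it catch up, and then point it back at master 3301. (This only works if 3302 writes replicated transactions to its own binlog and has not purged them yet, i.e. its binlogs still contain the GTID transactions that 3303 never received from master 3301.)
    The steps are as follows (run stop slave first if any replication thread on 3303 is still running):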
    [zejin] 3303>change master to master_host='192.168.1.240',master_port=3302,master_user='repl',master_password='123',master_auto_position=1;
    
    [zejin] 3303>start slave;
    Query OK, 0 rows affected (0.03 sec)
    
    [zejin] 3303>show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.240
                      Master_User: repl
                      Master_Port: 3302
                    Connect_Retry: 60
                  Master_Log_File: binlog57.000007
              Read_Master_Log_Pos: 1723
                   Relay_Log_File: zejin240-relay-bin.000002
                    Relay_Log_Pos: 1687
            Relay_Master_Log_File: binlog57.000007
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 1723
                  Relay_Log_Space: 1937
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 3302
                      Master_UUID: 5cee6f9f-5ab8-11e6-a081-000c29d4dc3f
                 Master_Info_File: /home/mysql/I3303/master.info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:17-21
                Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    [zejin] 3303>select * from t_users;
    +----+------+
    | id | name |
    +----+------+
    |  1 | chen |
    |  2 | ok   |
    |  3 | li   |
    +----+------+
    3 rows in set (0.00 sec)
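
Comparing table contents by eye only works for a toy table. As a sketch of a more general check, one could ask 3303 whether it has applied every GTID in the master's gtid_executed (a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21 in the output above). On a busy master that value keeps growing, so it would need to be read from the master again at check time. The alias and the 10-second timeout are arbitrary choices.

```sql
-- Run on 3303. Returns 1 once every transaction in the given set
-- has been applied on this server.
SELECT GTID_SUBSET('a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21',
                   @@GLOBAL.gtid_executed) AS caught_up;

-- MySQL 5.7.5 and later can also block until the set has been applied.
-- The second argument is a timeout in seconds; the function returns 0
-- when the set has been applied and 1 if the timeout expires first.
SELECT WAIT_FOR_EXECUTED_GTID_SET('a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21', 10);
```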
    
    
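    The data now matches the master exactly. With replication healthy again, point 3303 back at master 3301 (run stop slave first, because CHANGE MASTER TO with MASTER_AUTO_POSITION requires the replication threads to be stopped):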
    [zejin] 3303>change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;
    Query OK, 0 rows affected, 2 warnings (0.01 sec)
    
    [zejin] 3303>start slave;
    Query OK, 0 rows affected (0.00 sec)
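    The recovery above relies on another slave having already received all of the master's binlogs. What if the topology is just M-S (one master, one slave) and the same problem occurs? In that case there are two ways to recover.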
     
    Current state of the master:
    [zejin] 3301>show global variables like '%gtid%';
    +----------------------------------+-------------------------------------------+
    | Variable_name                    | Value                                     |
    +----------------------------------+-------------------------------------------+
    
    | gtid_executed                    | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-27 |
    ……
    | gtid_purged                      | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-25 |
    ……
    +----------------------------------+-------------------------------------------+
    8 rows in set (0.00 sec)
    
     
    Current state of the slave:
    [zejin] 3303>show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: 
                      Master_Host: 192.168.1.240
                      Master_User: repl
                      Master_Port: 3301
                    Connect_Retry: 60
                  Master_Log_File: binlog57.000016
              Read_Master_Log_Pos: 729
                   Relay_Log_File: zejin240-relay-bin.000003
                    Relay_Log_Pos: 4
            Relay_Master_Log_File: binlog57.000016
                 Slave_IO_Running: No
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 729
                  Relay_Log_Space: 194
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 1236
                    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 3301
                      Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f
                 Master_Info_File: /home/mysql/I3303/master.info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 160809 17:54:42
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:22
                Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-22
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)

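    This is the same kind of error as before. The first recovery approach is:

    Set gtid_purged on the slave to a value that covers everything the master has already purged, so that the slave only requests transactions the master still has in its binlogs, and then use a third-party consistency tool to bring the data back in sync.
     
    We first have to run reset master on the slave to clear its GTID state (note that this also deletes all of the slave's own binary logs). Setting the variable directly fails with the following error: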
    [zejin] 3303>set global GTID_PURGED="a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-26";
    ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

    The correct steps are as follows (run on the slave):

    [zejin] 3303>reset master;
    Query OK, 0 rows affected (0.02 sec)
    
    [zejin] 3303>set global GTID_PURGED="a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-26";
    Query OK, 0 rows affected (0.00 sec)
    
    [zejin] 3303>start slave;
    Query OK, 0 rows affected (0.00 sec)
    
    [zejin] 3303>show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.240
                      Master_User: repl
                      Master_Port: 3301
                    Connect_Retry: 60
                  Master_Log_File: binlog57.000018
              Read_Master_Log_Pos: 728
                   Relay_Log_File: zejin240-relay-bin.000004
                    Relay_Log_Pos: 718
            Relay_Master_Log_File: binlog57.000018
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 728
                  Relay_Log_Space: 968
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 3301
                      Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f
                 Master_Info_File: /home/mysql/I3303/master.info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:22:27
                Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-27
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    
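    Of course, after this the data is inconsistent, because the transactions skipped this way were never applied on the slave. At this point we can use pt-table-checksum and pt-table-sync to restore consistency.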
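
Exactly which transactions were skipped can be worked out with GTID_SUBTRACT(). This is a sketch using the literal values from this example: the value assigned to gtid_purged on the slave, minus the slave's Executed_Gtid_Set before the reset. The alias is arbitrary.

```sql
-- Transactions the slave now treats as already applied but never executed:
SELECT GTID_SUBTRACT('a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-26',   -- value set as gtid_purged
                     'a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-22')   -- executed before reset master
       AS skipped_on_slave;
-- returns a97983fc-5a29-11e6-9d28-000c29d4dc3f:23-26
```

The changes made by transactions 23-26 are what the checksum and sync tools have to repair.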
     
     
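    The other method is to rebuild the slave, following the same procedure used at the beginning of this article to create a new one (with a fresh dump of the master, taken as in step 1). Because the slave already holds GTID state, we have to run reset master on it before restoring. The steps are as follows.
    On the slave:
    stop slave;
    reset master;
    source dump3301.sql;
    change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;
    start slave;
    show slave status\G
    This completes the recovery of the broken slave.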
     
     
     
     
     
     
     
     
     
  • Original article: https://www.cnblogs.com/zejin2008/p/5753934.html