• 案例:推进GTID解决MySQL主主不同步问题


    之前文章介绍过MySQL修改lower_case_table_names参数,如果之前大写存储的表将无法识别,需要特殊处理。
    最近遇到一例应用开发人员在修改这个参数之后,为了清除之前大写存储的表,做了误操作,导致主主不同步。

    1.故障现象模拟

    在lower_case_table_names=0时创建了测试库test和表TT:
    root@mysqldb 22:43:  [(none)]> create database test;
    Query OK, 1 row affected (0.01 sec)
    
    root@mysqldb 22:43:  [(none)]> use test;
    Database changed
    root@mysqldb 22:43:  [test]> create table TT(id int);
    Query OK, 0 rows affected (0.07 sec)
    
    root@mysqldb 22:43:  [test]> show tables;
    +----------------+
    | Tables_in_test |
    +----------------+
    | TT             |
    +----------------+
    1 row in set (0.00 sec)
    

    在修改lower_case_table_names=1时删除TT不成功:

    root@mysqldb 22:27:  [test]> drop table TT;
    ERROR 1051 (42S02): Unknown table 'test.tt'
    

    此时误操作来了。。据这样操作的人员反馈,是直接在网络搜索到这个错误就是要到OS层面去删除表的文件,然后就做了
    我这里也按照这个误操作在测试环境来模拟下:

    [root@test01 test]# rm TT.*
    rm: remove regular file `TT.frm'? y
    rm: remove regular file `TT.ibd'? y
    

    而且后续根据故障现象推测:操作人员最初只在一个主节点做了这样的操作,随后在这个主节点执行了删除数据库的动作,最后又建立了新的数据库重新建表,最终才发现另一个主节点已经不同步了,尝试自己无法解决后,上报了故障给客户DBA。
    此刻现象就是:Master1 删除数据库成功后,但Master2 同步报错1010,内容是删除数据库发生错误,具体如下:

    root@mysqldb 23:04:  [test]> show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.121
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mybinlog.000013
              Read_Master_Log_Pos: 756
                   Relay_Log_File: test02-relay-bin.000034
                    Relay_Log_Pos: 532
            Relay_Master_Log_File: mybinlog.000013
                 Slave_IO_Running: Yes
                Slave_SQL_Running: No
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 1010
                       Last_Error: Error 'Error dropping database (can't rmdir './test', errno: 39)' on query. Default database: 'test'. Query: 'drop database test'
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 601
                  Relay_Log_Space: 1060
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 1010
                   Last_SQL_Error: Error 'Error dropping database (can't rmdir './test', errno: 39)' on query. Default database: 'test'. Query: 'drop database test'
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 1121
                      Master_UUID: 08c887bf-98ab-11ea-b70c-080027c2997a
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: 
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 200702 23:04:11
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 08c887bf-98ab-11ea-b70c-080027c2997a:549-550
                Executed_Gtid_Set: 08c887bf-98ab-11ea-b70c-080027c2997a:5-549,
    5d3f3359-98ab-11ea-8101-080027763d24:1-13
                    Auto_Position: 0
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    root@mysqldb 23:04:  [test]> q
    

    2.推进GTID解决

    这时就可以用一个空事物将当前执行报错的GTID(Global Transaction Identifier)给跳过去:
    set gtid_next='$Master_UUID:$gno';
    begin;
    commit;
    set gtid_next=automatic;
    start slave;
    

    这里实际就是选取Master_UUID: 08c887bf-98ab-11ea-b70c-080027c2997agno:550(因为Executed_Gtid_Set最后是549,当前报错对应应该是549/550,期望用空事物代替跳过)
    注意:这里的gno是连续的。第一次我尝试gtid_next='08c887bf-98ab-11ea-b70c-080027c2997a:549'是不成功的,所以又尝试550:

    set gtid_next='08c887bf-98ab-11ea-b70c-080027c2997a:550';
    begin;
    commit;
    set gtid_next=automatic;
    start slave;
    

    这次执行后再次查看slave状态,确认已恢复正常:

    root@mysqldb 23:11:  [(none)]> set gtid_next='08c887bf-98ab-11ea-b70c-080027c2997a:550';
    Query OK, 0 rows affected (0.00 sec)
    
    root@mysqldb 23:11:  [(none)]> begin;
    Query OK, 0 rows affected (0.00 sec)
    
    root@mysqldb 23:11:  [(none)]> commit;
    Query OK, 0 rows affected (0.00 sec)
    
    root@mysqldb 23:11:  [(none)]> set gtid_next=automatic;
    Query OK, 0 rows affected (0.00 sec)
    
    root@mysqldb 23:11:  [(none)]> start slave;
    Query OK, 0 rows affected (0.01 sec)
    
    root@mysqldb 23:11:  [(none)]> show slave statusG
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.121
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mybinlog.000013
              Read_Master_Log_Pos: 951
                   Relay_Log_File: test02-relay-bin.000034
                    Relay_Log_Pos: 687
            Relay_Master_Log_File: mybinlog.000013
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 951
                  Relay_Log_Space: 1060
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 1121
                      Master_UUID: 08c887bf-98ab-11ea-b70c-080027c2997a
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 08c887bf-98ab-11ea-b70c-080027c2997a:549-550
                Executed_Gtid_Set: 08c887bf-98ab-11ea-b70c-080027c2997a:5-550,
    5d3f3359-98ab-11ea-8101-080027763d24:1-14
                    Auto_Position: 0
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    root@mysqldb 23:11:  [(none)]> 
    

    当然Master2遗留的这个test库记得要处理掉,不然以后还会有问题隐患。

  • 相关阅读:
    js面试题-----算法类
    js面试题-----安全类
    js面试题-----通信类
    js面试题-----面向对象类
    js面试题-----HTTP协议类
    js面试题-----CSS盒模型
    Java-JVM 类的初始化
    加密的相关基础
    AngularJS-directive.js 基本指令
    AngularJS-liveRoomDirective.js 直播间指令
  • 原文地址:https://www.cnblogs.com/jyzhao/p/13228000.html
Copyright © 2020-2023  润新知