• MySQL 并行复制(MTS) 从库更新的记录不存在实际却存在





    背景

    开了并行复制的半同步从库SQL 线程报1032错误,异步复制从库没有报错,偶尔会出现这种

    版本

    mysql 5.7.16
    redhat 6.8
    mysql> show variables like '%slave_para%';
    +------------------------+---------------+
    | Variable_name | Value |
    +------------------------+---------------+
    | slave_parallel_type | LOGICAL_CLOCK |
    | slave_parallel_workers | 16 |
    +------------------------+---------------+

    分析

    1、疑是对从库执行了更新操作,导致更新的记录不存在
    2、查看error log发现

    2018-04-03T10:11:47.720156+08:00 16 [ERROR] Slave SQL for channel '': **Worker 13** failed executing transaction **'a272bbcf-874f-11e7-a288-00505695b721:687871861**' at master log mysql-bin.004119, end_log_pos 376471678; **Could not execute Update_rows event** on table anytxn.seq_xxxx; Can't find record in 'seq_xxxx', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.004119, end_log_pos 376471678, Error_code: 1032
    
    2018-04-03T10:11:47.720230+08:00 2 [Warning] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
    2018-04-03T10:11:47.720959+08:00 2 [Note] Error reading relay log event for channel '': slave **SQL thread was killed**
    

    3、从 SQL线程停止的position分析binlog发现

    SET @@SESSION.GTID_NEXT= 'a272bbcf-874f-11e7-a288-00505695b721:687871861'/*!*/;
    # at 376471694
    #180403 10:11:47 server id 104073  end_log_pos 376471555 CRC32 0x1be91176 	Query	thread_id=2086049	exec_time=0	error_code=0
    SET TIMESTAMP=1522721507/*!*/;
    BEGIN
    /*!*/;
    # at 376471768
    #180403 10:11:47 server id 104073  end_log_pos 376471616 CRC32 0x10644d77 	Table_map: `anytxn`.`seq_xxxx` mapped to number 301
    # at 376471829
    #180403 10:11:47 server id 104073  end_log_pos 376471678 CRC32 0x871a9787 	Update_rows: table id 301 flags: STMT_END_F
    
    ### UPDATE `anytxn`.`seq_xxxx`
    ### WHERE
    ###   @1=7116088 /* LONGINT meta=0 nullable=0 is_null=0 */
    ###   @2=1 /* INT meta=0 nullable=0 is_null=0 */
    ### SET
    ###   @1=7116089 /* LONGINT meta=0 nullable=0 is_null=0 */
    ###   @2=1 /* INT meta=0 nullable=0 is_null=0 */
    # at 376471891
    #180403 10:11:47 server id 104073  end_log_pos 376471709 CRC32 0x9eb59238 	Xid = 22247621418
    COMMIT/*!*/;
    # at 376471922
    #180403 10:11:47 server id 104073  end_log_pos 376471774 CRC32 0xf7b6ad5d 	GTID	last_committed=641254	sequence_number=641259
    SET @@SESSION.GTID_NEXT= 'a272bbcf-874f-11e7-a288-00505695b721:687871862'/*!*/;
    # at 376471987
    #180403 10:11:47 server id 104073  end_log_pos 376471856 CRC32 0x6256de00 	Query	thread_id=2085350	exec_time=0	error_code=0
    SET TIMESTAMP=1522721507/*!*/;
    BEGIN
    /*!*/;
    # at 376472069
    #180403 10:11:47 server id 104073  end_log_pos 376471979 CRC32 0x6c329578 	Table_map: `anytxn`.`bm_cc_customer_address_info` mapped to number 1569
    # at 376472192
    #180403 10:11:47 server id 104073  end_log_pos 376472162 CRC32 0x834cc8b9 	Write_rows: table id 1569 flags: STMT_END_F
    
    
    ### INSERT INTO `anytxn`.`bm_xxxxxxxxxxxxxx`
    ### SET
    ###   @1=14480779 /* LONGINT meta=0 nullable=0 is_null=0 */
    ###   @2='0000001002380654' /* STRING(96) meta=65120 nullable=0 is_null=0 */
    ###   @3='B001' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
    ###   @4=NULL /* STRING(12) meta=65036 nullable=1 is_null=1 */
    ###   @5='10000010001202000000001' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
    ###   @6='B00' /* STRING(9) meta=65033 nullable=1 is_null=0 */
    ###   @7='xxxxxxxxxxx' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
    ###   @8=NULL /* STRING(18) meta=65042 nullable=1 is_null=1 */
    ###   @9=NULL /* STRING(18) meta=65042 nullable=1 is_null=1 */
    ###   @10=NULL /* STRING(18) meta=65042 nullable=1 is_null=1 *
    
     mysql@xxxxxx:/home/mysql>  mysqlbinlog -v -v --start-datetime='2018-04-03 10:11:45' --stop-datetime='2018-04-03 10:11:48'  /data/mysql/xxxxx-relay-bin.005477 | grep last_comm | grep 10:11:47 | grep 641254
    #180403 10:11:47 server id 104073  end_log_pos 376469618 CRC32 0xb6dc6cef 	GTID	last_committed=641227	sequence_number=641254
    #180403 10:11:47 server id 104073  end_log_pos 376471774 CRC32 0xf7b6ad5d 	GTID	last_committed=641254	sequence_number=641259
    #180403 10:11:47 server id 104073  end_log_pos 376472258 CRC32 0x27cf3013 	GTID	last_committed=641254	sequence_number=641260
    

    从上面信息可以看出,发生更新记录不存在是在更新anytxn.xxxx_id表的标识为7116088 的记录
    有两个并发提交的事务last_committed=641254 ,与发现更新的记录不存在的 GTID *.687871861 事务还有另一个并发提交的事务 sequence_number=641260(即insert另一张表的操作),难道是master有并发提交的事务,slave多个work线程去apply的时候出现了问题?

    4、查看更新的记录不存在的表和相关记录

    show create table seq_xxxx;
    | seq_xxxx | CREATE TABLE seq_xxxx (
    currentValue bigint(20) NOT NULL,
    increment int(11) NOT NULL DEFAULT '1'

    mysql> select * from seq_xxxx;
    +--------------+-----------+
    | currentValue | increment |
    +--------------+-----------+
    | 7116088 | 1 |
    +--------------+-----------+
    可以发现实际数据库中是存在该记录的

    测试

    mysql> select @@version;
    +------------+
    | @@version |
    +------------+
    | 5.7.19-log |
    +------------+
    1 row in set (0.00 sec)
    mysql> show variables like '%para%';
    +------------------------+---------------+
    | Variable_name | Value |
    +------------------------+---------------+
    | slave_parallel_type | LOGICAL_CLOCK |
    | slave_parallel_workers | 4 |
    +------------------------+---------------+

    sysbench /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=10.186.30.73 --mysql-socket=/opt/mysql/data/3307/mysqld.sock --mysql-port=3307  --db-driver=mysql  --mysql-db=test --mysql-user=admin --mysql-password=admin --table_size=100000 --tables=5 --threads=100 --time=120 --report-interval=5  run
    

    有并发提交的事务,但没有模拟重现出更新的记录不存在,但在库中却存在的情况


    更新


    slave_preserve_commit_order=1(default 0,MTS中有效)

    确保事务以与在从服务器中继日志中显示的顺序相同的顺序在从服务器上apply,防止出现gaps。如果该值设置为0,slave并行apply事务可能会无序,检查最近执行的事务并不能保证主服务器上的所有先前事务都已在从服务器上执行
    开启多线程复制后,这个参数推荐设置为1

    参考:
    slave_preserve_commit_order

    相关bug链接
    Repeated multi-threaded slave replication failures

  • 相关阅读:
    kafka重新启动时出现:found a corrupted index file due to requirement failed问题解决方法
    filebeat向kafka中传输数据报WARN Failed to connect to broker DOMSDev07:9092: dial tcp: lookup DOMSDev07: getaddrinfow: No such host is known.解决方法
    安装Zookeeper出现Unable to start AdminServer,existing abnormally问题解决方法
    Windows Server 2008 R2 安装WinDbg以及符号路径设置
    Windows安装ElastAlert问题总结
    Windows系统下Log4Net+FileBeat+ELK日志分析系统问题总结
    Anaconda和basemap的安装和使用
    temp
    Mininet-wifi典型用法
    mininet-wifi 创建ryu和ofsoftswitch的拓扑流程
  • 原文地址:https://www.cnblogs.com/YangJiaXin/p/8882665.html
Copyright © 2020-2023  润新知