群里好几位同学问 pt-table-checksum 3.0.4, 主从两个表数据是不一致,为啥检测不出来?前段时间自己也测试过,只是没整理成随笔^_-
一、基本环境
VMware10.0+CentOS6.9+MySQL5.7.19
ROLE | HOSTNAME | BASEDIR | DATADIR | IP | PORT |
Master | ZST1 | /usr/local/mysql | /data/mysql/mysql3306/data | 192.168.85.132 | 3306 |
Slave | ZST2 | /usr/local/mysql | /data/mysql/mysql3306/data | 192.168.85.133 | 3306 |
基于Row+Gtid搭建的一主一从复制结构:Master->Slave
二、构造差异数据
借助样例数据库sakila做测试
# 主库flush logs mydba@192.168.85.132,3306 [sakila]> flush logs; # 主库开启general_log [root@ZST1 ~]# rm -rf /data/mysql/mysql3306/data/mysql-general.log mydba@192.168.85.132,3306 [sakila]> set global general_log_file='/data/mysql/mysql3306/data/mysql-general.log'; mydba@192.168.85.132,3306 [sakila]> set global general_log =1; mydba@192.168.85.132,3306 [sakila]> show variables like 'general_log%'; # 从库修改部分数据,造成不一致 mydba@192.168.85.133,3306 [sakila]> delete from sakila.actor where actor_id<=3; # 外键约束删除失败 mydba@192.168.85.133,3306 [sakila]> update sakila.actor set last_name=first_name where actor_id<=3; # 主库sakila.actor数据 mydba@192.168.85.132,3306 [sakila]> select * from sakila.actor limit 3; +----------+------------+-----------+---------------------+ | actor_id | first_name | last_name | last_update | +----------+------------+-----------+---------------------+ | 1 | PENELOPE | GUINESS | 2006-02-15 04:34:33 | | 2 | NICK | WAHLBERG | 2006-02-15 04:34:33 | | 3 | ED | CHASE | 2006-02-15 04:34:33 | +----------+------------+-----------+---------------------+ # 从库sakila.actor数据 mydba@192.168.85.133,3306 [sakila]> select * from sakila.actor limit 3; +----------+------------+-----------+---------------------+ | actor_id | first_name | last_name | last_update | +----------+------------+-----------+---------------------+ | 1 | PENELOPE | PENELOPE | 2017-11-08 09:54:10 | | 2 | NICK | NICK | 2017-11-08 09:54:10 | | 3 | ED | ED | 2017-11-08 09:54:10 | +----------+------------+-----------+---------------------+
从库修改部分数据,造成主从不一致
三、pt-table-checksum
3.1、检测数据是否一致
pt-table-checksum可以在任何机器上执行,只要它能连接到Master就行。我是在从库执行,最后的参数指定到主库就行
# 运行pt-table-checksum [root@ZST2 ~]# pt-table-checksum --nocheck-binlog-format --nocheck-replication-filters --recursion-method=hosts --replicate=sakila.checksums --databases=sakila --tables=actor,city --host=192.168.85.132 --port=3306 --user=mydba --password=mysql5719 TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 11-08T09:57:01 0 0 200 1 0 0.075 sakila.actor 11-08T09:57:01 0 0 600 1 0 0.034 sakila.city [root@ZST2 ~]#
DIFFS=0表示没有差异数据。实际上主从数据不一致,这里却没有检测出来~
主库得到的general-log、binlog拷贝到其他文件夹,方便后续分析
# 拷贝general-log、binlog文件 [root@ZST1 ~]# cp /data/mysql/mysql3306/data/mysql-general.log /data/backup/mysql-general.log.ptchecksum3306 [root@ZST1 ~]# cp /data/mysql/mysql3306/logs/mysql-bin.000083 /data/backup/mysql-bin.000083.ptchecksum3306
3.2、查看general-log
[root@ZST1 ~]# cat /data/backup/mysql-general.log.ptchecksum3306 /usr/local/mysql/bin/mysqld, Version: 5.7.19-log (MySQL Community Server (GPL)). started with: Tcp port: 3306 Unix socket: /tmp/mysql3306.sock Time Id Command Argument 2017-11-08T01:57:01.750917Z 20 Connect mydba@192.168.85.133 on using TCP/IP 2017-11-08T01:57:01.751564Z 20 Query set autocommit=1 2017-11-08T01:57:01.752220Z 20 Query SHOW VARIABLES LIKE 'innodb\_lock_wait_timeout' 2017-11-08T01:57:01.757028Z 20 Query SET SESSION innodb_lock_wait_timeout=1 2017-11-08T01:57:01.757521Z 20 Query SHOW VARIABLES LIKE 'wait\_timeout' 2017-11-08T01:57:01.760950Z 20 Query SET SESSION wait_timeout=10000 2017-11-08T01:57:01.761400Z 20 Query SELECT @@SQL_MODE 2017-11-08T01:57:01.761772Z 20 Query SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/ 2017-11-08T01:57:01.762140Z 20 Query SELECT @@server_id /*!50038 , @@hostname*/ 2017-11-08T01:57:01.762475Z 20 Query SELECT @@SQL_MODE 2017-11-08T01:57:01.762772Z 20 Query SET SQL_MODE=',NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION' 2017-11-08T01:57:01.763098Z 20 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ 2017-11-08T01:57:01.763504Z 20 Query SHOW VARIABLES LIKE 'wsrep_on' 2017-11-08T01:57:01.766949Z 20 Query SELECT @@SERVER_ID 2017-11-08T01:57:01.767470Z 20 Query SHOW SLAVE HOSTS 2017-11-08T01:57:01.787329Z 20 Query SHOW VARIABLES LIKE 'wsrep_on' 2017-11-08T01:57:01.790712Z 20 Query SELECT @@SERVER_ID 2017-11-08T01:57:01.794388Z 20 Query SHOW VARIABLES LIKE 'wsrep_on' 2017-11-08T01:57:01.797637Z 20 Query SELECT @@SERVER_ID 2017-11-08T01:57:01.801356Z 20 Query SHOW DATABASES LIKE 'sakila' 2017-11-08T01:57:01.802164Z 20 Query CREATE DATABASE IF NOT EXISTS `sakila` /* pt-table-checksum */ 2017-11-08T01:57:01.802951Z 20 Query USE `sakila` 2017-11-08T01:57:01.803300Z 20 Query SHOW TABLES FROM `sakila` LIKE 'checksums' 2017-11-08T01:57:01.806111Z 20 Query CREATE TABLE IF NOT EXISTS `sakila`.`checksums` ( db CHAR(64) NOT NULL, tbl CHAR(64) NOT NULL, chunk INT NOT NULL, chunk_time FLOAT NULL, chunk_index VARCHAR(200) NULL, lower_boundary TEXT NULL, upper_boundary TEXT NULL, this_crc CHAR(40) NOT NULL, this_cnt INT NOT NULL, master_crc CHAR(40) NULL, master_cnt INT NULL, ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (db, tbl, chunk), INDEX ts_db_tbl (ts, db, tbl) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 2017-11-08T01:57:01.825908Z 20 Query SHOW GLOBAL STATUS LIKE 'Threads_running' 2017-11-08T01:57:01.828793Z 20 Query SELECT CONCAT(@@hostname, @@port) 2017-11-08T01:57:01.844001Z 20 Query SELECT CRC32('test-string') 2017-11-08T01:57:01.844518Z 20 Query SELECT CRC32('a') 2017-11-08T01:57:01.845025Z 20 Query SELECT CRC32('a') 2017-11-08T01:57:01.845517Z 20 Query SHOW VARIABLES LIKE 'wsrep_on' 2017-11-08T01:57:01.849157Z 20 Query SHOW DATABASES 2017-11-08T01:57:01.850038Z 20 Query SHOW /*!50002 FULL*/ TABLES FROM `sakila` 2017-11-08T01:57:01.851486Z 20 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */ 2017-11-08T01:57:01.851943Z 20 Query USE `sakila` 2017-11-08T01:57:01.852408Z 20 Query SHOW CREATE TABLE `sakila`.`actor` 2017-11-08T01:57:01.853034Z 20 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */ 2017-11-08T01:57:01.854092Z 20 Query EXPLAIN SELECT * FROM `sakila`.`actor` WHERE 1=1 2017-11-08T01:57:01.857374Z 20 Query USE `sakila` 2017-11-08T01:57:01.857990Z 20 Query DELETE FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor' 2017-11-08T01:57:01.877626Z 20 Query USE `sakila` 2017-11-08T01:57:01.878413Z 20 Query EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`, convert(`first_name` using utf8mb4), convert(`last_name` using utf8mb4), UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`actor` /*explain checksum table*/ 2017-11-08T01:57:01.879347Z 20 Query REPLACE INTO `sakila`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sakila', 'actor', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`, convert(`first_name` using utf8mb4), convert(`last_name` using utf8mb4), UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`actor` /*checksum table*/ 2017-11-08T01:57:01.881166Z 20 Query SHOW WARNINGS 2017-11-08T01:57:01.881764Z 20 Query SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1' 2017-11-08T01:57:01.897051Z 20 Query UPDATE `sakila`.`checksums` SET chunk_time = '0.001821', master_crc = '6816983c', master_cnt = '200' WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1' 2017-11-08T01:57:01.900914Z 20 Query SHOW GLOBAL STATUS LIKE 'Threads_running' 2017-11-08T01:57:01.930534Z 20 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */ 2017-11-08T01:57:01.931387Z 20 Query USE `sakila` 2017-11-08T01:57:01.932194Z 20 Query SHOW CREATE TABLE `sakila`.`city` 2017-11-08T01:57:01.933399Z 20 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */ 2017-11-08T01:57:01.935136Z 20 Query EXPLAIN SELECT * FROM `sakila`.`city` WHERE 1=1 2017-11-08T01:57:01.940169Z 20 Query USE `sakila` 2017-11-08T01:57:01.941026Z 20 Query DELETE FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'city' 2017-11-08T01:57:01.942010Z 20 Query USE `sakila` 2017-11-08T01:57:01.943012Z 20 Query EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `city_id`, convert(`city` using utf8mb4), `country_id`, UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`city` /*explain checksum table*/ 2017-11-08T01:57:01.945033Z 20 Query REPLACE INTO `sakila`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sakila', 'city', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `city_id`, convert(`city` using utf8mb4), `country_id`, UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`city` /*checksum table*/ 2017-11-08T01:57:01.960088Z 20 Query SHOW WARNINGS 2017-11-08T01:57:01.960938Z 20 Query SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'city' AND chunk = '1' 2017-11-08T01:57:01.961674Z 20 Query UPDATE `sakila`.`checksums` SET chunk_time = '0.015889', master_crc = '4d700c4', master_cnt = '600' WHERE db = 'sakila' AND tbl = 'city' AND chunk = '1' 2017-11-08T01:57:01.964712Z 20 Query SHOW GLOBAL STATUS LIKE 'Threads_running' 2017-11-08T01:57:01.972503Z 20 Quit [root@ZST1 ~]#
general-log逻辑
• 设置SESSION选项
• 创建checksums数据表
• 针对每一张需要check的表执行下面操作
• DELETE:从checksums表中删除sakila的记录
• EXPLAIN:分析计算sakila的this_cnt,this_crc的执行计划
• REPLACE INTO:计算sakila的this_cnt,this_crc
• UPDATE:使用this_cnt,this_crc更新master_crc,master_cnt
在主库上这些以SQL语句的形式执行,且执行时没有设置SESSION的日志格式为STATEMENT,主库的binlog_format='ROW',所以binlog里记录的是语句的最终执行结果(具体的数值,而非SQL语句)
3.3、查看binlog
[root@ZST1 ~]# mysqlbinlog -v --base64-output=decode-rows /data/backup/mysql-bin.000083.ptchecksum3306
binlog逻辑是:首先创建checksums数据表,然后delete->insert->update checksums 具体数值
主库上最后一个update语句
SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1' UPDATE `sakila`.`checksums` SET chunk_time = '0.001821', master_crc = '6816983c', master_cnt = '200' WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
在binlog体现为(原封不动应用到从库)
[root@ZST1 ~]# mysqlbinlog -v --base64-output=decode-rows /data/backup/mysql-bin.000083.ptchecksum3306 ... COMMIT/*!*/; # at 1604 #171108 9:57:01 server id 1323306 end_log_pos 1669 CRC32 0x16bf0702 GTID last_committed=3 sequence_number=4 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= '8ab82362-9c37-11e7-a858-000c29c1025c:575'/*!*/; # at 1669 #171108 9:57:01 server id 1323306 end_log_pos 1743 CRC32 0xee6b7639 Query thread_id=20 exec_time=0 error_code=0 SET TIMESTAMP=1510106221/*!*/; BEGIN /*!*/; # at 1743 #171108 9:57:01 server id 1323306 end_log_pos 1823 CRC32 0x589cc01f Table_map: `sakila`.`checksums` mapped to number 248 # at 1823 #171108 9:57:01 server id 1323306 end_log_pos 1950 CRC32 0xc0604f63 Update_rows: table id 248 flags: STMT_END_F ### UPDATE `sakila`.`checksums` ### WHERE ### @1='sakila' ### @2='actor' ### @3=1 ### @4=NULL ### @5=NULL ### @6=NULL ### @7=NULL ### @8='6816983c' ### @9=200 ### @10=NULL ### @11=NULL ### @12=1510106221 ### SET ### @1='sakila' ### @2='actor' ### @3=1 ### @4=0.001821 ### @5=NULL ### @6=NULL ### @7=NULL ### @8='6816983c' ### @9=200 ### @10='6816983c' ### @11=200 ### @12=1510106221 # at 1950 #171108 9:57:01 server id 1323306 end_log_pos 1981 CRC32 0x1f197fe6 Xid = 198 COMMIT/*!*/; # at 1981
也就是说从库不会去计算所谓的CRC32,它直接完整copy主库的checksums的所有内容
3.4、如何解决
个人认为只有在statement格式下才能进行,因为两边要计算CRC32,计算完后再把主上的master_crc、master_cnt更新到从库,最后在从库对比master和this相关列。pt-table-checksum 3.0.4在执行时缺少SET @@binlog_format='STATEMENT',建议不要使用。
有一种很挫的方法,仅仅是为了看差异结果(生产环境勿用),执行pt-table-checksum前,在主上 set global binlog_format='STATEMENT';
# 主库修改binlog_format为statement mydba@192.168.85.132,3306 [sakila]> set global binlog_format='STATEMENT'; # 从库运行pt-table-checksum [root@ZST2 ~]# pt-table-checksum --nocheck-binlog-format --nocheck-replication-filters --recursion-method=hosts --replicate=sakila.checksums --databases=sakila --tables=actor,city --host=192.168.85.132 --port=3306 --user=mydba --password=mysql5719 TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 11-08T12:40:27 0 1 200 1 0 0.015 sakila.actor 11-08T12:40:27 0 0 600 1 0 0.024 sakila.city [root@ZST2 ~]#
DIFFS=1,说明sakila.actor表存在差异
# 差异信息 mydba@192.168.85.133,3306 [sakila]> SELECT db,tbl,SUM(this_cnt) AS total_rows,COUNT(*) AS chunks FROM sakila.checksums WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db,tbl; +--------+-------+------------+--------+ | db | tbl | total_rows | chunks | +--------+-------+------------+--------+ | sakila | actor | 200 | 1 | +--------+-------+------------+--------+ 1 row in set (0.00 sec)
主要就是查看master_cnt、this_cnt和master_crc、this_crc
四、pt-table-sync
4.1、修复数据不一致
前面已经检测出主从数据不一致,下面使用pt-table-sync修复数据
# 打印命令 [root@ZST2 ~]# pt-table-sync --replicate=sakila.checksums --sync-to-master h=192.168.85.133,u=mydba,p=mysql5719,P=3306 --databases=sakila --charset=utf8 --print REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('1', 'PENELOPE', 'GUINESS', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/; REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('2', 'NICK', 'WAHLBERG', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/; REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('3', 'ED', 'CHASE', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/; [root@ZST2 ~]# # 执行命令 [root@ZST2 ~]# pt-table-sync --replicate=sakila.checksums --sync-to-master h=192.168.85.133,u=mydba,p=mysql5719,P=3306 --databases=sakila --charset=utf8 --execute REPLACE statements on sakila.actor can adversely affect child table `sakila`.`film_actor` because it has an ON UPDATE CASCADE foreign key constraint. See --[no]check-child-tables in the documentation for more information. --check-child-tables error while doing sakila.actor on 192.168.85.133 [root@ZST2 ~]#
--execute就是执行打印出来的命令,REPLACE INTO实际对应delete、insert操作,由于外键约束delete失败(构造差异数据时就尝试过delete),修复不成功。
pt-table-checksum及pt-table-sync详细说明请参考:pt-table-checksum解读、使用pt-table-checksum及pt-table-sync校验复制一致性