• pt-table-checksum


    pt-table-checksum是percona公司提供的一个用于在线比对主从数据一致性的工具。

    实现原理

    将一张大表分成多个chunk,每次针对一个chunk进行校验,同时将校验的结果通过REPLACE INTO语句写入到percona.checksums表中,然后该语句通过主从复制,在SLAVE中同样执行一次,校验的结果同样是写入到percona.checksums表中,最后,通过查询percona.checksums来获取主从不一致的信息。

    常见用法

    1. 基本用法

    # pt-table-checksum -h192.168.244.10 -umonitor -pmonitor123

    其中,monitor的最小权限如下(第二个权限是针对percona.checksums的):

    GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'monitor'@'192.168.244.10';

    GRANT ALL PRIVILEGES ON `percona`.* TO 'monitor'@'192.168.244.10';

    2. pt-table-checksum默认是运行在statement下,如果是其它日志格式,需加--no-check-binlog-format参数

    # pt-table-checksum -h192.168.244.10 -umonitor -pmonitor123 --no-check-binlog-format

    3. 如果主从复制中加了过滤条件,譬如binlog_ignore_db或replicate_do_db之类的参数,需加--no-check-replication-filters参数

    # pt-table-checksum -h192.168.244.10 -umonitor -pmonitor123 --no-check-binlog-format --no-check-replication-filters

       如果在对被过滤表进行校验时,命令hang住了,可加--replicate-database参数。

    4. 基于指定库的校验

    # pt-table-checksum -h192.168.244.10 -umonitor -pmonitor123 --no-check-binlog-format --databases=test,test1

    5. 基于指定表的校验

    # pt-table-checksum -h192.168.244.10 -umonitor -pmonitor123 --no-check-binlog-format --tables=test2.test

    其它具体用法,可参考另外一篇博客:pt-table-checksum参数详解

    通过打开general_log来看看其具体的执行过程,注意,测试表是test.test,共1000000条记录。

    17 Query    SHOW GLOBAL STATUS LIKE 'Threads_running'
    查看Threads_running变量是为了查看当前系统的负载情况
    17 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
    设置会话变量
    17 Query USE `test` 17 Query SHOW CREATE TABLE `test`.`test`
    查看test表的表结构,选取分片键,一般为主键或唯一索引
    17 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */ 17 Query EXPLAIN SELECT * FROM `test`.`test` WHERE 1=1
    查看test表的大概数量 17 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`test` FORCE INDEX(`PRIMARY`) ORDER BY `id` LIMIT 1 /*first lower boundary*/
    选择第一个chunk的下标,即id的最小值 17 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`test` FORCE INDEX (`PRIMARY`) WHERE `id` IS NOT NULL ORDER BY `id` LIMIT 1 /*key_len*/
    查看索引的长度 17 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`test` FORCE INDEX (`PRIMARY`) WHERE `id` >= '1' /*key_len*/
    查看实际使用的索引的长度,这个针对联合索引的场景。
    17 Query USE `percona` 17 Query DELETE FROM `percona`.`checksums` WHERE db = 'test' AND tbl = 'test'
    从percona.checksums表中删除之前的校验记录
    17 Query USE `test` 17 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) ORDER BY `id` LIMIT 2171, 2 /*next chunk boundary*/
    17
    Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) ORDER BY `id` LIMIT 2171, 2 /*next chunk boundary*/ 确认本chunk的上限,以及下一个chunk的下限。
    17 Query EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ip`, `no`, CONCAT(ISNULL(`ip`), ISNULL(`no`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) AND ((`id` <= '2172')) /*explain checksum chunk*/ 查看对本次chunk执行checksum操作的执行计划,确认读取的行数是否合理,选择的索引是否合适
    17
    Query REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'test', 'test', '1', 'PRIMARY', '1', '2172', COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ip`, `no`, CONCAT(ISNULL(`ip`), ISNULL(`no`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) AND ((`id` <= '2172')) /*checksum chunk*/ 关键操作,对本次chunk执行checksum操作,并将结果更新到percona.checksums表中。
    17 Query SHOW WARNINGS 17 Query SELECT this_crc, this_cnt FROM `percona`.`checksums` WHERE db = 'test' AND tbl = 'test' AND chunk = '1'
    查看本次操作校验的行数和校验和 17 Query UPDATE `percona`.`checksums` SET chunk_time = '0.059603', master_crc = '568c1ba3', master_cnt = '2172' WHERE db = 'test' AND tbl = 'test' AND chunk = '1' 将上面那个查询得到的行数和校验和更新到master_cnt和master_crc中。这样的话,主库的校验和在从库执行replace操作时被覆盖。

    下面是针对第二个chunk执行的操作。 17 Query SHOW GLOBAL STATUS LIKE 'Threads_running' 17 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '2173')) ORDER BY `id` LIMIT 18219, 2 /*next chunk boundary*/ 17 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '2173')) ORDER BY `id` LIMIT 18219, 2 /*next chunk boundary*/ 17 Query EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ip`, `no`, CONCAT(ISNULL(`ip`), ISNULL(`no`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '2173')) AND ((`id` <= '20392')) /*explain checksum chunk*/ 17 Query REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'test', 'test', '2', 'PRIMARY', '2173', '20392', COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ip`, `no`, CONCAT(ISNULL(`ip`), ISNULL(`no`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '2173')) AND ((`id` <= '20392')) /*checksum chunk*/ 17 Query SHOW WARNINGS 17 Query SELECT this_crc, this_cnt FROM `percona`.`checksums` WHERE db = 'test' AND tbl = 'test' AND chunk = '2' 17 Query UPDATE `percona`.`checksums` SET chunk_time = '0.022960', master_crc = '83371365', master_cnt = '18220' WHERE db = 'test' AND tbl = 'test' AND chunk = '2' ... 下面的校验和上面的并不相同,上述id值的范围是1~1000000,下面两个chunk的范围是<1和>1000000,为什么要这么做呢?
    主要是考虑到从库有可能存在上述两个范围的数据。
    17 Query SHOW GLOBAL STATUS LIKE 'Threads_running' 17 Query EXPLAIN SELECT COUNT(*), '0' FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` < '1')) ORDER BY `id` /*explain past lower chunk*/ 17 Query REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'test', 'test', '9', 'PRIMARY', NULL, '1', COUNT(*), '0' FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` < '1')) ORDER BY `id` /*past lower chunk*/ 17 Query SHOW WARNINGS 17 Query SELECT this_crc, this_cnt FROM `percona`.`checksums` WHERE db = 'test' AND tbl = 'test' AND chunk = '9' 17 Query UPDATE `percona`.`checksums` SET chunk_time = '0.004492', master_crc = '0', master_cnt = '0' WHERE db = 'test' AND tbl = 'test' AND chunk = '9' 17 Query SHOW GLOBAL STATUS LIKE 'Threads_running' 17 Query EXPLAIN SELECT COUNT(*), '0' FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` > '1000000')) ORDER BY `id` /*explain past upper chunk*/ 17 Query REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'test', 'test', '10', 'PRIMARY', '1000000', NULL, COUNT(*), '0' FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` > '1000000')) ORDER BY `id` /*past upper chunk*/ 17 Query SHOW WARNINGS 17 Query SELECT this_crc, this_cnt FROM `percona`.`checksums` WHERE db = 'test' AND tbl = 'test' AND chunk = '10' 17 Query UPDATE `percona`.`checksums` SET chunk_time = '0.058622', master_crc = '0', master_cnt = '0' WHERE db = 'test' AND tbl = 'test' AND chunk = '10'

    输出结果说明

    # pt-table-checksum -h 192.168.244.10 -umonitor -pmonitor123 --no-check-binlog-format

                TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
    11-09T21:19:01      0      0        0       1       0   2.768 h2.h3
    11-09T21:19:06      0      0        0       1       0   3.903 hello.h1
    11-09T21:19:06      0      0        0       1       0   0.620 hello.h2
    11-09T21:19:10      0      0        0       1       0   1.725 mysql.columns_priv
    11-09T21:19:10      0      1        4       1       0   0.457 mysql.db
    11-09T21:19:11      0      0        0       1       0   0.306 mysql.event
    11-09T21:19:12      0      0        0       1       0   0.721 mysql.func
    11-09T21:19:13      0      0       40       1       0   0.649 mysql.help_category
    11-09T21:19:14      0      0      611       6       0   1.313 mysql.help_keyword
    11-09T21:19:18      0      0     1218       4       0   3.432 mysql.help_relation
    11-09T21:19:20      0      0      583       1       0   1.632 mysql.help_topic
    11-09T21:19:21      0      0        0       1       0   0.501 mysql.ndb_binlog_index
    11-09T21:19:21      0      0        0       1       0   0.319 mysql.plugin
    11-09T21:19:22      0      0        0       1       0   0.624 mysql.proc
    11-09T21:19:22      0      0        0       1       0   0.626 mysql.procs_priv
    11-09T21:19:23      0      0        2       1       0   0.375 mysql.proxies_priv
    11-09T21:19:24      0      0        0       1       0   0.806 mysql.servers
    11-09T21:19:24      0      1        3       1       0   0.292 mysql.tables_priv
    11-09T21:19:24      0      0        0       1       0   0.382 mysql.time_zone
    11-09T21:19:25      0      0        0       1       0   0.398 mysql.time_zone_leap_second
    11-09T21:19:25      0      0        0       1       0   0.386 mysql.time_zone_name
    11-09T21:19:25      0      0        0       1       0   0.393 mysql.time_zone_transition
    11-09T21:19:26      0      0        0       1       0   0.313 mysql.time_zone_transition_type
    11-09T21:19:26      0      1       18       1       0   0.393 mysql.user
    11-09T21:19:27      0      0        1       1       0   0.642 percona.dsns
    11-09T21:19:30      0      1       41       1       0   2.613 test.checksum
    11-09T21:19:31      0      0        0       1       0   0.669 test.ta
    11-09T21:20:13      0      1  1000000       7       0  42.009 test.test
    11-09T21:20:15      0      0        5       1       0   1.818 test1.test
    11-09T21:20:16      0      0        5       1       0   0.546 test1.test1
    11-09T21:20:17      0      1       12       1       0   0.508 test2.test
    11-09T21:20:17      0      1        0       1       0   0.387 test2.test2
    11-09T21:20:18      0      1     1000       1       0   0.707 test3.test3

    TS:校验完表后的时间戳

    ERRORS:校验过程中出现的errors和warnings的次数。

    DIFFS:所有SLAVE中checksum值不相同的chunk的数量,如一主两从中,SLAVE1的chunk1与MASTER的checksum不同,SLAVE2的chunk1和chunk2不相同,则DIFFS的值为2。如果SLAVE2的chunk2和chunk3不相同,则DIFFS为3。

    ROWS:表中校验的记录数。通常情况下为表的总行数。如果指定了--where选项,则为符合条件的记录数。

    CHUNKS:表被分割为多个chunk后,chunk的个数。

    SKIPPED:跳过的chunk的个数,通常因为如下原因:

    * MySQL not using the --chunk-index
    * MySQL not using the full chunk index (--[no]check-plan)
    * Chunk size is greater than --chunk-size * --chunk-size-limit
    * Lock wait timeout exceeded (--retries)
    * Checksum query killed (--retries)

    TIME:校验表所花费的时间。

    TABLE:校验的表名

    replicate-check-only参数下的输出结果说明 

    如果指定了--replicate-check-only参数,则意味着不会校验任何表,直接获取上次校验的结果。

    # pt-table-checksum -h 192.168.244.10 -umonitor -pmonitor123 --no-check-binlog-format --replicate-check-only

    Differences on hbase
    TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
    mysql.db 1 1 1   
    mysql.tables_priv 1 -3 1   
    mysql.user 1 -4 1   
    test.checksum 1 0 1   
    
    Differences on test
    TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
    mysql.db 1 -1 1   
    mysql.tables_priv 1 -1 1   
    mysql.user 1 -2 1   
    test.checksum 1 0 1   
    test.test 7 1000000 0 PRIMARY 1000000 
    test2.test 1 -10 1   
    test2.test2 1 1 1   
    test3.test3 1 2000 1   

    可以看出,它分别输出了不同SLAVE中的差异部分。

    TABLE:校验的表名。

    CHUNK:checksum值不相同的chunk的数量。

    CNT_DIFF:The number of chunk rows on the replica minus the number of chunk rows on the master.即SLAVE中被校验的记录数减去MASTER中的记录数。

    CRC_DIFF:1 if the CRC of the chunk on the replica is different than the CRC of the chunk on the master, else 0.如果校验值相同,则CRC_DIFF为0,否则为1。

    上述test.test中CRC_DIFF中crc为0的原因是SLAVE比MASTER多1000000条记录,且这1000000条记录正好又是在最后一个chunk中(如下所示)。注:master中id最大值为1000000。

    319 Query     REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'test', 'test', '7', 'PRIMARY', '1000000', NULL, COUNT(*), '0' FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` > '1000000')) ORDER BY `id` /*past upper chunk*/

    CHUNK_INDEX:用于将table切割成chunk的索引。

    LOWER_BOUNDARY:The index values that define the lower boundary of the chunk.

    UPPER_BOUNDARY:The index values that define the upper boundary of the chunk.

    上述两个参数可用来定位具有不同checksum值的chunk对应的索引的上限和下限。

    总结

    1. pt-table-checksum对表进行校验时,并不需要表上面有任何索引。这时候,整张表即是一个chunk。

    REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'test2', 't1', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test2`.`t1` /*checksum table*/ 

       如果表的行过多的话,它会报如下错误:

    # pt-table-checksum -h192.168.244.10 -umonitor -pmonitor123 --chunk-size-limit=100 --no-check-binlog-format --tables=test2.t1

    11-17T11:41:53 Cannot checksum table test2.t1: There is no good index and the table is oversized. at /usr/local/bin/pt-table-checksum
     line 6528.

    解决方法:

    调整chunk-size-limit的值

    # pt-table-checksum -h192.168.244.10 -umonitor -pmonitor123 --chunk-size-limit=1000 --no-check-binlog-format --tables=test2.t1

                TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
    11-17T11:41:59      0      0   100004       1       0   0.600 test2.t1

      

       

  • 相关阅读:
    Java 时区转换(UTC+8 到 UTC 等等)
    spring 与 springmvc 的区别和定义
    字符串加密解密(Base64)
    上传视频本地预览问题
    vue 监听store中的数值
    判断对象是否为空
    正则 验证是否包含特殊字符
    js 过滤日期格式
    vue methods computed watch区别
    for + setTimeout
  • 原文地址:https://www.cnblogs.com/ivictor/p/6049205.html
Copyright © 2020-2023  润新知