Preface
Replication delay is a common issue in MySQL replications.Especially in those replications not using semi-synchronous method.Long term delay leads to inconsistency between master and slaves.Therefore,we should regard the replication delay as an important monitor item.To a certain extent,the value of "Seconds_Behind_Master" shows the situation of replication delay,but it's always inaccurate.Even if the value is "0" sometimes,the replication delay remains.
Introduce
pt-heartbeat is the very tool can be used to monitor replication delay of MySQL or PostgreSQL.It provides a way to update a master or to monitor a replica.If you do not speicify the connection options,it will retrieve them in your default my.cnf file.
Procedure
Usage
1 pt-heartbeat [OPTIONS] [DSN] --update|--monitor|--check|--stop
Common parameters
1 --check //Specify the check mode to slave delay.It only checks once then exits. 2 --monitor //Specify the monitor mode to slave delay. 3 --update //Specify the update mode on master. 4 --check-read-only //It skips insert operations if the server is read-only status. 5 --create-table //Create a heartbeat table. 6 --table //Specify the heartbeat table(default "heartbeat"). 7 --daemonize //Run it as background process. 8 --dbi-driver //Specify the drive of database(default "mysql",other value is "pg"). 9 --file //Specify a output file when using --monitor. 10 --log //Print all output to the file when using --daemonize 11 --master-server-id //Calculate delay from this master server id. 12 --interval //Specify the frequency to update or check the heartbeat table(default "1.0"). 13 --run-time //Specify the period of time to run(default suffix is "s",if you set 60,it means 60s). 14 15 Notice: 16 Specify at least one of --stop, --update, --monitor, or --check. 17 --update, --monitor, and --check are mutually exclusive. 18 --daemonize and --check are mutually exclusive.
Examples
Execute pt-heartbeat to update the heartbeat of master.
1 [root@zlm2 03:21:21 ~] 2 #pt-heartbeat --database=sysbench --create-table --table=hb --daemonize -h192.168.1.101 -P3306 -uzlm -pzlmzlm --update 3 4 [root@zlm2 03:21:34 ~] 5 # 6 7 (zlm@192.168.1.101 3306)[sysbench]>show tables like 'hb'; 8 +-------------------------+ 9 | Tables_in_sysbench (hb) | 10 +-------------------------+ 11 | hb | 12 +-------------------------+ 13 1 row in set (0.00 sec) 14 15 (zlm@192.168.1.101 3306)[sysbench]>select * from hb; 16 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+ 17 | ts | server_id | file | position | relay_master_log_file | exec_master_log_pos | 18 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+ 19 | 2018-07-19T03:34:07.005490 | 1013306 | mysql-bin.000015 | 975231 | NULL | NULL | 20 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+ 21 1 row in set (0.00 sec) 22 23 //The operation above will create a heartbeat table "hb" in database "sysbench" if the target table not exist. 24 //It will update the table which contains the heartbeat info of master every time you execute with "--update".
Execute pt-heartbeat to check delay of slave.
1 [root@zlm2 03:40:19 ~] 2 #pt-heartbeat --database=sysbench -h192.168.1.102 -P3306 -uzlm -pzlmzlm --check 3 DBD::mysql::db selectrow_arrayref failed: Table 'sysbench.heartbeat' doesn't exist [for Statement "SHOW CREATE TABLE `sysbench`.`heartbeat`"] at /usr/bin/pt-heartbeat line 5873. 4 5 [root@zlm2 03:40:56 ~] 6 #pt-heartbeat --database=sysbench --create-table --table=hb -h192.168.1.102 -P3306 -uzlm -pzlmzlm --check 7 0.00 8 9 (zlm@192.168.1.102 3306)[sysbench]>show tables like 'hb'; 10 +-------------------------+ 11 | Tables_in_sysbench (hb) | 12 +-------------------------+ 13 | hb | 14 +-------------------------+ 15 1 row in set (0.00 sec) 16 17 (zlm@192.168.1.102 3306)[sysbench]>select * from hb; 18 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+ 19 | ts | server_id | file | position | relay_master_log_file | exec_master_log_pos | 20 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+ 21 | 2018-07-19T03:43:53.005890 | 1013306 | mysql-bin.000015 | 2258210 | NULL | NULL | 22 | 2018-07-19 03:42:02 | 1023306 | NULL | NULL | NULL | NULL | 23 +----------------------------+-----------+------------------+----------+-----------------------+---------------------+ 24 2 rows in set (0.00 sec) 25 26 //The connection option refers to slave. 27 //It will also create the same heartbeat table on the slave. 28 //The output will show the delay time.
Execute pt-heartbeat to monitor delay of slave.
1 [root@zlm2 03:52:42 ~] 2 #pt-heartbeat --database=sysbench --create-table --table=hb --daemonize --log=/root/monitor.log -h192.168.1.102 -P3306 -uzlm -pzlmzlm --monitor 3 4 [root@zlm2 03:54:02 ~] 5 #ls -l|grep monitor 6 -rw-r--r-- 1 root root 2448 Jul 19 03:54 monitor.log 7 8 [root@zlm2 03:54:24 ~] 9 #cat monitor.log 10 ******************************************************************* 11 Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client 12 is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER 13 possibly with SSL_ca_file|SSL_ca_path for verification. 14 If you really don't want to verify the certificate and keep the 15 connection open to Man-In-The-Middle attacks please set 16 SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application. 17 ******************************************************************* 18 at /usr/bin/pt-heartbeat line 4438. 19 ******************************************************************* 20 Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client 21 is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER 22 possibly with SSL_ca_file|SSL_ca_path for verification. 23 If you really don't want to verify the certificate and keep the 24 connection open to Man-In-The-Middle attacks please set 25 SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application. 26 ******************************************************************* 27 at /usr/bin/pt-heartbeat line 4438. 28 29 # A software update is available: 30 # * The current version for Percona::Toolkit is 3.0.5 31 32 0.00s [ 0.00s, 0.00s, 0.00s ] 33 0.00s [ 0.00s, 0.00s, 0.00s ] 34 0.00s [ 0.00s, 0.00s, 0.00s ] 35 0.00s [ 0.00s, 0.00s, 0.00s ] 36 0.00s [ 0.00s, 0.00s, 0.00s ] 37 0.00s [ 0.00s, 0.00s, 0.00s ] 38 0.00s [ 0.00s, 0.00s, 0.00s ] 39 0.00s [ 0.00s, 0.00s, 0.00s ] 40 0.00s [ 0.00s, 0.00s, 0.00s ] 41 42 [root@zlm2 03:54:32 ~] 43 # 44 45 //The output will be write into the logfile sepcified by option "--log". 46 //The logfile will be continuously written unless you've specified th option "--run-time". 47 //It seems no delay between master and slave at all.We can use sysbench generate some transactions to observe the difference.
Execute sysbench and watch again.
1 [root@zlm2 04:16:24 ~/sysbench-1.0/src/lua] 2 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables=1 --table-size=10000000 --mysql-storage-engine=innodb prepare 3 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2) 4 5 Creating table 'sbtest1'... 6 Inserting 10000000 records into 'sbtest1' 7 FATAL: mysql_drv_query() returned error 1114 (The table 'sbtest1' is full) for query 'INSERT INTO sbtest1(k, c, pad) VALUES(5007652, '86766538515-66952496290-52637383744-17777505783-29917382722-84335317264-36504610739-38121689798-84219011968-91366385874', '49260963569-36115667021-10552248239-18067079911-61369170530'),(4980838, '54654185213-72432916652-68658723905-01905687478-16685611449-05853232015-51457796697-51819403372-03987006557-33172822697', '21128589837-49224626065-81618031353-10869323172-08902635377'),(4369690, '19346998374-76157809978-61295763131-66817961402-14155793123-89485702729-52643919933-37547938307-65477074083-42947658759', '48407396301-60012651284-13188093294-65156227928-43584415056'),(4980019, '70453863386-09471491416-32761370166-00808275356-79347375847-13151837625-86471458250-39802434455-19087793196-21946389164', '33264881266-44980565111-26757160090-86077734704-00045098929'),(4997783, '58799482920-02512874911-59302214059-00300283635-35103600246-26397800977-48778245550-17735174970-94350595573-16543920606', '23090791422-15031674988-38742619953-96110910554-14814557148'),(4996542, '02526238751-03201222067-54203245066-96888627735-91568973200-70159837175-05408478565-91790491503-65208127071-01416690185', '49867218536-45143413529-09551369789-59447916184-43851467884'),(5016589, '74498181715-26316533762-09669873675-78321331809-32282301673-46976991777-42037713521-13856469837-75059942943-62359417982', '17496574432-77585238967-84065009097-69416190626-30209633909'),(4994933, '71859258017-48270005429-16959639772-39718531870-78995154112-71510567312-49374130735-00512159149-93514688119-69243734179', '23940826906-68127513411-04170272492-93173723876-07024484156'),(5024203, '01612812086-88501262208-29821147265-27833323182-26465297118-67798980074-90884201988-32250326631-52571482980-27120623387', '32676071422-40807168337-23971263649-30743049902-70369274393'),(4162450, '82998769803-54845523696-25503288553-48015207428-64262352808-97661531385-74384257021-19880800516-86393457605-98507752399', '02796854361-18464776198-06166096636-53988338130-49624903182'),(4995444, '66477706887-39253236907-03545461775-18565950790-70596896493-96343527077-27384490842-67789533300-13229220655-04716282051', '93177901449-78610249192-43379190612-81683429736-42624094282'),(5014023, '00026114498-69425370048-46519646170-15138611515-26550011239-94540841776-52595856582-34808319037-14376366590-67052294983', '56614501773-95380362876-81940933213-48069389750-96896566709'),(5024261, '49006870590-81308017607-77748035537-80561018761-10903078210-73474291600-34790301926-33257007983-62587725038-62428934340', '19026821007-38039245675-09902375187-72268734965-98944641058'),(5024236, '10110443501-28024080051-93513912505-09437560828-34230239659-09523768623-23688243983-59021499845-46898378159-07771020838', '24997608907-31970368651-18989687877-75418790951-46453076386'),(5049980, '19429198356-67366238484-55380613354-35977328479-83088969020-14656225800-25216048033-45250663251-98396746788-21125067075', '71090143904-45428036794-81542588223-10154666659-75546296798'),(5471904, '75703864024-12238744221-17802200544-33445694372-23795426538-35115713085-36013837723-42595287843-50130126814-13689292194', '33729585352-83996219254-04756782645-33242342933-26629849933'),(5019788, '26736440482-52751355069-31401045555-46875459054-74471530241-80094262346-04701622365-49412060729-73926195317-10568100029', '94218352951-20508982927-74070061682-74623136939-21804519403'),(5021303, '68369236188-31813582854-91111181432-49899358000-41024394879-31154794562-68384652868-38150046317-87574937898-60778176522', '57835818598-56026673181-41314509499-62765615632-08938407644'),(5015123, '67124253129-45231369455-80389246483-30556743024-23531555287-47922585120-65610495252-46796320051-45675319664-05751522029', '17364348778-61672427639-08537678844-68212837883-00352572907'),(3676715, '05788144973-64579114586-22354309091-20803641999-44828403219-79251514115-97961584283-16337702597-51136491029-67923490682', '13053760861-60814974240-44320882636-73693584211-05836315795'),(5236763, '68612629880-77489830256-75974497057-76984695030-FATAL: `sysbench.cmdline.call_command' function failed: ./oltp_common.lua:230: db_bulk_insert_next() failed 8 9 //Master 10 [root@zlm2 04:33:54 ~/sysbench-1.0/src/lua] 11 #df -h 12 Filesystem Size Used Avail Use% Mounted on 13 /dev/mapper/centos-root 8.4G 8.4G 68M 100% / //The disk space of master is full. 14 devtmpfs 488M 0 488M 0% /dev 15 tmpfs 497M 0 497M 0% /dev/shm 16 tmpfs 497M 6.6M 491M 2% /run 17 tmpfs 497M 0 497M 0% /sys/fs/cgroup 18 /dev/sda1 497M 118M 379M 24% /boot 19 none 87G 80G 6.6G 93% /vagrant 20 21 //Slave 22 [root@zlm3 04:34:17 ~] 23 #df -h 24 Filesystem Size Used Avail Use% Mounted on 25 /dev/mapper/centos-root 8.4G 8.4G 20K 100% / //The disk space of slave is full,too. 26 devtmpfs 488M 0 488M 0% /dev 27 tmpfs 497M 0 497M 0% /dev/shm 28 tmpfs 497M 6.5M 491M 2% /run 29 tmpfs 497M 0 497M 0% /sys/fs/cgroup 30 /dev/sda1 497M 118M 379M 24% /boot 31 none 87G 80G 6.6G 93% /vagrant 32 33 //Check the output logfile. 34 [root@zlm2 04:18:21 ~] 35 #tail monitor.log 36 1.00s [ 6.40s, 1.65s, 0.55s ] 37 0.00s [ 6.26s, 1.65s, 0.55s ] 38 0.99s [ 6.13s, 1.65s, 0.55s ] 39 1.99s [ 6.00s, 1.66s, 0.55s ] 40 2.99s [ 5.86s, 1.67s, 0.56s ] 41 3.99s [ 5.73s, 1.68s, 0.56s ] 42 4.99s [ 5.59s, 1.70s, 0.57s ] 43 6.00s [ 5.46s, 1.72s, 0.57s ] 44 4.01s [ 5.29s, 1.73s, 0.58s ] 45 4.99s [ 5.16s, 1.75s, 0.58s ] 46 47 //Long time later. 48 49 [root@zlm2 04:59:24 ~] 50 #tail monitor.log 51 1769.00s [ 1739.50s, 1619.50s, 1319.50s ] 52 1769.99s [ 1740.50s, 1620.50s, 1320.50s ] 53 1770.99s [ 1741.50s, 1621.50s, 1321.50s ] 54 1771.99s [ 1742.50s, 1622.50s, 1322.50s ] 55 1772.99s [ 1743.50s, 1623.50s, 1323.50s ] 56 1773.99s [ 1744.50s, 1624.50s, 1324.50s ] 57 1774.99s [ 1745.50s, 1625.50s, 1325.50s ] 58 1775.99s [ 1746.50s, 1626.50s, 1326.50s ] 59 1776.99s [ 1747.50s, 1627.50s, 1327.50s ] 60 1778.00s [ 1748.50s, 1628.50s, 1328.50s ] 61 62 [root@zlm2 05:02:54 ~] 63 #tail monitor.log 64 2122.00s [ 2092.50s, 1972.50s, 1672.50s ] 65 2123.00s [ 2093.50s, 1973.50s, 1673.50s ] 66 2124.00s [ 2094.50s, 1974.50s, 1674.50s ] 67 2125.00s [ 2095.50s, 1975.50s, 1675.50s ] 68 2126.00s [ 2096.50s, 1976.50s, 1676.50s ] 69 2127.00s [ 2097.50s, 1977.50s, 1677.50s ] 70 2127.99s [ 2098.50s, 1978.50s, 1678.50s ] 71 2129.00s [ 2099.50s, 1979.50s, 1679.50s ] 72 2129.99s [ 2100.50s, 1980.50s, 1680.50s ] 73 2131.00s [ 2101.50s, 1981.50s, 1681.50s ] 74 75 (zlm@192.168.1.102 3306)[sysbench]>show slave statusG 76 ERROR 2013 (HY000): Lost connection to MySQL server during query 77 (zlm@192.168.1.102 3306)[sysbench]>show slave statusG 78 ERROR 2006 (HY000): MySQL server has gone away 79 No connection. Trying to reconnect... 80 Connection id: 10 81 Current database: sysbench 82 83 //The slave has hung because of the lack of disk space. 84 //The delay continuously increasing what can be seen in the logfile on master. 85 86 #180719 4:31:55 server id 1013306 end_log_pos 2044211 Table_map: `sysbench`.`hb` mapped to number 108 87 # at 2044211 88 #180719 4:31:55 server id 1013306 end_log_pos 2044359 Update_rows: table id 108 flags: STMT_END_F 89 ### UPDATE `sysbench`.`hb` 90 ### WHERE 91 ### @1='2018-07-19T04:31:55.004000' /* VARSTRING(78) meta=78 nullable=0 is_null=0 */ 92 ### @2=1013306 /* INT meta=0 nullable=0 is_null=0 */ 93 ### @3='mysql-bin.000019' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */ 94 ### @4=2043294 /* LONGINT meta=0 nullable=1 is_null=0 */ 95 ### @5=NULL /* VARSTRING(765) meta=765 nullable=1 is_null=1 */ 96 ### @6=NULL /* LONGINT meta=0 nullable=1 is_null=1 */ 97 ### SET 98 ### @1='2018-07-19T04:31:55.004400' /* VARSTRING(78) meta=78 nullable=0 is_null=0 */ 99 ### @2=1013306 /* INT meta=0 nullable=0 is_null=0 */ 100 ### @3='mysql-bin.000019' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */ 101 ### @4=2043294 /* LONGINT meta=0 nullable=1 is_null=0 */ 102 ### @5=NULL /* VARSTRING(765) meta=765 nullable=1 is_null=1 */ 103 ### @6=NULL /* LONGINT meta=0 nullable=1 is_null=1 */ 104 # at 2044359 105 #180719 4:31:55 server id 1013306 end_log_pos 2044386 Xid = 94022 106 COMMIT/*!*/; 107 108 //The output above is the detail of updating heartbeat table "hb" on master.
Summary
- pt-heartbeat is rather useful in checking replica lag of MySQL and it also support PostgreSQL.
- pt-heartbeat will read connection options in my.cnf by default if you do not specify them.
- If you've executed the pt-heartbeat with option "--daemonize",don't forget to kill the redundant processes.
- We can retrieve these lag information in heartbeat table using our monitor tool or script.