通常是根据 Seconds_Behind_Master 的值来判断slave的延迟。这么做大部分情况下尚可接受,但并不够准确,而应该考虑更多因素。
查看slave的状态
mysql> show slave statusG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.0.20 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000150 Read_Master_Log_Pos: 226326418 Relay_Log_File: abce-relay-bin.000003 Relay_Log_Pos: 77106856 Relay_Master_Log_File: mysql-bin.000149 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1046292913 Relay_Log_Space: 330964012 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 30510 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 1 row in set (0.00 sec) mysql>
从查询结果可以看出,slave落后master的时间是30510秒(即Seconds_Behind_Master的值)
查看一下slave上的复制进程在做什么:
mysql> show full processlistG *************************** 1. row *************************** Id: 4 User: system user Host: db: NULL Command: Connect Time: 6018 State: Waiting for master to send event Info: NULL *************************** 2. row *************************** Id: 5 User: system user Host: db: abce Command: Connect Time: 30501 State: updating Info: delete from s_item_e where ( ng_user_id = 23880 and dt_shift_date in ( '2017-10-30 00:00:00' , '2017-10-29 00:00:00' ) ) or ( ng_user_id = 24853 and dt_shift_date in ( '2017-10-30 00:00:00' , '2017-10-29 00:00:00' ) ) ......
sql线程在做updating操作,具体内容是从表中删除数据。Time的值是30501。通常简单的sql不会执行30501秒。
在REPLICATION进程中,Time这列的值还可能有几种情况:
1、SQL线程当前执行的binlog(实际上是relay log)中的timestamp和IO线程最新的timestamp的差值,这就是通常大家认为的Seconds_Behind_Master值,并不是某个SQL的实际执行耗时;
2、SQL线程当前如果没有活跃SQL在执行的话,Time值就是SQL线程的idle time;
而IO线程的Time值则是该线程自从启动以来的总时长(多少秒),如果系统时间在IO线程启动后发生修改的话,可能会导致该Time值异常,比如变成负数,或者非常大。
可以根据以下顺序进行延迟的监控:
1.首先比较 Read_Master_Log_Pos 和 Master_Log_File 是否有差异 2.如果 Relay_Master_Log_File 和 Master_Log_File 是一样的话,再看 Exec_Master_Log_Pos 和 Read_Master_Log_Pos 的差异,对比SQL线程比IO线程慢了多少个binlog事件; 3.如果 Relay_Master_Log_File 和 Master_Log_File 不一样,那说明延迟可能较大,需要从MASTER上取得binlog status,判断当前的binlog和MASTER上的差距;
查看主库的binary日志:
mysql> SHOW BINARY LOGS; +------------------+------------+ | Log_name | File_size | +------------------+------------+ | mysql-bin.000142 | 1073778812 | | mysql-bin.000143 | 1073798740 | | mysql-bin.000144 | 1073910052 | | mysql-bin.000145 | 1073781405 | | mysql-bin.000146 | 1073845859 | | mysql-bin.000147 | 1073746199 | | mysql-bin.000148 | 1073863295 | | mysql-bin.000149 | 1073822822 | | mysql-bin.000150 | 226343255 | +------------------+------------+ 9 rows in set (0.00 sec) mysql>
计算延迟应该是mysql-bin.000149的最大事件数减去已经被执行完的事件数,
即 1073822822 - 1046292913 = 27529909 个binlog event,再加上mysql-bin.000150这个binlog已经产生的226343255个binlog event,共226343255个binlog event。即:1073822822 - 1046292913 + 226343255
查看过程中也可以设置pager,只查看关注的几个status值:
mysql> pager cat | egrep -i 'system user|Exec_Master_Log_Pos|Seconds_Behind_Master|Read_Master_Log_Pos' PAGER set to 'cat | egrep -i 'system user|Exec_Master_Log_Pos|Seconds_Behind_Master|Read_Master_Log_Pos'' mysql> 这时有活跃SQL,Time值是30583,Seconds_Behind_Master是30565 mysql> show processlist; show slave statusG | 4 | system user | | NULL | Connect | 7168 | Waiting for master to send event | NULL | | 5 | system user | | abce | Connect | 30583 | updating | delete from s_item_e 3 rows in set (0.00 sec) Read_Master_Log_Pos: 226364447 Exec_Master_Log_Pos: 1057661705 Seconds_Behind_Master: 30565 1 row in set (0.00 sec) mysql>