问题描述
某MySQL服务器出现大量慢日志,慢日志中SQL语句部分为COMMIT,执行时间在100秒至400秒。
由于定期对该服务器进行SHOW PROCESSLIST快照, 通过PROCESSLIST日志查看服务器执行情况。
在1:02至1:32期间,业务进行数据归档操作,归档SQL支持超过1800秒:
Id User Host db Command Time State Info
8487369 xxx_rw 172.16.16.26:28148 xxx Query 1807 Sending data INSERT INTO xxx_async_task_20200301 SELECT * FROM xxx_async_task_backTable_17533 WHERE created_at < "20200301"
在1:33,归档SQL状态从Sending data切换为query end:
Id User Host db Command Time State Info
8487369 xxx_rw 172.16.16.26:28148 xxx Query 1807 query end INSERT INTO xxx_async_task_20200301 SELECT * FROM xxx_async_task_backTable_17533 WHERE created_at < "20200301"
在1:33至1:36, 该SQL执行状态仍为:query end,开始出现COMMIT命令等待(starting),活跃连接数从1暴涨至200+:
Id User Host db Command Time State Info
8487369 xxx_rw 172.16.16.26:28148 xxx Query 1807 query end INSERT INTO xxx_async_task_20200301 SELECT * FROM xxx_async_task_backTable_17533 WHERE created_at < "20200301"
8487410 xxx_rw 172.16.16.89:31590 xxx Query 7 starting commit
8487465 xxx_rw 172.16.16.91:58612 xxx Query 10 starting commit
3942982 aurora 11.223.143.134:42889 NULL Query 130 query end /* rds internal mark */ CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT '0',
PRIMARY KEY (type)
)
ENGINE = InnoDB
PS: 阿里云的内部账号进行ha_health_check的CREATE TABLE 命令也被阻塞
在1:36,归档SQL执行完成,处于starting状态的COMMIT命令也随之消失,MySQL生成单个文件超过20G的BINLOG,MySQL慢日志中记录大量包含COMMIT操作的慢日志。
在1:37至1:40,网络流出量暴涨,大量BINLOG数据被推送至从库进行数据同步。
线程执行状态
*Sending data
The thread is reading and processing rows for a SELECT statement, and sending data to the client.
Because operations occurring during this state tend to perform large amounts of disk access (reads), it is often the longest-running state over the lifetime of a given query.
*query end
This state occurs after processing a query but before the freeing items state.
*starting
未在MySQL官网找到解释
问题原因
导致COMMIT操作出现在慢日志原因:
1、在ROW复制模式下,事务修改大量数据,在COMMIT时需要生成BINLOG事件并刷新至BINLOG文件中,由于BINLOG文件只能顺序写入,导致其他事务COMMIT时被阻塞。
2、当存储系统出现IO瓶颈或出现故障时,如当RAID卡充放电过程中,RAID系统从Write Back切换为Write Through后,存储IO使用率暴涨,通用会出现COMMIT操作"长时间执行"情况。