• mysql锁排查


    一、说明

    本文将通过实验介绍mysql锁该如何排查,本实验的mysql版本为8.0.23,隔离级别为RC。

    二、实验

    场景一:会话a开启事务,删除全部数据。会话b开始事务,删除部分数据

    会话A:

    (root@localhost)[hello]> begin;
    (root@localhost)[hello]> delete from x1;
    (root@localhost)[hello]> delete from x2;
    (root@localhost)[hello]> select * from x1;
    (root@localhost)[hello]> select * from x2;
    

    会话B:

    (root@localhost)[hello]> begin;
    (root@localhost)[hello]> delete from x1 limit 1;  # 此处发生等待
    

    再起一个窗口,查看会话列表
    ID 34即会话A已执行完成,状态为Sleep。ID 37即会话B正在执行,状态为Query。

    (root@localhost)[(none)]> select * from information_schema.processlist;
    +----+------+-----------+-------+---------+------+-----------+----------------------------------------------+
    | ID | USER | HOST      | DB    | COMMAND | TIME | STATE     | INFO                                         |
    +----+------+-----------+-------+---------+------+-----------+----------------------------------------------+
    | 34 | root | localhost | hello | Sleep   |   49 |           | NULL                                         |
    | 36 | root | localhost | NULL  | Query   |    0 | executing | select * from information_schema.processlist |
    | 37 | root | localhost | hello | Query   |   12 | updating  | delete from x1 limit 1                       |
    +----+------+-----------+-------+---------+------+-----------+----------------------------------------------+
    

    查看会话阻塞情况,waiting_pid、waiting_trx_id、blocking_pid、blocking_trx_id,分别表示被阻塞的事务和导致阻塞的事务。
    结果显示事务23641阻塞了事务23654。

    (root@localhost)[(none)]> select locked_table_schema, locked_table_name, waiting_query, waiting_pid, waiting_trx_id, blocking_pid, blocking_trx_id from sys.innodb_lock_waits;
    +---------------------+-------------------+------------------------+-------------+----------------+--------------+-----------------+
    | locked_table_schema | locked_table_name | waiting_query          | waiting_pid | waiting_trx_id | blocking_pid | blocking_trx_id |
    +---------------------+-------------------+------------------------+-------------+----------------+--------------+-----------------+
    | hello               | x1                | delete from x1 limit 1 |          37 |          23654 |           34 |           23641 |
    +---------------------+-------------------+------------------------+-------------+----------------+--------------+-----------------+
    

    查看锁的信息,engine_transaction_id对应事务编号,thread_id对应线程编号。
    前面我们查到事务编号23641阻塞了事务编号23654,这里显示事务编号23641对应线程85,事务编号23654对应线程88。

    (root@localhost)[(none)]> select distinct engine_transaction_id, thread_id, object_schema, object_name, lock_type, lock_mode, lock_status from performance_schema.data_locks;
    +-----------------------+-----------+---------------+-------------+-----------+---------------+-------------+
    | engine_transaction_id | thread_id | object_schema | object_name | lock_type | lock_mode     | lock_status |
    +-----------------------+-----------+---------------+-------------+-----------+---------------+-------------+
    |                 23654 |        88 | hello         | x1          | TABLE     | IX            | GRANTED     |
    |                 23654 |        88 | hello         | x1          | RECORD    | X,REC_NOT_GAP | WAITING     |
    |                 23641 |        85 | hello         | x1          | TABLE     | IX            | GRANTED     |
    |                 23641 |        85 | hello         | x1          | RECORD    | X,REC_NOT_GAP | GRANTED     |
    |                 23641 |        85 | hello         | x2          | TABLE     | IX            | GRANTED     |
    |                 23641 |        85 | hello         | x2          | RECORD    | X,REC_NOT_GAP | GRANTED     |
    +-----------------------+-----------+---------------+-------------+-----------+---------------+-------------+
    

    查看造成阻塞的线程历史执行的语句,其中thread_id对应performance_schema.data_locks中的thread_id
    结果显示线程85执行过“delete from x1”,这才是线程88执行“delete from x1 limit 1”等待的原因。

    (root@localhost)[(none)]> select timer_start, thread_id, sql_text, message_text from performance_schema.events_statements_history where event_name like 'statement/sql%' and thread_id in (85) order by timer_start desc;
    +-------------------+-----------+----------------------------------+--------------+
    | timer_start       | thread_id | sql_text                         | message_text |
    +-------------------+-----------+----------------------------------+--------------+
    | 22461001917916000 |        85 | select * from x2                 | NULL         |
    | 22460129443671000 |        85 | select * from x1                 | NULL         |
    | 22460050982468000 |        85 | delete from x2                   | NULL         |
    | 22460020034606000 |        85 | delete from x1                   | NULL         |
    | 22459990044910000 |        85 | begin                            | NULL         |
    | 22416295653612000 |        85 | select USER()                    | NULL         |
    | 22416294129231000 |        85 | select @@version_comment limit 1 | NULL         |
    +-------------------+-----------+----------------------------------+--------------+
    

    将上面所有的语句进行合并,就可以得到锁的完整信息了

    (root@localhost)[(none)]> select distinct a.wait_started, a.locked_table, a.waiting_query, a.waiting_pid, a.blocking_pid, c.sql_text 
    from sys.innodb_lock_waits a 
    join performance_schema.data_locks b on (a.blocking_trx_id = b.engine_transaction_id and a.locked_table_schema = b.object_schema and a.locked_table_name = b.object_name)
    join performance_schema.events_statements_history c on (b.thread_id = c.thread_id and c.sql_text regexp b.object_name);
    +---------------------+--------------+------------------------+-------------+--------------+------------------+
    | wait_started        | locked_table | waiting_query          | waiting_pid | blocking_pid | sql_text         |
    +---------------------+--------------+------------------------+-------------+--------------+------------------+
    | 2022-02-16 16:26:25 | `hello`.`x1` | delete from x1 limit 1 |          37 |           34 | delete from x1   |
    | 2022-02-16 16:26:25 | `hello`.`x1` | delete from x1 limit 1 |          37 |           34 | select * from x1 |
    +---------------------+--------------+------------------------+-------------+--------------+------------------+
    

    场景二:会话a对表加元数据锁。会话b开始事务,删除部分数据

    会话A:

    (root@localhost)[hello]> lock table x1 write;
    

    会话B:

    (root@localhost)[hello]> delete from x1 limit 1;  # 此处发生等待
    

    查看会话列表
    ID 39即会话A已执行完成,状态为Sleep。ID 40即会话B正在执行,状态为Query。

    (root@localhost)[(none)]> select * from information_schema.processlist;
    +----+------+-----------+-------+---------+------+---------------------------------+----------------------------------------------+
    | ID | USER | HOST      | DB    | COMMAND | TIME | STATE                           | INFO                                         |
    +----+------+-----------+-------+---------+------+---------------------------------+----------------------------------------------+
    | 49 | root | localhost | hello | Sleep   |   81 |                                 | NULL                                         |
    | 51 | root | localhost | NULL  | Query   |    0 | executing                       | select * from information_schema.processlist |
    | 52 | root | localhost | hello | Query   |    9 | Waiting for table metadata lock | delete from x1 limit 1                       |
    +----+------+-----------+-------+---------+------+---------------------------------+----------------------------------------------+
    

    查看innodb_lock_waits和data_locks,都没有数据。

    (root@localhost)[(none)]> select * from sys.innodb_lock_waits;
    Empty set (0.00 sec)
    
    (root@localhost)[(none)]> select * from performance_schema.data_locks;
    Empty set (0.00 sec)
    

    查看当前数据库执行过的sql,能找到锁表的语句,注意这里的thread_id并不对应processlist表中的id。

    (root@localhost)[(none)]> select thread_id, event_name, sql_text from performance_schema.events_statements_current;
    +-----------+---------------------------+------------------------------------------------------------------------------------------+
    | thread_id | event_name                | sql_text                                                                                 |
    +-----------+---------------------------+------------------------------------------------------------------------------------------+
    |       100 | statement/sql/lock_tables | lock table x1 write                                                                      |
    |       102 | statement/sql/select      | select thread_id, event_name, sql_text from performance_schema.events_statements_current |
    |       103 | statement/sql/delete      | delete from x1 limit 1                                                                   |
    +-----------+---------------------------+------------------------------------------------------------------------------------------+
    

    查看threads表,这里的thread_id对应events_statements_current表中thread_id,processlist_id对应processlist表中id。

    (root@localhost)[(none)]> select thread_id, processlist_id, processlist_user, processlist_db, processlist_command, processlist_time, processlist_state, processlist_info from performance_schema.threads where type = 'FOREGROUND';
    +-----------+----------------+------------------+----------------+---------------------+------------------+---------------------------------+------------------------+
    | thread_id | processlist_id | processlist_user | processlist_db | processlist_command | processlist_time | processlist_state               | processlist_info       |
    +-----------+----------------+------------------+----------------+---------------------+------------------+---------------------------------+------------------------+
    |        48 |              6 | NULL             | NULL           | Daemon              |            31219 | Suspending                      | NULL                   |
    |       100 |             49 | root             | hello          | Sleep               |              172 | NULL                            | NULL                   |
    |       103 |             52 | root             | hello          | Query               |              100 | Waiting for table metadata lock | delete from x1 limit 1 |
    +-----------+----------------+------------------+----------------+---------------------+------------------+---------------------------------+------------------------+
    

    将以上sql进行合并,从sql_text中找锁表的语句,杀掉对应的会话即可。

    (root@localhost)[(none)]> select b.processlist_id, b.processlist_user, b.processlist_db, b.processlist_command, b.processlist_time, b.processlist_state, b.processlist_info, a.event_name, a.sql_text
    from performance_schema.events_statements_current a
    join performance_schema.threads b on (a.thread_id = b.thread_id);
    +----------------+------------------+----------------+---------------------+------------------+---------------------------------+-------------------------+---------------------------+------------------------+
    | processlist_id | processlist_user | processlist_db | processlist_command | processlist_time | processlist_state               | processlist_info        | event_name                | sql_text               |
    +----------------+------------------+----------------+---------------------+------------------+---------------------------------+-------------------------+---------------------------+------------------------+
    |             49 | root             | hello          | Sleep               |              289 | NULL                            | NULL                    | statement/sql/lock_tables | lock table x1 write    |
    |             52 | root             | hello          | Query               |              217 | Waiting for table metadata lock | delete from x1 limit 1  | statement/sql/delete      | delete from x1 limit 1 |
    +----------------+------------------+----------------+---------------------+------------------+---------------------------------+-------------------------+---------------------------+------------------------+
    

    三、监控

    查看事务信息

    (root@localhost)[(none)]> select trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query from information_schema.innodb_trx;
    +--------+-----------+---------------------+---------------------+------------------------+
    | trx_id | trx_state | trx_started         | trx_mysql_thread_id | trx_query              |
    +--------+-----------+---------------------+---------------------+------------------------+
    |  23654 | LOCK WAIT | 2022-02-16 16:26:25 |                  37 | delete from x1 limit 1 |
    |  23641 | RUNNING   | 2022-02-16 16:25:47 |                  34 | NULL                   |
    +--------+-----------+---------------------+---------------------+------------------------+
    

    因此可以通过下面的语句查出当前最长的事务。

    (root@localhost)[(none)]> select max(time_to_sec(timediff(now(), trx_started))) max_trx_time from information_schema.innodb_trx;  
    +--------------+
    | max_trx_time |
    +--------------+
    |          571 |
    +--------------+
    

    四、总结

    1. 等待行锁最大的等待时长参数为innodb_lock_wait_timeout,默认50s。等待元数据锁最大的等待时长参数为lock_wait_timeout,默认31536000s(一年)。建议对lock_wait_timeout进行调整。
    2. 通过information_schema.innodb_trx对数据库进行长事务监控。
    3. 通过sys.innodb_lock_waits查看被阻塞的sql
    4. 元数据锁通过performance_schema.events_statements_current,performance_schema.threads关联起来看
  • 相关阅读:
    Python——数据结构——字典
    Python——print()函数
    Python数据结构——序列总结
    elasticsearch全文检索java
    elasticsearch单例模式连接 java
    【转载】信号中断 与 慢系统调用
    设计模式——状态模式(C++实现)
    设计模式——观察者模式(C++实现)
    C++调用C方法
    设计模式——外观模式(C++实现)
  • 原文地址:https://www.cnblogs.com/ddzj01/p/15901984.html
Copyright © 2020-2023  润新知