某系统反馈慢SQL影响生产,查看SLOW LOG发现下面慢SQL:
SELECT COUNT(DISTINCT m.batch_no) FROM ob_relation r INNER JOIN ob_batch_d d ON r.sub_order_no = d.outbound_no INNER JOIN ob_batch_m m ON d.batch_no = m.batch_no WHERE r.production_mode =1 AND r.yn=0 AND r.outbound_no ='xxxx2156' AND d.yn=0 AND m.yn=0 AND m.SEND_FLAG=0 AND m.batch_no!='xxx00025984';
设计表上索引情况如下:
表ob_relation上索引: KEY `idx_outbound_no` (`OUTBOUND_NO`), KEY `idx_sub_order_no` (`SUB_ORDER_NO`) 表ob_batch_d上索引: KEY `idx_update_time` (`UPDATE_TIME`), KEY `idx_ob_outbound_batch_d_batch_no` (`BATCH_NO`), KEY `idx_ORDER_GROUP_ID` (`ORDER_GROUP_ID`), 表ob_batch_m上索引: KEY `idx_update_time` (`UPDATE_TIME`), KEY `idx_BATCH_NO` (`BATCH_NO`), KEY `idx_BEAT_NO` (`BEAT_NO`), KEY `idx_BATCH_TYPE_OPT_STATUS` (`BATCH_TYPE`,`OPT_STATUS`), KEY `ix_SEND_FLAG` (`SEND_FLAG`)
查看执行计划为:
+----+-------------+-------+------------+------+----------------------------------+----------------------------------+---------+--------------------+-------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+----------------------------------+----------------------------------+---------+--------------------+-------+----------+-------------+ | 1 | SIMPLE | r | NULL | ref | idx_outbound_no,idx_sub_order_no | idx_outbound_no | 92 | const | 4 | 1.25 | Using where | | 1 | SIMPLE | m | NULL | ref | idx_BATCH_NO,ix_SEND_FLAG | ix_SEND_FLAG | 2 | const | 54292 | 5.07 | Using where | | 1 | SIMPLE | d | NULL | ref | idx_ob_outbound_batch_d_batch_no | idx_ob_outbound_batch_d_batch_no | 92 | ob_task.m.BATCH_NO | 16 | 1.00 | Using where | +----+-------------+-------+------------+------+----------------------------------+----------------------------------+---------+--------------------+-------+----------+-------------+
本着直觉判定使用索引ix_SEND_FLAG但影响行数为54292步骤存在问题,虽然过滤条件中包含SEND_FLAG=0且列SEND_FLAG上有索引,但选择性较差,初步断定查询走错索引导致。
尝试1:删除列SEND_FLAG上索引,迫使查询走列BATCH_NO上索引,删除列SEND_FLAG上索引后,执行计划变为:
+----+-------------+-------+------------+-------+----------------------------------+----------------------------------+---------+--------------------+--------+----------+---------------------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+----------------------------------+----------------------------------+---------+--------------------+--------+----------+---------------------------------------------------------------------------+ | 1 | SIMPLE | r | NULL | ref | idx_outbound_no,idx_sub_order_no | idx_outbound_no | 92 | const | 4 | 1.25 | Using where | | 1 | SIMPLE | m | NULL | range | idx_BATCH_NO | idx_BATCH_NO | 92 | NULL | 146187 | 9.99 | Using index condition; Using where; Using join buffer (Block Nested Loop) | | 1 | SIMPLE | d | NULL | ref | idx_ob_outbound_batch_d_batch_no | idx_ob_outbound_batch_d_batch_no | 92 | ob_task.m.BATCH_NO | 16 | 1.00 | Using where | +----+-------------+-------+------------+-------+----------------------------------+----------------------------------+---------+--------------------+--------+----------+---------------------------------------------------------------------------+
查询影响行数变得更高,性能更差。
尝试2:检查INNER JOIN 关联列和WHERE条件列上的数据类型,排除隐式转换导致。
尝试3: 去除与ob_batch_m的所有相关进行,仅剩表ob_relation和ob_batch_d做INNER JOIN,其执行计划为:
SELECT COUNT(DISTINCT d.batch_no) FROM ob_relation r INNER JOIN ob_batch_d d ON r.sub_order_no = d.outbound_no WHERE r.production_mode =1 AND r.yn=0 AND r.outbound_no ='xxxx2156' AND d.yn=0 +----+-------------+-------+------------+------+----------------------------------+-----------------+---------+-------+---------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+----------------------------------+-----------------+---------+-------+---------+----------+----------------------------------------------------+ | 1 | SIMPLE | r | NULL | ref | idx_outbound_no,idx_sub_order_no | idx_outbound_no | 92 | const | 4 | 1.25 | Using where | | 1 | SIMPLE | d | NULL | ALL | NULL | NULL | NULL | NULL | 4917114 | 1.00 | Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+------+----------------------------------+-----------------+---------+-------+---------+----------+----------------------------------------------------+
发现表ob_batch_d的关联条件上缺少索引导致。
问题总结:
查询中ob_relation和ob_batch_d关联,而ob_batch_d和ob_batch_m关联,但由于表ob_batch_d关联条件上没有索引且表ob_relation上WHERE过滤条件选择性极高,因此查询优化器将SQL转换为:
SELECT COUNT(DISTINCT m1.batch_no) FROM ( SELECT m.batch_no,d.outbound_no FROM ob_batch_d d INNER JOIN ob_batch_m m ON d.batch_no = m.batch_no WHERE d.yn=0 AND m.yn=0 AND m.SEND_FLAG=0 AND m.batch_no!='xxx00025984' )AS m1 INNER JOIN ( SELECT r.sub_order_no FROM ob_relation r WHERE r.production_mode =1 AND r.yn=0 AND r.outbound_no ='xxxx2156' )AS r1 ON m1.outbound_no=r1.sub_order_no
对于m1子查询内部,表ob_batch_m可以通过SEND_FLAG=0来过滤部分数据(虽然选择性较差),因此选择使用索引ix_SEND_FLAG,而表ob_batch_d上缺少索引的问题被“完美隐藏”。
在优化INNER JOIN操作时,通常需要先确定两表的关系(外表和内表),外表通过WHERE条件列来过滤数据,而内表通过关联条件列来定位数据。
如对于下列查询:
SELECT COUNT(1) FROM ob_relation r INNER JOIN ob_batch_d d ON r.sub_order_no = d.outbound_no WHERE r.outbound_no ='xxxx2156';
表ob_relation和表ob_batch_d做INNER JOIN操作
1、表ob_relation作为外表,可以通过outbound_no ='xxxx2156'来过滤数据,因此需要在ob_relation(outbound_no)上创建索引。
2、表ob_batch_d作为内表,需要通过关联列outbound_no来定位数据,因此需要在ob_batch_d(outbound_no)上创建索引。
3、表ob_relation的关联条件sub_order_no并不需要单独作为索引第一列来创建索引,但在部分查中可以通过覆盖索引特性来优化查询,如上面查询中创建组合索引ob_relation(outbound_no,sub_order_no)能获得最佳性能。
总结:
有时候,经验能帮助你快速处理问题,但有些时候,经验能带你直接走入误区。