说明:
最近优化慢SQL,执行计划错误和OR条件查询优化实战经验,提供优化SQL思路和方法:
1,利用exists来优化SQL(利用exists减少回表查询次数和确定驱动表)
2,OR语句优化(OR条件,字段有索引,无法使用索引的)
案例1:
SELECT sum( CASE WHEN ols.check_status NOT IN ( 2 , 3 , 4 ) THEN 1 WHEN ols.check_status IS NULL THEN 1 END ) AS lesson_num, sum( ols.check_status = 1 ) AS attend_num FROM ol_live_student_time_chapter_list olstcl LEFT JOIN ol_live_student ols ON olstcl.live_student_id = ols.id inner JOIN ol_user u ON u.id = olstcl.user_id WHERE olstcl.class_course_type = '2' AND olstcl.attend_status <> 4 AND olstcl.start_time >= 1621526400 AND olstcl.start_time < 1621612799 AND u.is_test_user = '0' AND counselor_id = '796' AND u.pay_status = '0' AND u.type = '1' AND u.STATUS = '1' ; |
执行超过2.2秒,执行计划如下,从执行计划看:返回的rows也很少。key里用到了索引,按理说这个SQL是最优的SQL,没有优化的空间
但实际查看ol_user表,表有430万条数据,核心还是查询ol_user表的数据,如果能去掉ol_user表查询,就更好,如果不能去掉,能有其他优化方法:
通过仔细分析3个表,各个条件查询查出的数据:
select count(*) ol_live_student_time_chapter_list olstcl where olstcl.start_time>= 1621526400 AND olstcl.start_time< 1621612799 and olstcl.class_course_type = '2' AND olstcl.attend_status <> 4 |
发现ol_live_student_time_chapter_list 查询,这个时间查出只有8000多条。而查询oL_user表查询:
select count(*) from ol_user u where u.is_test_user = '0' AND counselor_id = '796' AND u.pay_status = '0' AND u.type = '1' AND u.STATUS = '1' |
查出300多条,但仔细计划显示:Using intersect(pay_status_user,counselor_id,type,is_test_user); Using where; Using index。 的确用很多索引,
这样看,表面看执行计划没问题,但实际看一下,这个ol_user表查询,其实只是一个条件,ol_user没有字段在select查出显示,从某种意义上讲,只需要查出符合条件的就可以,如果满足其中
一个就返回效率是否更高,不需要每个条件都查完,这样我们完全可以用EXISTS 替换inner join 来提高查询效率,SQL修改如下:
SELECT sum( CASE WHEN ols.check_status NOT IN ( 2 , 3 , 4 ) THEN 1 WHEN ols.check_status IS NULL THEN 1 END ) AS lesson_num, sum( ols.check_status = 1 ) AS attend_num FROM ol_live_student_time_chapter_list olstcl LEFT JOIN ol_live_student ols ON olstcl.live_student_id = ols.id WHERE olstcl.class_course_type = '2' AND olstcl.attend_status <> 4 AND olstcl.start_time >= 1621526400 AND olstcl.start_time < 1621612799 and EXISTS (select 1 from ol_user u where u.is_test_user = '0' AND u.counselor_id = '796' AND u.pay_status = '0' A ND u.type = '1' AND u.STATUS = '1' and u.id = olstcl.user_id) ; |
使用优化的SQL,时间只有0.9秒左右
优化原理: 利用 EXISTS 来替换 inner join,减少查询循环回表次数,提高效率。
案例2:
select count(distinct u.id) from ol_user u left join ol_user_related_info uri on u.id = uri.user_id where u.type= 1 and u.status = 1 and u.pay_status= 0 and uri.recovery_time_no_pay>= 1622649600 and uri.recovery_time_no_pay< 1622736000 and u.counselor_id in ( 403 ); |
执行超过1.3秒,执行计划如下:
从这里看:查出ol_user的u.id,distinct汇总,而ol_user_related_info 是left join,仔细看逻辑,虽然是left join,但有uri.recovery_time_no_pay条件,就这个left join 可以改成inner join
可以改成:
select count(distinct uri.user_id) from ol_user u inner join ol_user_related_info uri on u.id = uri.user_id where u.type= 1 and u.status = 1 and u.pay_status= 0 and uri.recovery_time_no_pay>= 1622649600 and uri.recovery_time_no_pay< 1622736000 and u.counselor_id in ( 403 ); |
修改完,2者的执行效率,查不多,改了也没优化,如上面的优化,我们知道ol_user表的数据量太大,要减少回表的查询,这样的sql就可以改成exists查询,如下:
select count(distinct uri.user_id) from ol_user_related_info uri where uri.recovery_time_no_pay>= 1622649600 and uri.recovery_time_no_pay< 1622736000 and exists (select 1 from ol_user u where u.id = uri.user_id and u.type= 1 and u.status = 1 and u.pay_status= 0 and u.counselor_id in ( 403 ) ) |
改成这样,SQL执行只需0.6秒左右,未加索引,2次修改SQL后,就优化了SQL
案例3
SELECT f.flow_type,count( 1 ) as num FROM `ol_admin_flow` `f` LEFT JOIN `ol_admin_flow_node` `n` ON `f`.`flowid`=`n`.`flowid` WHERE( `n`.`adminid` = 7417 OR `f`.`post_adminid` = 7417 ) GROUP BY `f`.`flow_type`; |
执行超过2.5秒,post_adminid加索引,SQL也用不到索引,将or改写成 union all写法
select flow_type,sum(num) num from (SELECT f.flow_type, 1 num FROM ol_admin_flow f LEFT JOIN ol_admin_flow_node n ON f.flowid=n.flowid WHERE f.post_adminid = 7418 union all SELECT f.flow_type, 1 num FROM ol_admin_flow f LEFT JOIN ol_admin_flow_node n ON f.flowid=n.flowid WHERE n.adminid = 7418 ) a group by flow_type |
改成这样写法后。SQL效率大幅提升,查询只需0.3秒
案例4
SELECT au.GROUP group_id,sum(IF ( o.STATUS = 3 AND o.attach_pay_time BETWEEN 1625068800 AND 1627747199 AND o.rebate_time BETWEEN 1625068800 AND 1627747199, 0, oap.price ) ) real_renewal_money, sum(IF ( o.STATUS = 3 AND o.attach_pay_time BETWEEN 1625068800 AND 1627747199 AND o.rebate_time BETWEEN 1625068800 AND 1627747199, 0, oap.order_count ) ) renewal_num FROM `ol_order_attach_pay` `oap` INNER JOIN `ol_order` `o` IGNORE INDEX ( idx_pay_time_type ) ON `o`.`order_number` = `oap`.`order_number` AND `o`.`status` IN ( 1, 3 ) AND ( `o`.`attach_pay_time` >= 1625068800 AND `o`.`attach_pay_time` < 1627055999 ) AND `o`.`package_course_type` = 1 INNER JOIN `online`.`ol_admin_user` `au` ON `oap`.`adminid` = `au`.`id` WHERE `oap`.`status` = 1 AND `oap`.`adminid` IN ( 728, 818, 870, 1497, 2019, 2021, 2465, 2557, 2679, 3228, 3231, 3916, 3419, 3423, 3412, 3417, 3500, 4165, 477, 4163, 1030, 562 ) GROUP BY `au`.`group` LIMIT 100;
该SQL线上执行需要18秒以上,执行计划如下:
从执行计划看,慢的地方在查询ol_order的索引,有76万条。表中有以下索引:
KEY `attach_pay_time` (`attach_pay_time`) KEY `package_course_type` (`package_course_type`),表中有20多个索引,太多,真不想再建索引。但不加索引无法优化,用
force index 索引优化效果又不太好,一开始研发建立复合索引:KEY `idx_pay_time_type` (`attach_pay_time`,`package_course_type`), 按理应该优化,但实际优化效果不好,后续将复合索引改成:
KEY `idx_pay_time_type` (`package_course_type`,`attach_pay_time`),
加好后,执行计划如下,查询在0.5秒:
总结:
1,优化SQL,MySQL需要选择正确的驱动表,如果执行计划不正确,可用exists来明确驱动表