本文主要探讨MySQL5.5和MYSQL5.6对包含in子查询的优化情况。
-
MySQL5.5
首先来看一下,在MySQL5.5及以下版本是如何处理的,比如下面这条语句:
select name from search_hash where info_hash in (select info_hash from search_filelist);
这种语句我们见的很多,熟悉mysql的人都会建议对其改写,但为什么呢?先看它的执行计划:
mysql> explain extended select name from search_hash where info_hash in (select info_hash from search_filelist); +----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+ | 1 | PRIMARY | search_hash | ALL | NULL | NULL | NULL | NULL | 604435 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | search_filelist | unique_subquery | PRIMARY | PRIMARY | 122 | func | 1 | 100.00 | Using index | +----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+
很奇怪吧,第二行显示查询的类型是相关子查询。这怎么就扯上相关子查询呢?再看一下优化器的执行情况:
mysql> show warnings; +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `zsky`.`search_hash`.`name` AS `name` from `zsky`.`search_hash` where <in_optimizer>(`zsky`.`search_hash`.`info_hash`,<exists>(<primary_index_lookup>(<cache>(`zsky`.`search_hash`.`info_hash`) in search_filelist on PRIMARY))) | +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
MYSQL优化器会把这个查询改写为下面的相关子查询:
select name from search_hash where exists (select 1 from search_filelist where search_hash.info_hash = search_filelist.info_hash);
不信看下面,两者的执行计划极其相似。
mysql> explain select name from search_hash where exists (select 1 from search_filelist where search_hash.info_hash = search_filelist.info_hash); +----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+ | 1 | PRIMARY | search_hash | ALL | NULL | NULL | NULL | NULL | 604435 | Using where | | 2 | DEPENDENT SUBQUERY | search_filelist | eq_ref | PRIMARY | PRIMARY | 122 | zsky.search_hash.info_hash | 1 | Using index | +----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+
改写后的查询会对search_hash表做全表扫描,然后用每一个info_hash值再去 search_filelist表查,如果外面这个表很大的话,就会很慢。比如例子中的表有60万行,也就是子查询要执行60万次。一般别人会建议改写为表连接,比如下面这样:
select name from search_hash,search_filelist where search_hash.info_hash = search_filelist.info_hash;
mysql> explain select name from search_hash,search_filelist where search_hash.info_hash = search_filelist.info_hash ; +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+ | 1 | SIMPLE | search_filelist | index | PRIMARY | PRIMARY | 122 | NULL | 144657 | Using index | | 1 | SIMPLE | search_hash | ref | info_hash | info_hash | 123 | zsky.search_filelist.info_hash | 1 | Using where | +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
这个时候是扫描整个search_filelist表,然后对每一个info_hash值去search_hash表中查。但为什么改写为表连接后就快些呢?
当然左表能够使用索引是一部分原因。此外需要注意到的是,优化器自动会选择记录较少的search_filelist作为左表。虽然都需要扫描整张表,但后者需要扫描的数据量少很多,执行的次数也少。一般情况下,in里面的子查询返回的结果集都会是主查询的子集,或者记录少于主查询,那么改写后的查询需要扫描的数量也就更少。
-
MySQL5.6
而在MySQL5.6中,优化器会自动将in 子查询改写为表连接。也就是说在MYSQL5.6及以上版本已经不需要手工优化包含in的子查询。下面是在MYSQL5.6中的的执行计划:
mysql> explain select name from search_hash where info_hash in (select info_hash from search_filelist); +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+ | 1 | SIMPLE | search_filelist | index | PRIMARY | PRIMARY | 122 | NULL | 110474 | Using index | | 1 | SIMPLE | search_hash | ref | info_hash | info_hash | 123 | zsky.search_filelist.info_hash | 1 | NULL | +----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+