含有IN的子查询

本文主要探讨MySQL5.5和MYSQL5.6对包含in子查询的优化情况。

MySQL5.5

首先来看一下，在MySQL5.5及以下版本是如何处理的，比如下面这条语句：

select name from search_hash where info_hash in (select info_hash from search_filelist);

这种语句我们见的很多，熟悉mysql的人都会建议对其改写，但为什么呢？先看它的执行计划：

mysql> explain extended select name from search_hash where info_hash in (select info_hash from search_filelist);
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type        | table           | type            | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | PRIMARY            | search_hash     | ALL             | NULL          | NULL    | NULL    | NULL | 604435 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | search_filelist | unique_subquery | PRIMARY       | PRIMARY | 122     | func |      1 |   100.00 | Using index |
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+

很奇怪吧，第二行显示查询的类型是相关子查询。这怎么就扯上相关子查询呢？再看一下优化器的执行情况：

mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                 |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `zsky`.`search_hash`.`name` AS `name` from `zsky`.`search_hash` where <in_optimizer>(`zsky`.`search_hash`.`info_hash`,<exists>(<primary_index_lookup>(<cache>(`zsky`.`search_hash`.`info_hash`) in search_filelist on PRIMARY))) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

MYSQL优化器会把这个查询改写为下面的相关子查询：
select name from search_hash where exists (select 1 from search_filelist where search_hash.info_hash = search_filelist.info_hash);
不信看下面，两者的执行计划极其相似。

mysql> explain select name from search_hash where exists (select 1 from search_filelist where search_hash.info_hash = search_filelist.info_hash);
+----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+
| id | select_type        | table           | type   | possible_keys | key     | key_len | ref                        | rows   | Extra       |
+----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+
|  1 | PRIMARY            | search_hash     | ALL    | NULL          | NULL    | NULL    | NULL                       | 604435 | Using where |
|  2 | DEPENDENT SUBQUERY | search_filelist | eq_ref | PRIMARY       | PRIMARY | 122     | zsky.search_hash.info_hash |      1 | Using index |
+----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+

改写后的查询会对search_hash表做全表扫描，然后用每一个info_hash值再去 search_filelist表查，如果外面这个表很大的话，就会很慢。比如例子中的表有60万行，也就是子查询要执行60万次。一般别人会建议改写为表连接，比如下面这样：
select name from search_hash,search_filelist where search_hash.info_hash = search_filelist.info_hash;

mysql> explain select name from search_hash,search_filelist where search_hash.info_hash = search_filelist.info_hash ;
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
| id | select_type | table           | type  | possible_keys | key       | key_len | ref                            | rows   | Extra       |
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
|  1 | SIMPLE      | search_filelist | index | PRIMARY       | PRIMARY   | 122     | NULL                           | 144657 | Using index |
|  1 | SIMPLE      | search_hash     | ref   | info_hash     | info_hash | 123     | zsky.search_filelist.info_hash |      1 | Using where |
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+

这个时候是扫描整个search_filelist表，然后对每一个info_hash值去search_hash表中查。但为什么改写为表连接后就快些呢？
当然左表能够使用索引是一部分原因。此外需要注意到的是，优化器自动会选择记录较少的search_filelist作为左表。虽然都需要扫描整张表，但后者需要扫描的数据量少很多，执行的次数也少。一般情况下，in里面的子查询返回的结果集都会是主查询的子集，或者记录少于主查询，那么改写后的查询需要扫描的数量也就更少。

MySQL5.6

而在MySQL5.6中，优化器会自动将in 子查询改写为表连接。也就是说在MYSQL5.6及以上版本已经不需要手工优化包含in的子查询。下面是在MYSQL5.6中的的执行计划：

mysql> explain select name from search_hash where info_hash in (select info_hash from search_filelist);
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
| id | select_type | table           | type  | possible_keys | key       | key_len | ref                            | rows   | Extra       |
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
|  1 | SIMPLE      | search_filelist | index | PRIMARY       | PRIMARY   | 122     | NULL                           | 110474 | Using index |
|  1 | SIMPLE      | search_hash     | ref   | info_hash     | info_hash | 123     | zsky.search_filelist.info_hash |      1 | NULL        |
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+

转载请注明出处：http://www.cnblogs.com/ayard/

相关阅读:
es6里面的arr方法
 for循环比较
 window.location各属性的值
 浏览器的缓存机制
 es6中的双箭头函数
 前端开发模拟数据------webpack-api-mocker
对象的深拷贝和浅拷贝
 Web应用程序的安全问题
 this指向问题
 postman使用篇最全整理
原文地址：https://www.cnblogs.com/ayard/p/9317507.html

最新文章
django框架(1)
api安全认证
 Web框架django[Form]组件
 Web框架django进阶篇
 Web框架django基础篇
 django2
Django
mysql
jQuery
CSS实现轮播效果