从多表连接后的select count(*)看待SQL优化
一朋友问我,以下这SQL能直接改写成select count(*) from a
吗?
SELECT COUNT(*) FROM a LEFT JOIN b ON a.a1 = b.b1 LEFT JOIN c ON b.b1 = c.c1
废话不多说,直接上实验。
1. 准备数据
创建测试表a,b,c
,并插入数据,a有重复数据,b是唯一数据,c是唯一数据,d有重复数据。
1) 创建a表
create table a (a1 int); insert into a select 1; insert into a select 2; insert into a select 3; insert into a select 1; insert into a select 2; insert into a select 3; insert into a values(null); insert into a values(null); insert into a values(null); insert into a values(null);
2)创建b表
create table b (b1 int); insert into b select 1; insert into b select 2; insert into b select 3; insert into b select 4; insert into b select 5;
3)创建c表
create table c (c1 int); insert into c select 7; insert into c select 8; insert into c select 9; insert into c values(null); insert into c values(null);
4)创建d表
create table d (d1 int); insert into d select 1; insert into d select 1; insert into d select 1; insert into d select 1; insert into d select 1; insert into d select 1;
2. 数据查看
a表 | b表 | c表 | d表 |
---|---|---|---|
1 | 1 | 7 | 1 |
2 | 2 | 8 | 1 |
3 | 3 | 9 | 1 |
1 | 4 | null | 1 |
2 | 5 | null | 1 |
3 | 1 | ||
null | |||
null | |||
null | |||
null |
3. SQL示例
3.1 a表连接b表再连接c表(N:1:1
的关系)
a表连接列有重复数据,b,c两表的连接列都是唯一数据
SELECT COUNT(*) FROM a LEFT JOIN b ON a.a1 = b.b1 LEFT JOIN c ON b.b1 = c.c1 +----------+ | COUNT(*) | +----------+ | 10 | +----------+ 1 row in set (0.00 sec)
返回的10条数据
此时SQL只返回a表的数据,那么这时候SQL可以改写成
mysql> select count(*) from a; +----------+ | count(*) | +----------+ | 10 | +----------+ 1 row in set (0.00 sec)
3.2 b表连接a表再连接c表(1:N:1
的关系)
SELECT count(*) FROM b LEFT JOIN a ON b.b1 = a.a1 LEFT JOIN c ON a.a1 = c.c1 +----------+ | count(*) | +----------+ | 8 | +----------+ 1 row in set (0.00 sec)
原本b表是5条数据,left join后变为8条,此时就不能改写成上述形式了,我们来看下,具体数据是什么。
+------+------+------+ | b1 | a1 | c1 | +------+------+------+ | 1 | 1 | NULL | | 2 | 2 | NULL | | 3 | 3 | NULL | | 1 | 1 | NULL | | 2 | 2 | NULL | | 3 | 3 | NULL | | 4 | NULL | NULL | | 5 | NULL | NULL | +------+------+------+ 8 rows in set (0.00 sec)
可以看到a表的重复数据,在b表重复展现了,c表与a表连接,没有相等的数据(null不等于null)所以c1列展现都为null值。
这时候此SQL可以等价于以下:
SELECT count(*) FROM b LEFT JOIN a ON b.b1 = a.a1; +----------+ | count(*) | +----------+ | 8 | +----------+ 1 row in set (0.00 sec)
3.3 a表与d表相连接(N:N关系)
SELECT * FROM a LEFT JOIN d ON a.a1 =d.d1; +------+------+ | a1 | d1 | +------+------+ | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 1 | 1 | | 2 | NULL | | 3 | NULL | | 2 | NULL | | 3 | NULL | | NULL | NULL | | NULL | NULL | | NULL | NULL | | NULL | NULL | +------+------+ 20 rows in set (0.00 sec)
可以看a表a1列数据组成是 a表2个1
* b表 6个1
= 12个1
,再加上原本a1列的数据8条,总共20条数据。
4. 总结
从以上实验可以延伸到,如果连接列基数很低,此时left join就相当于笛卡儿积。。
所以在做SQL优化时候,尤其需要关注连接列的基数,与表与表之间的关系。