本文是受网文 《一次非常有意思的SQL优化经历:从30248.271s到0.001s》启发而产生的。
网文没讲创建表的数据过程,我帮他给出。
创建科目表及数据:
CREATE TABLE tb_course ( id NUMBER not null primary key, name NVARCHAR2(10) not null ) Insert into tb_course select rownum,dbms_random.string('*',dbms_random.value(6,10)) from dual connect by level<=100 order by dbms_random.random
创建学生表及数据:
CREATE TABLE tb_student ( id NUMBER not null primary key, name NVARCHAR2(20) not null ) Insert into tb_student select rownum,dbms_random.string('*',dbms_random.value(15,20)) from dual connect by level<=70000 order by dbms_random.random
创建成绩表及数据:
CREATE TABLE tb_score ( id NUMBER not null primary key, studentid int not null, courseid int not null, score int not null ) Insert into tb_score select rownum,dbms_random.value(0,70000),dbms_random.value(0,100),dbms_random.value(0,100) from dual connect by level<=700000 order by dbms_random.random
而要取的考id=0的科目及格的学生列表,可以用下面两种等效半连接SQL语句:
select * from tb_student stu where stu.id in(select studentid from tb_score where courseid=0 and score>60) select * from tb_student stu where exists(select studentid from tb_score where courseid=0 and score>60 and tb_score.studentid=stu.id)
如果是要走内连接的方式,则要把tb_score表的studentid清除一次重复,如果不清重复,那么两表连接如果有一对多的情况就会产生多条数据。上面的半连接只要存在就算条件通过,存在一条和存在多条等效,自然就不用清除重复了。
select stu.* from (select distinct studentid from tb_score where courseid=0 and score>60) score, tb_student stu where score.studentid=stu.id select stu.* from tb_student stu, (select distinct studentid from tb_score where courseid=0 and score>60) score where score.studentid=stu.id
要找出重复元素可以采用下面sql:
SQL> select studentid,count(studentid) from 2 (select studentid from tb_score where courseid=0 and score>60) score 3 group by studentid 4 having count(studentid)>1; STUDENTID COUNT(STUDENTID) ---------- ---------------- 9508 2 55358 2 10852 2 55731 2 5751 2 503 2 已选择6行。 已用时间: 00: 00: 00.01
但这样还是不直观,于是可以查查重复记录究竟是怎么产生的:
SQL> select * from tb_score where studentid in (select studentid from 2 (select studentid from tb_score where courseid=0 and score>60) score 3 group by studentid 4 having count(studentid)>1) 5 and courseid=0 6 order by 2,3,4; ID STUDENTID COURSEID SCORE ---------- ---------- ---------- ---------- 173720 503 0 92 64695 503 0 94 157901 5751 0 71 475290 5751 0 93 144229 9508 0 67 142179 9508 0 89 240689 10852 0 73 625426 10852 0 75 203725 55358 0 86 431998 55358 0 100 83002 55731 0 68 356457 55731 0 83 已选择12行。 已用时间: 00: 00: 00.03
这下看清楚,原来有些人考了两次! 这可以对应现实中考两次取最高分或是正考一次补考一次的例子。
因为上面四条SQL语句运行都挺快,我都没加索引都是如此,于是就先不用优化了,以后再说。
2020年1月21日
附:上面我用到的全部SQL语句:
CREATE TABLE tb_course ( id NUMBER not null primary key, name NVARCHAR2(10) not null ) Insert into tb_course select rownum,dbms_random.string('*',dbms_random.value(6,10)) from dual connect by level<=100 order by dbms_random.random CREATE TABLE tb_student ( id NUMBER not null primary key, name NVARCHAR2(20) not null ) Insert into tb_student select rownum,dbms_random.string('*',dbms_random.value(15,20)) from dual connect by level<=70000 order by dbms_random.random CREATE TABLE tb_score ( id NUMBER not null primary key, studentid int not null, courseid int not null, score int not null ) Insert into tb_score select rownum,dbms_random.value(0,70000),dbms_random.value(0,100),dbms_random.value(0,100) from dual connect by level<=700000 order by dbms_random.random select * from tb_student stu where stu.id in(select studentid from tb_score where courseid=0 and score>60) select * from tb_student stu where exists(select studentid from tb_score where courseid=0 and score>60 and tb_score.studentid=stu.id) select stu.* from (select distinct studentid from tb_score where courseid=0 and score>60) score, tb_student stu where score.studentid=stu.id select stu.* from tb_student stu, (select distinct studentid from tb_score where courseid=0 and score>60) score where score.studentid=stu.id select studentid,count(studentid) from (select studentid from tb_score where courseid=0 and score>60) score group by studentid having count(studentid)>1 select * from tb_score where studentid in (select studentid from (select studentid from tb_score where courseid=0 and score>60) score group by studentid having count(studentid)>1) and courseid=0 order by 2,3,4
2020-01-22