• [oracle/sql]关于清除重复,not in方案和not exists方案的对比


    有这样一张表:

    CREATE TABLE tb_sc
    (
        id NUMBER not null primary key,
        studentid int not null,
        courseid int not null,
        score int not null
    )

    用以下语句给它充值十万条数据:

    Insert into tb_sc
    select rownum,dbms_random.value(0,10000),dbms_random.value(1,5),dbms_random.value(0,150) from dual
    connect by level<=100000
    order by dbms_random.random

    当然上面这样填充完会给同一studentid和courseid时不同的score记录,相当于学生同一科目考了两次,这在现实中是不合理的,因此我们要剔除掉,只保留同一studentid和courseid时score最高的那条记录。

    用下面的sql能查询出该保留的记录:

    select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid

    如果直接去删除不在上面的sql查出的结果集中的记录,会写出如下的sql:

    delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid)

    但是,真执行起来就会发现,要等这条delete语句执行完简直遥遥无期。

    而用同为反连接的not exist做就很快:

    delete from tb_sc where not exists (
    select 'x' from tb_sc a,
                     (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid) b
    where a.studentid=b.studentid and a.courseid=b.courseid and a.score=b.score and tb_sc.id=a.id)
    SQL> delete from tb_sc where not exists (
      2  select 'x' from tb_sc a,
      3                   (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid) b
      4  where a.studentid=b.studentid and a.courseid=b.courseid and a.score=b.score and tb_sc.id=a.id);
    
    已删除58032行。
    
    已用时间:  00: 00: 00.75

    到这里很多人可能直接否决的not in,直接认为它慢,放弃了,但是看看下面sql,它会跑多久呢?

    delete from tb_sc where id not in (select tb_sc.id from tb_sc,( select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid ) tb_sc2
    where tb_sc.studentid=tb_sc2.studentid and tb_sc.courseid=tb_sc2.courseid and tb_sc.score=tb_sc2.score)

    让我们实际跑一下:

    SQL> delete from tb_sc where id not in (select tb_sc.id from tb_sc,( select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid ) tb_sc2
      2  where tb_sc.studentid=tb_sc2.studentid and tb_sc.courseid=tb_sc2.courseid and tb_sc.score=tb_sc2.score);
    
    已删除58032行。
    
    已用时间:  00: 00: 00.56

    发现和上面的not exist差不多!你是不是又恢复对not in的信心了呢?

    但是

    delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid)

    确实是慢得让人发指,而将delete换成select之后,却并不慢,不信大家请执行下面sql:

    select a.* from tb_sc a where (a.studentid,a.courseid,a.score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid)

    我执行的结果是:

         19563       9998          2          6
         27799       9998          3         95
    
            ID  STUDENTID   COURSEID      SCORE
    ---------- ---------- ---------- ----------
         37515       9998          4         60
         35809       9998          4        144
         65663       9998          5         64
         84961       9999          3          2
         24730       9999          3         14
         99371       9999          3         16
          4349      10000          5          2
    
    已选择58032行。
    
    已用时间:  00: 00: 16.01

    并没有多长时间,为什么同样的条件,查询就快,而删除就慢得不要不要的,其中的原因还有待继续研究。

    --2020年1月24日--

  • 相关阅读:
    E
    D
    Npp ChangeLog
    c++ 书籍(zz)
    再好的工作是为了更好的生活
    如何将JPG格式的图片转换成PNG格式
    点分治
    团体程序设计天梯赛(CCCC) L3021 神坛 的一些错误做法(目前网上的方法没一个是对的) 和 一些想法
    团体程序设计天梯赛(CCCC) L3019 代码排版 方法与编译原理密切相关,只有一个测试点段错误
    团体程序设计天梯赛(CCCC) L3015 球队“食物链” 状态压缩
  • 原文地址:https://www.cnblogs.com/heyang78/p/12232485.html
Copyright © 2020-2023  润新知