• 再度思索:从配送表中选出订单号和配送者相同时的最新记录


    在前文 https://www.cnblogs.com/heyang78/p/12079017.html 中,我主要描述了现象,给出了结论,这次试图对过程进行一点思索。

    需求:从配送表里,当订单号和配送者一样时,取时间最靠近现在即时间值最大的一条记录。

    配送表表结构:

    create table peisong(
    id int,
    order_no int, --订单号
    shipper_cd int,--配送者号
    create_time timestamp,
    primary key(id))

    填充实验值:

    insert into peisong(id,order_no,shipper_cd,create_time) values('0',1,1,to_date('2004-05-07','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('1',1,1,to_date('2004-05-08','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('2',1,2,to_date('2004-05-09','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('3',1,2,to_date('2004-05-10','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('4',1,3,to_date('2004-05-11','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('5',2,1,to_date('2004-05-12','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('6',2,1,to_date('2004-05-13','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('7',2,2,to_date('2004-05-14','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('8',2,2,to_date('2004-05-15','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('9',2,3,to_date('2004-05-16','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('10',2,3,to_date('2004-05-17','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('11',3,1,to_date('2004-05-18','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('12',3,2,to_date('2004-05-19','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('13',3,2,to_date('2004-05-20','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('14',3,3,to_date('2004-05-21','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('15',3,4,to_date('2004-05-22','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('16',3,4,to_date('2004-05-23','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('17',3,4,to_date('2004-05-24','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('18',3,4,to_date('2004-05-25','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('19',3,4,to_date('2004-05-26','yyyy-mm-dd'));
    insert into peisong(id,order_no,shipper_cd,create_time) values('20',3,4,to_date('2004-05-27','yyyy-mm-dd'));

    有多种方式可以达成需求,以下列举四种:

    方案一:先分组再内联方案

    select
        a.id,
        a.order_no,
        a.shipper_cd,
        to_char(a.create_time,'yyyy-mm-dd') as create_time
    from
        peisong a,
        (
        select
            order_no,
            shipper_cd,
            max(create_time) as create_time
        from
            peisong
        group by
            order_no,
            shipper_cd) b
    where
        a.order_no = b.order_no and 
        a.shipper_cd = b.shipper_cd and 
        a.create_time = b.create_time
    order by
        a.id,
        a.order_no,
        a.shipper_cd

    或:

    select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a, ( select order_no, shipper_cd, max(create_time) as create_time from peisong group by order_no, shipper_cd) b where a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time = b.create_time order by a.id, a.order_no, a.shipper_cd

    筛选结果:

            ID   ORDER_NO SHIPPER_CD CREATE_TIM
    ---------- ---------- ---------- ----------
             1          1          1 2004-05-08
             3          1          2 2004-05-10
             4          1          3 2004-05-11
             6          2          1 2004-05-13
             8          2          2 2004-05-15
            10          2          3 2004-05-17
            11          3          1 2004-05-18
            13          3          2 2004-05-20
            14          3          3 2004-05-21
            20          3          4 2004-05-27
    
    已选择10行。

     如果我们把order_no,shipper_cd相同的记录划分成一组,那么由b表的形成条件可知,无论组内元素有多少条,最终只会选出create_time最大(即最近)的那条,两表内联查询便会在21*10=210的结果集选,因此这种查询方式适合组内数量稍多的情况,比较均衡。

    方案二:左连接方案

    SELECT
        a.id,
        a.order_no,
        a.shipper_cd,
        to_char(a.create_time,'yyyy-mm-dd') as create_time
    from
        peisong a left JOIN 
        peisong b
    on
        a.order_no = b.order_no and 
        a.shipper_cd = b.shipper_cd and 
        a.create_time < b.create_time
    where
        b.create_time  IS  NULL
    order by
        a.id,
        a.order_no,
        a.shipper_cd

    或:

    SELECT a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a left JOIN peisong b on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time where b.create_time IS NULL order by a.id, a.order_no, a.shipper_cd

    结果:

            ID   ORDER_NO SHIPPER_CD CREATE_TIM
    ---------- ---------- ---------- ----------
             1          1          1 2004-05-08
             3          1          2 2004-05-10
             4          1          3 2004-05-11
             6          2          1 2004-05-13
             8          2          2 2004-05-15
            10          2          3 2004-05-17
            11          3          1 2004-05-18
            13          3          2 2004-05-20
            14          3          3 2004-05-21
            20          3          4 2004-05-27
    
    已选择10行。

    这个左联SQL的代码稍微有点费解,我在这里赘述两句。

    首先可以看看peisong表左联自己,条件是左表时间小于右表的情况。

    SQL:

    SELECT
        a.id,
        to_char(a.create_time,'yyyy-mm-dd') as actime,
        to_char(b.create_time,'yyyy-mm-dd') as bctime
    from
        peisong a left JOIN 
        peisong b
    on
        a.order_no = b.order_no and 
        a.shipper_cd = b.shipper_cd and 
        a.create_time < b.create_time
    order by
        a.id,
        a.order_no,
        a.shipper_cd

    查询结果:

    SQL> SELECT
      2      a.id,
      3      to_char(a.create_time,'yyyy-mm-dd') as actime,
      4      to_char(b.create_time,'yyyy-mm-dd') as bctime
      5  from
      6      peisong a left JOIN
      7      peisong b
      8  on
      9      a.order_no = b.order_no and
     10      a.shipper_cd = b.shipper_cd and
     11      a.create_time < b.create_time
     12  order by
     13      a.id,
     14      a.order_no,
     15      a.shipper_cd;
    
            ID ACTIME     BCTIME
    ---------- ---------- ----------
             0 2004-05-07 2004-05-08
             1 2004-05-08
             2 2004-05-09 2004-05-10
             3 2004-05-10
             4 2004-05-11
             5 2004-05-12 2004-05-13
             6 2004-05-13
             7 2004-05-14 2004-05-15
             8 2004-05-15
             9 2004-05-16 2004-05-17
            10 2004-05-17
            11 2004-05-18
            12 2004-05-19 2004-05-20
            13 2004-05-20
            14 2004-05-21
            15 2004-05-22 2004-05-23
            15 2004-05-22 2004-05-26
            15 2004-05-22 2004-05-25
            15 2004-05-22 2004-05-27
            15 2004-05-22 2004-05-24
            16 2004-05-23 2004-05-24
            16 2004-05-23 2004-05-27
            16 2004-05-23 2004-05-26
            16 2004-05-23 2004-05-25
            17 2004-05-24 2004-05-27
            17 2004-05-24 2004-05-26
            17 2004-05-24 2004-05-25
            18 2004-05-25 2004-05-27
            18 2004-05-25 2004-05-26
            19 2004-05-26 2004-05-27
            20 2004-05-27
    
    已选择31行。

    由上面的查询结果可以看出来,当左表的create_time字段已经是最大时,右表里已经找不出order_no,shipper_cd相同,而create_time字段更大的记录,于是按连接规则补NULL,这个NULL在上面的查询中就是以空档显示的。

    通过观察我们就能知道,左表acreate_time字段有数据,右表bcreate_time字段是空的情况,这就是我们想要的记录,于是筛选条件where b.create_time  IS  NULL就出现了。

    让我们会看ab两表连接条件“on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time”,如果我们把order_no,shipper_cd相同的记录划到一组,会发现组越大,产生的连接结果就越多,最典型是15到20这一组(共5条记录),通过左联接产生了16条记录(5+4+3+2+1+1),如果有n条,那么就是n*(n+1)/2+1条,也就说说同组数量越大,产生记录会和数量平方成正比,这就是在原先实验中组小基本是一两条时左联接最快,组数量越大越慢的根本原因。综合分析可以得出结论,左联方案只是完成需求,对优化没有多考虑。

    方案三:not exists方案

    select
        a.id,
        a.order_no,
        a.shipper_cd,
        to_char(a.create_time,'yyyy-mm-dd') as create_time
    from
        peisong a
    where
          not exists  (
        select
            1
        from
            peisong b
        where
            b.shipper_cd = a.shipper_cd and 
            b.order_no = a.order_no and 
            b.create_time > a.create_time)
    order by
        a.id,
        a.order_no,
        a.shipper_cd

    或:

    select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a where not exists ( select 1 from peisong b where b.shipper_cd = a.shipper_cd and b.order_no = a.order_no and b.create_time > a.create_time) order by a.id, a.order_no, a.shipper_cd

    结果:

            ID   ORDER_NO SHIPPER_CD CREATE_TIM
    ---------- ---------- ---------- ----------
             1          1          1 2004-05-08
             3          1          2 2004-05-10
             4          1          3 2004-05-11
             6          2          1 2004-05-13
             8          2          2 2004-05-15
            10          2          3 2004-05-17
            11          3          1 2004-05-18
            13          3          2 2004-05-20
            14          3          3 2004-05-21
            20          3          4 2004-05-27

    Not exists方式会形成n*n的结构,但右边的n只做条件不筛选,然后对左边的n过滤, 最适合按 order_no,shipper_cd分组,而组内元素较多>10的情况,和左联正好相反。

    方案四:最好理解又高效的rank分析函数方案

    Oracle提供了分析函数rank,它可以按order_no,shipper_cd分组,按create_time逆序排序,然后将组内顺序赋上顺序值。

    像这样

    select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong

    之后我们只要取出seq=1的记录就达成需求了。

    最终SQL:

    select
    id,
    order_no,
    shipper_cd,
    to_char(create_time,'yyyy-mm-dd') as create_time
    from peisong where id in (select a.id from (select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
    order by id

    执行结果:

    SQL> select
      2      id,
      3      order_no,
      4      shipper_cd,
      5      to_char(create_time,'yyyy-mm-dd') as create_time
      6  from peisong where id in (select a.id from (select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
      7  order by id;
    
            ID   ORDER_NO SHIPPER_CD CREATE_TIM
    ---------- ---------- ---------- ----------
             1          1          1 2004-05-08
             3          1          2 2004-05-10
             4          1          3 2004-05-11
             6          2          1 2004-05-13
             8          2          2 2004-05-15
            10          2          3 2004-05-17
            11          3          1 2004-05-18
            13          3          2 2004-05-20
            14          3          3 2004-05-21
            20          3          4 2004-05-27
    
    已选择10行。

    这种方案由于rank的使用,第二个子查询就抛下了大多数无用数据行,因此效率也不错,还好理解,值得推荐。

    可以看出,以上四种SQL文都能达成需求,具体选择哪种方案,需要根据实际数据分布进行实验,再选最合适的一种。

    这次的分析应该比上次:https://www.cnblogs.com/heyang78/p/12079017.html 要更深入一些。

    END

  • 相关阅读:
    c++ set unordered_set区别
    LeetCode 213. 打家劫舍 II
    LeetCode 152. 乘积最大子序列
    [HAOI 2012] 外星人
    [HAOI 2016] 找相同字符
    [ZJOI2007] 仓库建设
    [SCOI 2016] 美味
    [BZOJ 2127] Happiness
    [NOI2009] 植物大战僵尸
    [SDOI 2016] 数字配对
  • 原文地址:https://www.cnblogs.com/heyang78/p/15206050.html
Copyright © 2020-2023  润新知