在前文 https://www.cnblogs.com/heyang78/p/12079017.html 中,我主要描述了现象,给出了结论,这次试图对过程进行一点思索。
需求:从配送表里,当订单号和配送者一样时,取时间最靠近现在即时间值最大的一条记录。
配送表表结构:
create table peisong(
id int,
order_no int, --订单号
shipper_cd int,--配送者号
create_time timestamp,
primary key(id))
填充实验值:
insert into peisong(id,order_no,shipper_cd,create_time) values('0',1,1,to_date('2004-05-07','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('1',1,1,to_date('2004-05-08','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('2',1,2,to_date('2004-05-09','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('3',1,2,to_date('2004-05-10','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('4',1,3,to_date('2004-05-11','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('5',2,1,to_date('2004-05-12','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('6',2,1,to_date('2004-05-13','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('7',2,2,to_date('2004-05-14','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('8',2,2,to_date('2004-05-15','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('9',2,3,to_date('2004-05-16','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('10',2,3,to_date('2004-05-17','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('11',3,1,to_date('2004-05-18','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('12',3,2,to_date('2004-05-19','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('13',3,2,to_date('2004-05-20','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('14',3,3,to_date('2004-05-21','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('15',3,4,to_date('2004-05-22','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('16',3,4,to_date('2004-05-23','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('17',3,4,to_date('2004-05-24','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('18',3,4,to_date('2004-05-25','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('19',3,4,to_date('2004-05-26','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('20',3,4,to_date('2004-05-27','yyyy-mm-dd'));
有多种方式可以达成需求,以下列举四种:
方案一:先分组再内联方案
select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a, ( select order_no, shipper_cd, max(create_time) as create_time from peisong group by order_no, shipper_cd) b where a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time = b.create_time order by a.id, a.order_no, a.shipper_cd
或:
select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a, ( select order_no, shipper_cd, max(create_time) as create_time from peisong group by order_no, shipper_cd) b where a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time = b.create_time order by a.id, a.order_no, a.shipper_cd
筛选结果:
ID ORDER_NO SHIPPER_CD CREATE_TIM ---------- ---------- ---------- ---------- 1 1 1 2004-05-08 3 1 2 2004-05-10 4 1 3 2004-05-11 6 2 1 2004-05-13 8 2 2 2004-05-15 10 2 3 2004-05-17 11 3 1 2004-05-18 13 3 2 2004-05-20 14 3 3 2004-05-21 20 3 4 2004-05-27 已选择10行。
如果我们把order_no,shipper_cd相同的记录划分成一组,那么由b表的形成条件可知,无论组内元素有多少条,最终只会选出create_time最大(即最近)的那条,两表内联查询便会在21*10=210的结果集选,因此这种查询方式适合组内数量稍多的情况,比较均衡。
方案二:左连接方案
SELECT a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a left JOIN peisong b on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time where b.create_time IS NULL order by a.id, a.order_no, a.shipper_cd
或:
SELECT a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a left JOIN peisong b on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time where b.create_time IS NULL order by a.id, a.order_no, a.shipper_cd
结果:
ID ORDER_NO SHIPPER_CD CREATE_TIM ---------- ---------- ---------- ---------- 1 1 1 2004-05-08 3 1 2 2004-05-10 4 1 3 2004-05-11 6 2 1 2004-05-13 8 2 2 2004-05-15 10 2 3 2004-05-17 11 3 1 2004-05-18 13 3 2 2004-05-20 14 3 3 2004-05-21 20 3 4 2004-05-27 已选择10行。
这个左联SQL的代码稍微有点费解,我在这里赘述两句。
首先可以看看peisong表左联自己,条件是左表时间小于右表的情况。
SQL:
SELECT
a.id,
to_char(a.create_time,'yyyy-mm-dd') as actime,
to_char(b.create_time,'yyyy-mm-dd') as bctime
from
peisong a left JOIN
peisong b
on
a.order_no = b.order_no and
a.shipper_cd = b.shipper_cd and
a.create_time < b.create_time
order by
a.id,
a.order_no,
a.shipper_cd
查询结果:
SQL> SELECT
2 a.id,
3 to_char(a.create_time,'yyyy-mm-dd') as actime,
4 to_char(b.create_time,'yyyy-mm-dd') as bctime
5 from
6 peisong a left JOIN
7 peisong b
8 on
9 a.order_no = b.order_no and
10 a.shipper_cd = b.shipper_cd and
11 a.create_time < b.create_time
12 order by
13 a.id,
14 a.order_no,
15 a.shipper_cd;
ID ACTIME BCTIME
---------- ---------- ----------
0 2004-05-07 2004-05-08
1 2004-05-08
2 2004-05-09 2004-05-10
3 2004-05-10
4 2004-05-11
5 2004-05-12 2004-05-13
6 2004-05-13
7 2004-05-14 2004-05-15
8 2004-05-15
9 2004-05-16 2004-05-17
10 2004-05-17
11 2004-05-18
12 2004-05-19 2004-05-20
13 2004-05-20
14 2004-05-21
15 2004-05-22 2004-05-23
15 2004-05-22 2004-05-26
15 2004-05-22 2004-05-25
15 2004-05-22 2004-05-27
15 2004-05-22 2004-05-24
16 2004-05-23 2004-05-24
16 2004-05-23 2004-05-27
16 2004-05-23 2004-05-26
16 2004-05-23 2004-05-25
17 2004-05-24 2004-05-27
17 2004-05-24 2004-05-26
17 2004-05-24 2004-05-25
18 2004-05-25 2004-05-27
18 2004-05-25 2004-05-26
19 2004-05-26 2004-05-27
20 2004-05-27
已选择31行。
由上面的查询结果可以看出来,当左表的create_time字段已经是最大时,右表里已经找不出order_no,shipper_cd相同,而create_time字段更大的记录,于是按连接规则补NULL,这个NULL在上面的查询中就是以空档显示的。
通过观察我们就能知道,左表acreate_time字段有数据,右表bcreate_time字段是空的情况,这就是我们想要的记录,于是筛选条件where b.create_time IS NULL就出现了。
让我们会看ab两表连接条件“on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time”,如果我们把order_no,shipper_cd相同的记录划到一组,会发现组越大,产生的连接结果就越多,最典型是15到20这一组(共5条记录),通过左联接产生了16条记录(5+4+3+2+1+1),如果有n条,那么就是n*(n+1)/2+1条,也就说说同组数量越大,产生记录会和数量平方成正比,这就是在原先实验中组小基本是一两条时左联接最快,组数量越大越慢的根本原因。综合分析可以得出结论,左联方案只是完成需求,对优化没有多考虑。
方案三:not exists方案
select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a where not exists ( select 1 from peisong b where b.shipper_cd = a.shipper_cd and b.order_no = a.order_no and b.create_time > a.create_time) order by a.id, a.order_no, a.shipper_cd
或:
select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a where not exists ( select 1 from peisong b where b.shipper_cd = a.shipper_cd and b.order_no = a.order_no and b.create_time > a.create_time) order by a.id, a.order_no, a.shipper_cd
结果:
ID ORDER_NO SHIPPER_CD CREATE_TIM ---------- ---------- ---------- ---------- 1 1 1 2004-05-08 3 1 2 2004-05-10 4 1 3 2004-05-11 6 2 1 2004-05-13 8 2 2 2004-05-15 10 2 3 2004-05-17 11 3 1 2004-05-18 13 3 2 2004-05-20 14 3 3 2004-05-21 20 3 4 2004-05-27
Not exists方式会形成n*n的结构,但右边的n只做条件不筛选,然后对左边的n过滤, 最适合按 order_no,shipper_cd分组,而组内元素较多>10的情况,和左联正好相反。
方案四:最好理解又高效的rank分析函数方案
Oracle提供了分析函数rank,它可以按order_no,shipper_cd分组,按create_time逆序排序,然后将组内顺序赋上顺序值。
像这样
select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong
之后我们只要取出seq=1的记录就达成需求了。
最终SQL:
select
id,
order_no,
shipper_cd,
to_char(create_time,'yyyy-mm-dd') as create_time
from peisong where id in (select a.id from (select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
order by id
执行结果:
SQL> select 2 id, 3 order_no, 4 shipper_cd, 5 to_char(create_time,'yyyy-mm-dd') as create_time 6 from peisong where id in (select a.id from (select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1) 7 order by id; ID ORDER_NO SHIPPER_CD CREATE_TIM ---------- ---------- ---------- ---------- 1 1 1 2004-05-08 3 1 2 2004-05-10 4 1 3 2004-05-11 6 2 1 2004-05-13 8 2 2 2004-05-15 10 2 3 2004-05-17 11 3 1 2004-05-18 13 3 2 2004-05-20 14 3 3 2004-05-21 20 3 4 2004-05-27 已选择10行。
这种方案由于rank的使用,第二个子查询就抛下了大多数无用数据行,因此效率也不错,还好理解,值得推荐。
可以看出,以上四种SQL文都能达成需求,具体选择哪种方案,需要根据实际数据分布进行实验,再选最合适的一种。
这次的分析应该比上次:https://www.cnblogs.com/heyang78/p/12079017.html 要更深入一些。
END