学习自《剑破冰山 Oracle开发艺术》第五章 报表开发之扩展GROUP BY
对于简单group by语句很难对复杂维度进行分析,难以达到实际生产的复杂报表需求,group by的扩展特性就需要了,union语句也可以达到需求但是sql复杂且效率低
1 rollup多维汇总
rollup,分组先进行常规分组,然后在此基础上,通过将列从右向左移动,然后进行更高一级的小计,最后合计,注意rollup分组和列的顺序相关
指定n列,有n+1种分组方式
部分rollup可以剔除某些不需要的小计和合计
例子
[oracle@localhost ~]$ sqlplus scott/tiger; SQL*Plus: Release 11.2.0.4.0 Production on Mon Mar 23 10:31:24 2020 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options 10:31:24 SCOTT@edw> set autotrace on 10:31:30 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY ROLLUP(a.dname,b.job); DNAME JOB SUM_SAL -------------- --------- ---------- SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 SALES 9400 RESEARCH CLERK 1900 RESEARCH ANALYST 6000 RESEARCH MANAGER 2975 RESEARCH 10875 ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 ACCOUNTING 8750 29025 13 rows selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 3067950682 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 392 | 7 (29)| 00:00:01 | | 1 | SORT GROUP BY ROLLUP | | 14 | 392 | 7 (29)| 00:00:01 | | 2 | MERGE JOIN | | 14 | 392 | 6 (17)| 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID| DEPT | 4 | 52 | 2 (0)| 00:00:01 | | 4 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00:00:01 | |* 5 | SORT JOIN | | 14 | 210 | 4 (25)| 00:00:01 | | 6 | TABLE ACCESS FULL | EMP | 14 | 210 | 3 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("A"."DEPTNO"="B"."DEPTNO") filter("A"."DEPTNO"="B"."DEPTNO") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 913 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 13 rows processed 10:31:34 SCOTT@edw>
可以看出仅仅dept和emp表均仅扫描一次,而如果是union来写就会多次重复扫描,效率低
通过执行计划看到有个隐藏操作SORT GROUP BY ROLLUP ,显示结果有序,一般还是要显示排序的,默认的排序不一定符合业务需求
rollup分组具有方向性
如果使用hint:expand_gset_to_union,则优化器会将rollup转换为对应的union all操作,其他的grouping sets、cube也可以
部分rollup分组,将不需要小计的列从rollup拿出到group by中即可,当然合计也没有了
例子
10:31:34 SCOTT@edw> set autotrace off 10:43:49 SCOTT@edw> SELECT to_char(b.hiredate,'yyyy') hire_year,a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY to_char(b.hiredate,'yyyy'),a.dname,ROLLUP(b.job); HIRE DNAME JOB SUM_SAL ---- -------------- --------- ---------- 1980 RESEARCH CLERK 800 1980 RESEARCH 800 1981 SALES CLERK 950 1981 SALES MANAGER 2850 1981 SALES SALESMAN 5600 1981 SALES 9400 1981 RESEARCH ANALYST 3000 1981 RESEARCH MANAGER 2975 1981 RESEARCH 5975 1981 ACCOUNTING MANAGER 2450 1981 ACCOUNTING PRESIDENT 5000 1981 ACCOUNTING 7450 1982 ACCOUNTING CLERK 1300 1982 ACCOUNTING 1300 1987 RESEARCH CLERK 1100 1987 RESEARCH ANALYST 3000 1987 RESEARCH 4100 17 rows selected. Elapsed: 00:00:00.01 10:43:53 SCOTT@edw>
2 cube交叉报表
cube分组可以实现更精细复杂的统计,对不同维度的所以可能进行分析,生成交叉报表,cube分组,是从n列中先进行合计,即一个列不取,然后小计,即取1列到n-1列,最后n列全取,即标准分组
因为包含所有可能的组合,所以结果与列的顺序无关,列顺序仅仅影响默认的隐藏排序而已,如果用了显示排序则无所谓了
cube分组增加一列,可能结果是指数级的增长,分组种类2的n次方
语法类似,例子
11:02:40 SCOTT@edw> set autotrace on 11:02:48 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY CUBE(a.dname,b.job); DNAME JOB SUM_SAL -------------- --------- ---------- 29025 CLERK 4150 ANALYST 6000 MANAGER 8275 SALESMAN 5600 PRESIDENT 5000 SALES 9400 SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 RESEARCH 10875 RESEARCH CLERK 1900 RESEARCH ANALYST 6000 RESEARCH MANAGER 2975 ACCOUNTING 8750 ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 18 rows selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 2382666110 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 392 | 7 (29)| 00:00:01 | | 1 | SORT GROUP BY | | 14 | 392 | 7 (29)| 00:00:01 | | 2 | GENERATE CUBE | | 14 | 392 | 7 (29)| 00:00:01 | | 3 | SORT GROUP BY | | 14 | 392 | 7 (29)| 00:00:01 | | 4 | MERGE JOIN | | 14 | 392 | 6 (17)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 4 | 52 | 2 (0)| 00:00:01 | | 6 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00:00:01 | |* 7 | SORT JOIN | | 14 | 210 | 4 (25)| 00:00:01 | | 8 | TABLE ACCESS FULL | EMP | 14 | 210 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 7 - access("A"."DEPTNO"="B"."DEPTNO") filter("A"."DEPTNO"="B"."DEPTNO") Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 1175 bytes sent via SQL*Net to client 535 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 3 sorts (memory) 0 sorts (disk) 18 rows processed 11:02:52 SCOTT@edw>
可以看执行计划,结果也是有序的
部分cube分组,例子
11:06:24 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY a.dname,CUBE(b.job); DNAME JOB SUM_SAL -------------- --------- ---------- SALES 9400 SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 RESEARCH 10875 RESEARCH CLERK 1900 RESEARCH ANALYST 6000 RESEARCH MANAGER 2975 ACCOUNTING 8750 ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 12 rows selected. Elapsed: 00:00:00.00 11:06:26 SCOTT@edw>
3 grouping sets实现小计
rollup和cube会产生各种标准分组、小计、合计,grouping sets则只关注指定维度的小计,n列的结果也是n种
如grouping sets(a,b,c)就是group by a、group by b和group by c的结果union all
例子
11:06:26 SCOTT@edw> set autotrace on 11:12:33 SCOTT@edw> SELECT to_char(b.hiredate,'yyyy') hire_year,a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY GROUPING SETS( to_char(b.hiredate,'yyyy'),a.dname,b.job); HIRE DNAME JOB SUM_SAL ---- -------------- --------- ---------- CLERK 4150 SALESMAN 5600 PRESIDENT 5000 MANAGER 8275 ANALYST 6000 ACCOUNTING 8750 RESEARCH 10875 SALES 9400 1987 4100 1980 800 1982 1300 1981 22825 12 rows selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 2825031421 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 14 | 448 | 17 (24)| 00:00:01 | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | 2 | LOAD AS SELECT | SYS_TEMP_0FD9D660D_29B9BB | | | | | | 3 | MERGE JOIN | | 14 | 504 | 6 (17)| 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID| DEPT | 4 | 52 | 2 (0)| 00:00:01 | | 5 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00:00:01 | |* 6 | SORT JOIN | | 14 | 322 | 4 (25)| 00:00:01 | | 7 | TABLE ACCESS FULL | EMP | 14 | 322 | 3 (0)| 00:00:01 | | 8 | LOAD AS SELECT | SYS_TEMP_0FD9D660E_29B9BB | | | | | | 9 | HASH GROUP BY | | 5 | 60 | 3 (34)| 00:00:01 | | 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660D_29B9BB | 14 | 168 | 2 (0)| 00:00:01 | | 11 | LOAD AS SELECT | SYS_TEMP_0FD9D660E_29B9BB | | | | | | 12 | HASH GROUP BY | | 4 | 56 | 3 (34)| 00:00:01 | | 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660D_29B9BB | 14 | 196 | 2 (0)| 00:00:01 | | 14 | LOAD AS SELECT | SYS_TEMP_0FD9D660E_29B9BB | | | | | | 15 | HASH GROUP BY | | 1 | 8 | 3 (34)| 00:00:01 | | 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660D_29B9BB | 14 | 112 | 2 (0)| 00:00:01 | | 17 | VIEW | | 5 | 160 | 2 (0)| 00:00:01 | | 18 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660E_29B9BB | 5 | 60 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 6 - access("SYS_TBL_$2$"."DEPTNO"="SYS_TBL_$1$"."DEPTNO") filter("SYS_TBL_$2$"."DEPTNO"="SYS_TBL_$1$"."DEPTNO") Statistics ---------------------------------------------------------- 23 recursive calls 33 db block gets 39 consistent gets 4 physical reads 2172 redo size 962 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 12 rows processed 11:12:36 SCOTT@edw>
执行计划可以看出,没有默认排序了,无序,和列的顺序也无关
同理部分grouping sets分组,例子
11:12:36 SCOTT@edw> set autotrace off 11:17:03 SCOTT@edw> SELECT a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY a.dname,GROUPING SETS(to_char(b.hiredate,'yyyy'),b.job); DNAME HIRE JOB SUM_SAL -------------- ---- --------- ---------- SALES MANAGER 2850 SALES CLERK 950 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 ACCOUNTING CLERK 1300 RESEARCH MANAGER 2975 SALES SALESMAN 5600 RESEARCH ANALYST 6000 RESEARCH CLERK 1900 RESEARCH 1981 5975 SALES 1981 9400 RESEARCH 1987 4100 ACCOUNTING 1981 7450 ACCOUNTING 1982 1300 RESEARCH 1980 800 15 rows selected. Elapsed: 00:00:00.01 11:17:05 SCOTT@edw>
注意此时的含义有较大的变化
cube、rollup作为grouping sets的参数
grouping sets只提供单列分组,没有合计功能,如果需要提供合计,则可以将rollup或cube作为参数,例子
11:23:59 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY GROUPING sets(rollup(a.dname),ROLLUP(b.job)); DNAME JOB SUM_SAL -------------- --------- ---------- CLERK 4150 SALESMAN 5600 PRESIDENT 5000 MANAGER 8275 ANALYST 6000 ACCOUNTING 8750 RESEARCH 10875 SALES 9400 29025 29025 10 rows selected. Elapsed: 00:00:00.02 11:24:02 SCOTT@edw>
问题是产生了两个合计行,因为rollup或cube作为grouping sets参数,相当于每个rollup或cube操作的union all,等价于这就很好理解功能了
对于重复合计,使用distinct剔除即可,另外后面还有特殊的函数可以使用,group_id可以用来剔除重复分组(和distinct功能是不一样的)
rollup和cube作为参数也可以混用,而且也可以使用其它扩展功能,如部分分组、复合列分组、连接分组等
rollup和cube不能接受grouping sets作为参数,rollup和cube互相作为参数也不行
4 组合列分组、连接分组、重置列分组
组合列分组、连接分组在复杂报表中用处很大。组合列分组用于剔除不必要的小计保留合计,连接分组按每个分组的笛卡尔积进行操作,分组更多更细。对于常规分组满足不了的需求可以考虑
组合列即将多个列当做整体对待,下列对比表可以清晰展示不同之处
连接分组更强大,允许group by后出现多个rollup、cube和grouping sets操作,这样分组级别更多,报表更精细,实现很复杂的需求实际上不管是同类型的连接分组还是不通类型的连接分组之间,最后的分组级别种类都是每个扩展分组级别种类的乘积,分组级别是笛卡尔积,比如rollup(a,b),rollup(c),最终3*2=6中分组级别
重复列分组也就是group by中允许重复列,比如group by rollup(a,(a,b))、group by a,rollup(a,b)
组合列分组
例子
14:48:13 SCOTT@edw> SELECT a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY rollup(a.dname,(to_char(b.hiredate,'yyyy'),b.job)); DNAME HIRE JOB SUM_SAL -------------- ---- --------- ---------- SALES 1981 CLERK 950 SALES 1981 MANAGER 2850 SALES 1981 SALESMAN 5600 SALES 9400 RESEARCH 1980 CLERK 800 RESEARCH 1981 ANALYST 3000 RESEARCH 1981 MANAGER 2975 RESEARCH 1987 CLERK 1100 RESEARCH 1987 ANALYST 3000 RESEARCH 10875 ACCOUNTING 1981 MANAGER 2450 ACCOUNTING 1981 PRESIDENT 5000 ACCOUNTING 1982 CLERK 1300 ACCOUNTING 8750 29025 15 rows selected. Elapsed: 00:00:00.00 14:48:16 SCOTT@edw>
组合列分组可以实现部分rollup和部分cube分组类似效果并且加上合计
但是这个也比较麻烦,对于需要cube、rollup合计并剔除部分小计的需求用grouping_id或grouping函数
cube和rollup均可以转换为对应的grouping sets
当然反向也可以,不过意义不大
连接分组
例子
14:48:16 SCOTT@edw> SELECT a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY rollup(a.dname,b.job),ROLLUP(to_char(b.hiredate,'yyyy')); DNAME HIRE JOB SUM_SAL -------------- ---- --------- ---------- SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 SALES 9400 RESEARCH CLERK 1900 RESEARCH ANALYST 6000 RESEARCH MANAGER 2975 RESEARCH 10875 ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 ACCOUNTING 8750 29025 RESEARCH 1980 CLERK 800 RESEARCH 1980 800 1980 800 SALES 1981 CLERK 950 SALES 1981 MANAGER 2850 SALES 1981 SALESMAN 5600 SALES 1981 9400 RESEARCH 1981 ANALYST 3000 RESEARCH 1981 MANAGER 2975 RESEARCH 1981 5975 ACCOUNTING 1981 MANAGER 2450 ACCOUNTING 1981 PRESIDENT 5000 ACCOUNTING 1981 7450 1981 22825 ACCOUNTING 1982 CLERK 1300 ACCOUNTING 1982 1300 1982 1300 RESEARCH 1987 CLERK 1100 RESEARCH 1987 ANALYST 3000 RESEARCH 1987 4100 1987 4100 34 rows selected. Elapsed: 00:00:00.01 14:57:57 SCOTT@edw>
相当于两个rollup的笛卡尔积
理解了之后,利用连接分组,cube可以用rollup转换,如cube(a,b,c)等于rollup(a),rollup(b),rollup(c),但是对于rollup和grouping sets转换为cube一般没啥用
连接分组一般是同类型的,不通类型的连接分组一般不常用
重复列分组
例子
14:57:57 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY a.dname,ROLLUP(a.dname,b.job); DNAME JOB SUM_SAL -------------- --------- ---------- SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 RESEARCH CLERK 1900 RESEARCH ANALYST 6000 RESEARCH MANAGER 2975 ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 SALES 9400 RESEARCH 10875 ACCOUNTING 8750 SALES 9400 RESEARCH 10875 ACCOUNTING 8750 15 rows selected. Elapsed: 00:00:00.00 15:07:14 SCOTT@edw>
没啥意义的例子,只不过说明语法允许
5 三个扩展分组函数:grouping、grouping_id、group_id
三个扩展分组函数:grouping、grouping_id、group_id在生成有意义的报表、结果进行过滤、排序中有很重要的作用,常用于复杂的报表查询
注意grouping和grouping_id函数的参数不能是组合列
grouping函数用于制作有意义的报表
grouping_id函数对结果过滤以及排序
group_id函数剔除重复行
grouping函数
在扩展group by子句来说,null表示小计或者合计,但是如果数据中本来就有null值呢?grouping函数专门处理扩展group by分组中null问题:
它只接受一个参数,且参数来自rollup、cube、grouping sets中的列。当然也可以在group by而不在上述3个子句的列,不过结果肯定是0,没有意义
grouping函数对于小计或合计的列返回1,否则返回0。用于区别是否原始数据中含null,常与decode一起使用。当然也可以确定分组级别从而过滤一些行,不过会很烦,一般用grouping_id替代
例子
15:34:01 SCOTT@edw> SELECT decode(GROUPING(a.dname),1,'全部部门',a.dname) dname,decode(grouping(b.mgr),1,'全部老板',b.mgr) mgr,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY ROLLUP(a.dname,b.mgr); DNAME MGR SUM_SAL -------------- ---------------------------------------- ---------- SALES 7698 6550 SALES 7839 2850 SALES 全部老板 9400 RESEARCH 7566 6000 RESEARCH 7788 1100 RESEARCH 7839 2975 RESEARCH 7902 800 RESEARCH 全部老板 10875 ACCOUNTING 5000 ACCOUNTING 7782 1300 ACCOUNTING 7839 2450 ACCOUNTING 全部老板 8750 全部部门 全部老板 29025 13 rows selected. Elapsed: 00:00:00.01 15:34:12 SCOTT@edw>
grouping_id函数
用于过滤分组级别和排序结果
可以接受多个参数,来自rollup、cube、grouping sets中的列,按列从左往右顺序计算,是分组列则0,是小计或合计列为1,然后组合成为一个二进制数字叫做位向量,位向量转化为10进制即最后的结果,代表分组级别,如cube(a,b),那么grouping_id(a,b)代表的如下
grouping_id的好处是可以对多列进行计算得到分组级别
例子
15:46:26 SCOTT@edw> SELECT a.dname,b.mgr,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY ROLLUP(a.dname,b.mgr,b.job) HAVING grouping_id(a.dname,b.mgr,b.job) IN (0,7); DNAME MGR JOB SUM_SAL -------------- ---------- --------- ---------- SALES 7698 CLERK 950 SALES 7698 SALESMAN 5600 SALES 7839 MANAGER 2850 RESEARCH 7566 ANALYST 6000 RESEARCH 7788 CLERK 1100 RESEARCH 7839 MANAGER 2975 RESEARCH 7902 CLERK 800 ACCOUNTING PRESIDENT 5000 ACCOUNTING 7782 CLERK 1300 ACCOUNTING 7839 MANAGER 2450 29025 11 rows selected. Elapsed: 00:00:00.00 15:46:29 SCOTT@edw>
group_id函数
group_id无参数,因为扩展group by子句允许多种复杂分组操作,有时候为了实现复杂报表,可能出现重复统计,而group_id函数可以区分重复分组结果,第一次出现为0,以后每次出现增1,group_id在select中出现没啥意义,通常用于having子句剔除重复统计
例子
15:46:29 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal,group_id() gi FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY GROUPING SETS(ROLLUP(a.dname),ROLLUP(b.job)) HAVING group_id()=0; DNAME JOB SUM_SAL GI -------------- --------- ---------- ---------- CLERK 4150 0 SALESMAN 5600 0 PRESIDENT 5000 0 MANAGER 8275 0 ANALYST 6000 0 ACCOUNTING 8750 0 RESEARCH 10875 0 SALES 9400 0 29025 0 9 rows selected. Elapsed: 00:00:00.01 15:55:55 SCOTT@edw>