Oracle "HASH GROUP BY"和"SORT GROUP BY"区别以及"无法使用"HASH GROUP BY"的情况
10G以前GROUP BY子句可以返回排序的结果集,即使没有ORDER BY子句。
原因是因为使用了“SORT GROUP BY”,会自动排序分组字段。
从10G开始以后引入了“HASH GROUP BY”,新的内部排序算法会导致GROUP BY 子句不保证输出会按分组的列排序,也不保证结果集的顺序。
要对分组进行排序,请使用 ORDER BY 子句。
如果未指定 ORDER BY 子句,则检索行的顺序取决于用于从数据库检索行的方法。换句话说,这取决于选择的执行计划。
下边看下简单的实验:
环境:19.13.0.0.0
创建表并插入实验数据,尽量保证同一会话插入数据保证数据看起来就是无序的,当然实际上也是:
create table zkm (id int,name varchar2(20)); insert into zkm values(1,'a'); insert into zkm values(2,'b'); insert into zkm values(3,'c'); insert into zkm values(9,'i'); insert into zkm values(5,'e'); insert into zkm values(4,'d'); insert into zkm values(8,'h'); insert into zkm values(7,'g'); insert into zkm values(6,'f'); commit;
目标SQL:select id,count(name) from zkm group by id;
参数设置:alter session set statistics_level=all;
使用Hint:NO_USE_HASH_AGGREGATION来禁用“HASH GROUP BY”,这样目标SQL执行后结果集总是按照ID列进行排序返回。
并且从执行计划看是“SORT GROUP BY”。
17:06:46 ZKM@dev-app73/pdb(9)> select /*+ NO_USE_HASH_AGGREGATION */ id,count(name) from zkm group by id; ID COUNT(NAME) ---------- ----------- 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 1 9 rows selected. Elapsed: 00:00:00.01 17:06:47 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------- SQL_ID a7kukqrrrvrra, child number 1 ------------------------------------- select /*+ NO_USE_HASH_AGGREGATION */ id,count(name) from zkm group by id Plan hash value: 2238836816 ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 9 |00:00:00.01 | 6 | | | | | 1 | SORT GROUP BY | | 1 | 9 | 9 |00:00:00.01 | 6 | 2048 | 2048 | 2048 (0)| | 2 | TABLE ACCESS FULL| ZKM | 1 | 9 | 9 |00:00:00.01 | 6 | | | | ---------------------------------------------------------------------------------------------------------------- 15 rows selected. Elapsed: 00:00:00.06
去掉Hint后,再次执行返回的结果集则是无序的。
并且从执行计划看是“HASH GROUP BY”。
17:09:33 ZKM@dev-app73/pdb(9)> select id,count(name) from zkm group by id; ID COUNT(NAME) ---------- ----------- 6 1 1 1 7 1 2 1 8 1 5 1 4 1 3 1 9 1 9 rows selected. Elapsed: 00:00:00.01 17:09:34 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------- SQL_ID dqw15j89d8r1b, child number 2 ------------------------------------- select id,count(name) from zkm group by id Plan hash value: 201225912 ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 9 |00:00:00.01 | 6 | | | | | 1 | HASH GROUP BY | | 1 | 9 | 9 |00:00:00.01 | 6 | 1558K| 1558K| 1063K (0)| | 2 | TABLE ACCESS FULL| ZKM | 1 | 9 | 9 |00:00:00.01 | 6 | | | | ---------------------------------------------------------------------------------------------------------------- 14 rows selected. Elapsed: 00:00:00.10
从排序内存使用大小看,“HASH GROUP BY”使用的内存为1063K,“SORT GROUP BY”为2048bytes。
也可以从v$sql_workarea.last_memory_used获取信息。
由于数据量比较小,构造大量数据后执行速度为:
17:26:32 ZKM@dev-app73/pdb(9)> select id,count(name) from zkm group by id; ID COUNT(NAME) ---------- ----------- 6 2097152 7 2097152 1 2097152 8 2097152 2 2097152 5 2097152 4 2097152 9 2097152 3 2097152 9 rows selected. Elapsed: 00:00:01.65 17:26:34 ZKM@dev-app73/pdb(9)> select /*+ NO_USE_HASH_AGGREGATION */ id,count(name) from zkm group by id; ID COUNT(NAME) ---------- ----------- 1 2097152 2 2097152 3 2097152 4 2097152 5 2097152 6 2097152 7 2097152 8 2097152 9 2097152 9 rows selected. Elapsed: 00:00:03.13
数据量比较大的情况下,“HASH GROUP BY”要更快,当然不能得出“HASH GROUP BY”就一定快的结论。
实际上是因为避免了排序操作所以“HASH GROUP BY”会比”SORT GROUP BY“更快。
无法使用”HASH GROUP BY“的两种情况
情况1:GROUP BY后有对字段进行ORDER BY。
比如:
17:35:32 ZKM@dev-app73/pdb(9)> select id,count(name) from zkm group by id order by id; ID COUNT(NAME) ---------- ----------- 1 2097152 2 2097152 3 2097152 4 2097152 5 2097152 6 2097152 7 2097152 8 2097152 9 2097152 9 rows selected. Elapsed: 00:00:03.36 17:36:22 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SQL_ID cns02rbymv6b6, child number 0 ------------------------------------- select id,count(name) from zkm group by id order by id Plan hash value: 2238836816 ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 9 |00:00:03.36 | 28731 | | | | | 1 | SORT GROUP BY | | 1 | 9 | 9 |00:00:03.36 | 28731 | 2048 | 2048 | 2048 (0)| | 2 | TABLE ACCESS FULL| ZKM | 1 | 9 | 18M|00:00:00.49 | 28731 | | | | ---------------------------------------------------------------------------------------------------------------- 14 rows selected. Elapsed: 00:00:00.06
解决方法:使用子查询先进行GROUP BY操作,然后再外层查询使用ORDER BY子句进行排序。同时使用/*+ no_merge */防止视图合并。
17:37:19 ZKM@dev-app73/pdb(9)> select * from (select /*+ no_merge */ id,count(name) from zkm group by id) order by id; ID COUNT(NAME) ---------- ----------- 1 2097152 2 2097152 3 2097152 4 2097152 5 2097152 6 2097152 7 2097152 8 2097152 9 2097152 9 rows selected. Elapsed: 00:00:01.69 17:37:37 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SQL_ID bxh00bg36g809, child number 0 ------------------------------------- select * from (select /*+ no_merge */ id,count(name) from zkm group by id) order by id Plan hash value: 970191995 ------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 9 |00:00:01.69 | 28731 | | | | | 1 | SORT ORDER BY | | 1 | 9 | 9 |00:00:01.69 | 28731 | 2048 | 2048 | 2048 (0)| | 2 | VIEW | | 1 | 9 | 9 |00:00:01.69 | 28731 | | | | | 3 | HASH GROUP BY | | 1 | 9 | 9 |00:00:01.69 | 28731 | 1558K| 1558K| 1065K (0)| | 4 | TABLE ACCESS FULL| ZKM | 1 | 9 | 18M|00:00:00.48 | 28731 | | | | ------------------------------------------------------------------------------------------------------------------ 17 rows selected. Elapsed: 00:00:00.06
明显改写后的SQL执行速度更快。
原因是虽然还是有排序动作但是排序的结果集更更更更小了,从A-Rows看是9行,而不改写之前是对全部的行排序。
情况2:在聚合函数中多次使用distinct处理不同字段。
如SQL:select id,count(distinct name),count(distinct id) from zkm group by id order by id;
09:01:40 ZKM@dev-app73/pdb(9)> select id,count(distinct name),count(distinct id) from zkm group by id order by id; ID COUNT(DISTINCTNAME) COUNT(DISTINCTID) ---------- ------------------- ----------------- 1 1 1 2 1 1 3 1 1 4 1 1 5 1 1 6 1 1 7 1 1 8 1 1 9 1 1 9 rows selected. Elapsed: 00:00:14.67 09:01:56 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SQL_ID 7ht3gbdz1z5ts, child number 0 ------------------------------------- select id,count(distinct name),count(distinct id) from zkm group by id order by id Plan hash value: 2238836816 ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 9 |00:00:14.66 | 28731 | | | | | 1 | SORT GROUP BY | | 1 | 1 | 9 |00:00:14.66 | 28731 | 2048 | 2048 | 2048 (0)| | 2 | TABLE ACCESS FULL| ZKM | 1 | 9 | 18M|00:00:01.45 | 28731 | | | | ---------------------------------------------------------------------------------------------------------------- 15 rows selected. Elapsed: 00:00:00.06
可以看出,多个聚合函数中均使用了distinct导致无法用"HASH GROUP BY",因为两个distinct需要去重,从结果看,对同一结果集可以同时排序两个以上不同的字段后做去重然后count,却无法同时对同一结果集做HASH去重去避免排序。
去掉其中一个distinct的话就没问题,如:select id,count(distinct name),count(id) from zkm group by id order by id;
09:13:56 ZKM@dev-app73/pdb(9)> select id,count(distinct name),count(id) from zkm group by id order by id; ID COUNT(DISTINCTNAME) COUNT(ID) ---------- ------------------- ---------- 1 1 2097152 2 1 2097152 3 1 2097152 4 1 2097152 5 1 2097152 6 1 2097152 7 1 2097152 8 1 2097152 9 1 2097152 9 rows selected. Elapsed: 00:00:02.08 09:14:02 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------- SQL_ID 9t4u0dtgn1q0q, child number 0 ------------------------------------- select id,count(distinct name),count(id) from zkm group by id order by id Plan hash value: 1511739550 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 9 |00:00:02.08 | 28731 | | | | | 1 | SORT GROUP BY | | 1 | 9 | 9 |00:00:02.08 | 28731 | 2048 | 2048 | 2048 (0)| | 2 | VIEW | VW_DAG_0 | 1 | 9 | 9 |00:00:02.08 | 28731 | | | | | 3 | HASH GROUP BY | | 1 | 9 | 9 |00:00:02.08 | 28731 | 1452K| 1452K| 1192K (0)| | 4 | TABLE ACCESS FULL| ZKM | 1 | 9 | 18M|00:00:00.44 | 28731 | | | | ---------------------------------------------------------------------------------------------------------------------- 17 rows selected. Elapsed: 00:00:00.07
解决办法:暂无。
参考文档
GROUP BY Clause Does Not Guarantee a Sort Without ORDER BY Clause in 10g and Above (文档 ID 345048.1)