利用递归WITH子查询进行优化的实例
递归WITH子查询我已经用过不少了,不过都是当作玩具,没有在实践中用过。昨天碰到了一个实用例子。
在一个OLTP中有一张表,主键是随着创建时间递增的。每天产生大约1万条记录,全表大约有几百万,没有分区,创建时间没有索引。
现在要求取出最近两天的数据并且从中过滤出部分记录。过滤条件上也没有索引。
原来的查询为全表扫描,效率太低,有没有办法改善?前提:因为原表索引够多了,不能新增索引;原查询来自JAVA程序,修改之后必须还是一个SQL, 交由JAVA程序执行。
测试数据:
CREATE TABLE items AS
SELECT 3000000 - LEVEL AS item_id
,SYSDATE-LEVEL/10000 AS created_date
,TRUNC(DBMS_RANDOM.VALUE(1,101)) AS item_type
,SYS_GUID() AS DESCRIPTION
FROM DUAL
CONNECT BY LEVEL<=1000000;
ALTER TABLE items ADD CONSTRAINT items_pk PRIMARY KEY (item_id) USING INDEX;
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'ITEMS');
原表要复杂得多,现在简化为100万的一张表。查询大约相当于:
SELECT * FROM items WHERE created_date >= TRUNC(SYSDATE)-2 AND item_type=14;
294 rows selected.
Elapsed: 00:00:00.90
Execution Plan
----------------------------------------------------------
Plan hash value: 446380563
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 258 | 8772 | 1501 (1)| 00:00:19 |
|* 1 | TABLE ACCESS FULL| ITEMS | 258 | 8772 | 1501 (1)| 00:00:19 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ITEM_TYPE"=14 AND "CREATED_DATE">=TRUNC(SYSDATE@!)-2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5429 consistent gets
5406 physical reads
0 redo size
13636 bytes sent via SQL*Net to client
625 bytes received via SQL*Net from client
21 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
294 rows processed
例子表简化了很多, 所以只花了0.9秒,5429个一致读。实际表要胖得多,查询需要几十秒。
改写:
因为主键是随着时间递增的,所以“取最近的数据”可转换为:取ID最大的N条数据,如果其中最早的一条还是两天之内就增大N, 直到日期落在两天之外停止查询。
WITH t (item_id,cnt) AS (
SELECT max(item_id),1 FROM items ---先取最近的
UNION ALL
select (SELECT MIN(item_id) FROM (SELECT item_id FROM items ORDER BY item_id DESC) WHERE ROWNUM<=t.cnt+2000) ---- 跳跃取2000行之后的ID
,cnt+2000 ----- 当日期还在区间内则递增取ID的范围。根据每天的数据两选取合适的步长,这里定为2000
FROM t
WHERE (SELECT MAX(created_date) FROM items WHERE item_id=t.item_id)>=TRUNC(SYSDATE)-2) ---- 当取到的ID落在区间外则停止递归
CYCLE item_id SET cycle_flag TO 'Y' DEFAULT 'N' ---- 虽然ID都不重复但是ORACLE会报告有循环数据,所以在这里加上CYCLE语句
select *
FROM items
WHERE item_id>=(SELECT min(item_id) from t) ----- 利用前面的搜索结果
AND item_type=14
AND created_date >= TRUNC(SYSDATE)-2;
294 rows selected.
Elapsed: 00:00:00.16
Execution Plan
----------------------------------------------------------
Plan hash value: 1844953721
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13 | 442 | 81 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | ITEMS | 13 | 442 | 72 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | ITEMS_PK | 9000 | | 23 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 13 | | |
| 4 | VIEW | | 2 | 26 | 9 (0)| 00:00:01 |
| 5 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | |
| 6 | SORT AGGREGATE | | 1 | 6 | | |
| 7 | INDEX FULL SCAN (MIN/MAX) | ITEMS_PK | 1 | 6 | 3 (0)| 00:00:01 |
| 8 | SORT AGGREGATE | | 1 | 13 | | |
|* 9 | COUNT STOPKEY | | | | | |
| 10 | VIEW | | 1000K| 12M| 2238 (1)| 00:00:27 |
| 11 | INDEX FULL SCAN DESCENDING | ITEMS_PK | 1000K| 5859K| 2238 (1)| 00:00:27 |
|* 12 | FILTER | | | | | |
| 13 | RECURSIVE WITH PUMP | | | | | |
| 14 | SORT AGGREGATE | | 1 | 14 | | |
| 15 | TABLE ACCESS BY INDEX ROWID | ITEMS | 1 | 14 | 3 (0)| 00:00:01 |
|* 16 | INDEX UNIQUE SCAN | ITEMS_PK | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ITEM_TYPE"=14 AND "CREATED_DATE">=TRUNC(SYSDATE@!)-2)
2 - access("ITEM_ID">= (SELECT MIN("ITEM_ID") FROM "T" "T"))
9 - filter(ROWNUM<=:B1+2000)
12 - filter( (SELECT MAX("CREATED_DATE") FROM "ITEMS" "ITEMS" WHERE
"ITEM_ID"=:B1)>=TRUNC(SYSDATE@!)-2)
16 - access("ITEM_ID"=:B1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
722 consistent gets
0 physical reads
0 redo size
13636 bytes sent via SQL*Net to client
625 bytes received via SQL*Net from client
21 SQL*Net roundtrips to/from client
15 sorts (memory)
0 sorts (disk)
294 rows processed
主键索引用得上了,一致读下降为722。