Partition Outer Join是Oracle 10g新增加的,用来方便“补足”在outer join的时候“漏掉”(NULL)的数据。
当两个用来做Join的表中的数据不“相当”的时候,如果我们希望在最终结果中其中一张表的所有records都不落下,一般会用到outer join。另外一个表中“缺失”的那部分数据自然都是以NULL来补足。但是如果我们希望缺失的那部分数据(NULL)能正常"补足'’ 出来,这个时候用Partiton Outer Join就会很方便,而且性能很好。
举个例子,来源这里
首先建立一个测试表customer_orders,
SQL>
1 CREATE TABLE customer_orders(name, dt, amt)
2 AS
3 SELECT *
4 FROM (
5 SELECT owner,
6 TRUNC(created) + MOD(ROWNUM, 6),
7 TRUNC(object_id/ROWNUM)
8 FROM all_objects
9 WHERE owner IN ('ORDSYS', 'WKSYS')
10 ORDER BY
11 DBMS_RANDOM.RANDOM
12 )
13* WHERE ROWNUM <= 1000
SQL> /
Table created.
生成一个报表,显示每个customer在2007年每个月的花销.
SQL> SELECT name,
2 TO_CHAR(dt, 'YYYYMM') as year_month,
3 SUM(amt) as total_amt
4 FROM customer_orders
5 GROUP BY
6 name,
7 TO_CHAR(dt, 'YYYYMM')
8 ORDER BY 2, 1;
NAME YEAR_M TOTAL_AMT
------------------------------ ------ ----------
ORDSYS 200710 224973
WKSYS 200710 3830
SQL>
这条简单的SQL只会返回每个customer实际的花销,并不能把07年其他月份的数据显示出来(尽管其他月份都没有任何花销)。
那么如何把其他月份的数据(0)也都显示出来呢,最容易想到做法就是用外连接!
SQL>
1 WITH year_month AS
2 ( SELECT TO_CHAR(
3 ADD_MONTHS(
4 DATE '2007-01-01', ROWNUM-1), 'YYYYMM') as year_month
5 FROM dual
6 CONNECT BY ROWNUM <= 12
7 )
8 SELECT co.name,
9 ym.year_month,
10 NVL(SUM(co.amt), 0) AS total_amount
11 FROM year_month ym
12 LEFT OUTER JOIN customer_orders co
13 ON (TO_CHAR(co.dt, 'YYYYMM') = ym.year_month)
14 GROUP BY
15 co.name,
16 ym.year_month
17 ORDER BY
18 co.name,
19* ym.year_month
NAME YEAR_M TOTAL_AMOUNT
------------------------------ ------ ------------
ORDSYS 200710 224973
WKSYS 200710 3830
200701 0
200702 0
200703 0
200704 0
200705 0
200706 0
200707 0
200708 0
200709 0
200711 0200712 0
12 rows selected.
SQL>
虽然把12个月份都显示出来了,但是customer的名字除了10月份之外都没有显示在记录里面,这个显然不是那么友好。
那么该如何让customer的名字也都列出来呢,先来看看不用partiton outer join如何来实现,
SQL>
1 WITH year_months AS
2 ( SELECT TO_CHAR(
3 ADD_MONTHS(
4 DATE '2007-01-01', ROWNUM-1), 'YYYYMM') as year_month
5 FROM dual
6 CONNECT BY ROWNUM <= 12
7 ),
8 customer_names AS
9 (
10 SELECT DISTINCT name
11 FROM custormer_orders
12 )
13 SELECT sq.name,
14 sq.year_month,
15 NVL(co.total_amount, 0) AS total_amount
16 FROM
17 (
18 SELECT name,
19 TO_CHAR(dt, 'YYYYMM') as year_month,
20 SUM(amt) as total_amount
21 FROM custormer_orders
22 GROUP BY
23 name,
24 TO_CHAR(dt, 'YYYYMM')
25 )co,
26 (
27 SELECT *
28 FROM customer_names,
29 year_months
30 ) sq
31 WHERE
32 sq.year_month = co.year_month(+)
33 AND sq.name = co.name(+)
34 ORDER BY
35 sq.name,
36* sq.year_month
NAME YEAR_M TOTAL_AMOUNT
------------------------------ ------ ------------
ORDSYS 200701 0
ORDSYS 200702 0
ORDSYS 200703 0
ORDSYS 200704 0
ORDSYS 200705 0
ORDSYS 200706 0
ORDSYS 200707 0
ORDSYS 200708 0
ORDSYS 200709 0
ORDSYS 200710 224973
ORDSYS 200711 0
ORDSYS 200712 0
WKSYS 200701 0
WKSYS 200702 0
WKSYS 200703 0
WKSYS 200704 0
WKSYS 200705 0
WKSYS 200706 0
WKSYS 200707 0
WKSYS 200708 0
WKSYS 200709 0
WKSYS 200710 3830
WKSYS 200711 0
WKSYS 200712 0
24 rows selected.
SQL>
这种方式用了两个临时表,然后这两个临时表进行了笛卡尔乘积,然后再同custormer_orders表做外连接。 注意,为了得到所有的customer,customer_orders表被访问了两次,显然会对性能造成影响。 通过查看dbms_xplan.display_cursor来看看这条SQL的执行计划及代价,
SQL> SELECT plan_table_output
2 FROM TABLE(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2tn2b3sdyhxn4, child number 0
-------------------------------------
WITH year_months AS ( SELECT TO_CHAR(
ADD_MONTHS( DATE '2007-01-01', ROWNUM-1),
'YYYYMM') as year_month FROM dual CONNECT BY ROWNUM <= 12
), customer_names AS ( SELECT DISTINCT name
FROM custormer_orders ) SELECT sq.name, sq.year_month,
NVL(co.total_amount, 0) AS total_amount FROM (
SELECT name, TO_CHAR(dt, 'YYYYMM') as year_month,
SUM(amt) as total_amount FROM custormer_orders GROUP
BY name, TO_CHAR(dt, 'YYYYMM') )co, (
SELECT * FROM customer_names, year_months ) sq
WHERE sq.year_month = co.year_month(+) AND sq.name =
co.name(+) ORDER BY sq.name, sq.year_month
Plan hash value: 1191063733
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 154 (100)| |
| 1 | SORT ORDER BY | | 10000 | 556K| 1352K| 154 (3)| 00:00:02 |
|* 2 | HASH JOIN OUTER | | 10000 | 556K| | 13 (24)| 00:00:01 |
| 3 | VIEW | | 1000 | 22000 | | 7 (15)| 00:00:01 |
| 4 | MERGE JOIN CARTESIAN | | 1000 | 22000 | | 7 (15)| 00:00:01 |
| 5 | VIEW | | 1 | 5 | | 2 (0)| 00:00:01 |
| 6 | COUNT | | | | | | |
| 7 | CONNECT BY WITHOUT FILTERING| | | | | | |
| 8 | FAST DUAL | | 1 | | | 2 (0)| 00:00:01 |
| 9 | BUFFER SORT | | 1000 | 17000 | | 7 (15)| 00:00:01 |
| 10 | VIEW | | 1000 | 17000 | | 5 (20)| 00:00:01 |
| 11 | HASH UNIQUE | | 1000 | 17000 | | 5 (20)| 00:00:01 |
| 12 | TABLE ACCESS FULL | CUSTORMER_ORDERS | 1000 | 17000 | | 4 (0)| 00:00:01 |
| 13 | VIEW | | 1000 | 35000 | | 5 (20)| 00:00:01 |
| 14 | HASH GROUP BY | | 1000 | 39000 | | 5 (20)| 00:00:01 |
| 15 | TABLE ACCESS FULL | CUSTORMER_ORDERS | 1000 | 39000 | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("SQ"."YEAR_MONTH"="CO"."YEAR_MONTH" AND "SQ"."NAME"="CO"."NAME")
Note
-----
- dynamic sampling used for this statement
47 rows selected.
SQL>
接下来看看用Partition Outer Join如何实现,
SQL>
1 WITH year_months AS
2 (
3 SELECT to_char
4 ( ADD_MONTHS(
5 DATE'2007-01-01', ROWNUM - 1), 'YYYYMM') AS year_month
6 FROM dual
7 CONNECT BY ROWNUM <= 12
8 )
9 SELECT co.name,
0 ym.year_month,
1 NVL(SUM(co.amt), 0) as total_amount
2 FROM year_months ym
3 LEFT OUTER JOIN
4 custormer_orders co
5 PARTITION BY (co.name)
6 ON (TO_CHAR(co.dt, 'YYYYMM') = ym.year_month)
7 GROUP BY
8 co.name,
9 ym.year_month
0 ORDER BY
1 co.name,
2* ym.year_month
NAME YEAR_M TOTAL_AMOUNT
------------------------------ ------ ------------
ORDSYS 200701 0
ORDSYS 200702 0
ORDSYS 200703 0
ORDSYS 200704 0
ORDSYS 200705 0
ORDSYS 200706 0
ORDSYS 200707 0
ORDSYS 200708 0
ORDSYS 200709 0
ORDSYS 200710 224973
ORDSYS 200711 0
ORDSYS 200712 0
WKSYS 200701 0
WKSYS 200702 0
WKSYS 200703 0
WKSYS 200704 0
WKSYS 200705 0
WKSYS 200706 0
WKSYS 200707 0
WKSYS 200708 0
WKSYS 200709 0
WKSYS 200710 3830
WKSYS 200711 0
WKSYS 200712 0
24 rows selected.
SQL>
注意PARTITION OUTER JOIN 的用法,
(xx: table 1) LEFT OUTER JOIN (xxx: table 2) PARTITION BY (xxx: column name) ON (xxx: join condition)
相应的执行计划如下,
SQL> select plan_table_output
2 from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
SQL_ID 4np7tnbh0pqa2, child number 0
-------------------------------------
WITH year_months AS ( SELECT to_char ( ADD_MONTHS(
DATE'2007-01-01', ROWNUM - 1), 'YYYYMM') AS year_month
FROM dual CONNECT BY ROWNUM < 12 ) SELECT co.name,
ym.year_month, NVL(SUM(co.amt), 0) as total_amount FROM
year_months ym LEFT OUTER JOIN custormer_orders co PARTITION BY
(co.name) ON (TO_CHAR(co.dt, 'YYYYMM') = ym.year_month) GROUP BY
co.name, ym.year_month ORDER BY co.name,
ym.year_month
Plan hash value: 4042059898
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 10 (100)| |
| 1 | SORT GROUP BY | | 2 | 138 | 10 (20)| 00:00:01 |
| 2 | VIEW | | 10 | 690 | 9 (12)| 00:00:01 |
| 3 | NESTED LOOPS PARTITION OUTER | | 10 | 230 | 9 (12)| 00:00:01 |
| 4 | BUFFER SORT | | | | | |
| 5 | VIEW | | 1 | 5 | 2 (0)| 00:00:01 |
| 6 | COUNT | | | | | |
| 7 | CONNECT BY WITHOUT FILTERING| | | | | |
| 8 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 9 | FILTER | | | | | |
| 10 | SORT PARTITION JOIN | | 10 | 180 | 5 (20)| 00:00:01 |
| 11 | TABLE ACCESS FULL | CUSTORMER_ORDERS | 10 | 180 | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
9 - filter("YM"."YEAR_MONTH"=TO_CHAR(INTERNAL_FUNCTION("CO"."DT"),'YYYYMM'))
35 rows selected.
SQL>
从执行计划上看,应用Partition Outer Join之后, cost 从154降到了10!性能得到大大提升!