Abstract: 行变列,列变行在生成报表的时候经常遇到,行变列叫做"Pivot”, 反之叫做"Unpivot”。 在Oracle11g之前,一般都是通过很多work around来实现,但是Oracle11g中直接支持PIVOT和UNPIVOT的操作。
关于PIVOT和UNPIVOT用法可以参见这里
1. Rows to Columns
Rows to Columns是比较常见的操作,比如说下面这个例子,有这样一张表,
SQL> desc dept_emp_distribute;
Name Null? Type
----------------------------------------- -------- ----------------------------
DEPT NUMBER
REGION VARCHAR2(20)
EMPLOYEE_COUNT NUMBER
SQL> select * from dept_emp_distribute;
DEPT REGION EMPLOYEE_COUNT
---------- -------------------- --------------
1 SZ 10
1 SH 15
1 HZ 25
2 HZ 5
2 SZ 15
2 SH 35
6 rows selected.
现在假设我们要生成如下这张报表,
DEPT EMPLOYEE_IN_SZ EMPLOYEE_IN_SH EMPLOYEE_IN_HZ
---------- -------------- -------------- --------------
1 10 15 25
2 15 35 5
在11g之前,最容易想到的方法,就是采用case when 或 decode来做,如下所示,
SQL> select dept,
2 max(case region when 'SZ' then employee_count else null end) employee_in_sz,
3 max(case region when 'SH' then employee_count else null end) employee_in_sh,
4 max(case region when 'HZ' then employee_count else null end) employee_in_hz
5 from
6 dept_emp_distribute
7 group by
8 dept;
11g中可以用PIVOT来写SQL,如下,
SQL> select * from
2 ( select dept, region, employee_count from dept_emp_distribute)
3 pivot
4 (
5 max(employee_count) as employee_count
6 for region in ('SZ' as SZ, 'SH' as SH, 'HZ' as HZ)
7 );
DEPT SZ_EMPLOYEE_COUNT SH_EMPLOYEE_COUNT HZ_EMPLOYEE_COUNT
---------- ----------------- ----------------- -----------------
1 10 15 25
2 15 35 5
注意Pivot需要一个聚集函数,因此用了max(employee_count),如果去掉这个max(),会得到如下错误:
ORA-56902: expect aggregate function inside pivot operation
Execution Plan
----------------------------------------------------------
Plan hash value: 2443834872
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 228 | 4 (25)| 00:00:01 |
| 1 | HASH GROUP BY PIVOT| | 6 | 228 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL | DEPT_EMP_DISTRIBUTE | 6 | 228 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
对应的Pivot操作符的执行操作是HASH GROUP BY PIVOT.
这两种方法虽然能满足需求,但是显然不够灵活,因为Hard code了REGION在SQL语句中,如果REGION的信息不确定怎么办呢?
对于第一种方法来说,用一条静态的SQL语句是很难办到的,因为需要多少条case when 必须要动态来决定,而Pivot也不支持如下写法…
SQL> select * from
2 ( select dept, region, employee_count from dept_emp_distribute)
3 pivot
4 ( max(employee_count) as employee_count
5 for region in ( select distinct region from dept_emp_distribute)
6 );
for region in ( select distinct region from dept_emp_distribute)
*
ERROR at line 5:
ORA-00936: missing expression
虽然如此,但是11g支持Pivot XML来满足这种需求,如下
SQL> select * from
2 ( select dept, region, employee_count from dept_emp_distribute)
3 pivot xml
4 ( max(employee_count) as employee_count
5 for region in (select distinct region from dept_emp_distribute)
6 );
DEPT REGION_XML
---------- ----------------------------------------------------------------------------------
1 <PivotSet><item><column name = "REGION">HZ</column><column name = "EMPLOYEE_COUN
2 <PivotSet><item><column name = "REGION">HZ</column><column name = "EMPLOYEE_COUN
2. Columns to Rows.
列转行叫“Unpivot”,相对于“Pivot”来说稍微有点复杂,而且不是那么直观。现在来把上面得到的结果返回去得到原表的结构,该怎么办呢?
先来看看11g中的UNPIVOT是怎么用的。
为方便演示,首先用CTAS(create table as select) 来创建一个中间表staging_table, 如下
SQL> create table staging_table as
2 select * from
3 ( select dept, region, employee_count from dept_emp_distribute)
4 pivot
5 ( max(employee_count) as employee_count
6 for region in ('SZ' as SZ,
7 'SH' as SH, 'HZ' as HZ)
8 );
Table created.
SQL> desc staging_table;
Name Null? Type
----------------------------------------- -------- ------------------------
DEPT NUMBER
SZ_EMPLOYEE_COUNT NUMBER
SH_EMPLOYEE_COUNT NUMBER
HZ_EMPLOYEE_COUNT NUMBER
SQL> select * from staging_table;
DEPT SZ_EMPLOYEE_COUNT SH_EMPLOYEE_COUNT HZ_EMPLOYEE_COUNT
---------- ----------------- ----------------- -----------------
1 10 15 25
2 15 35 5
看看如何用UNPIVOT来做,
1 select * from
2 (
3 ( select dept,
4 sz_employee_count SZ,
5 sh_employee_count SH,
6 hz_employee_count HZ
7 from staging_table
8 )
9 unpivot
10 (
11 employee_count
12 for region in (SZ, SH, HZ)
13 )
14* )
SQL> /
DEPT RE EMPLOYEE_COUNT
---------- -- --------------
1 SZ 10
1 SH 15
1 HZ 25
2 SZ 15
2 SH 35
2 HZ 5
6 rows selected.
看看UNPIVOT部分, employee_count 和 region可以随便命名,对应于最后的列名。 for region in (SZ, SH, HZ)中的SZ, SH, HZ 对应于select部分中列的别名。
unpivot
(
employee_count
for region in (SZ, SH, HZ)
)
现在回到11g之前,没有数据库对UNPIVOT的直接支持,该怎么办呢?
很容易想到的可以用UNION ALL来实现,虽然有点 cumbersome, 不是很灵活,
SQL> select * from
2 (
3 select dept, 'SZ' as region, sz_employee_count as employee_count from staging_table
4 union all
5 select dept, 'SH' as region, sh_employee_count as employee_count from staging_table
6 union all
7 select dept, 'HZ' as region, hz_employee_count as employee_count from staging_table
8 )
9 order by dept;
DEPT RE EMPLOYEE_COUNT
---------- -- --------------
1 SZ 10
1 SH 15
1 HZ 25
2 SH 35
2 HZ 5
2 SZ 15
6 rows selected.
另外一种稍微复杂点但是更灵活的做法是借助collection, 如下
SQL> create type name_value_pair as object
2 ( name varchar2(10),
3 value number
4 );
5 /
Type created.
SQL> create type name_value_varray as
2 varray(10) of name_value_pair;
3 /
Type created.
SQL> select dept,
2 vals.name as region,
3 vals.value as employee_count
4 from
5 staging_table t,
6 table
7 ( name_value_array
8 (name_value_pair('SZ', t.sz_employee_count),
9 name_value_pair('SH', t.sh_employee_count),
10 name_value_pair('HZ', t.hz_employee_count))
11 ) vals
12 order by
13 dept, region;
DEPT REGION EMPLOYEE_COUNT
---------- ---------- --------------
1 HZ 25
1 SH 15
1 SZ 10
2 HZ 5
2 SH 35
2 SZ 15
6 rows selected.
这里面的亮点部分我觉得在于
staging_table t,
table
( name_value_array
(name_value_pair('SZ', t.sz_employee_count),
name_value_pair('SH', t.sh_employee_count),
name_value_pair('HZ', t.hz_employee_count))
) vals
对于staging_table每一行(每个dept) 来动态生成一个表(name, value), 然后再取出最后得到的结果。猜测:每个dept跟它相对应的“动态表”做笛卡尔乘积。但是从对应的执行计划可以看出,没有看到笛卡尔乘积,而只是nested loops join。
执行计划如下,
Execution Plan
----------------------------------------------------------
Plan hash value: 402397393
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16336 | 861K| | 279 (1)| 00:00:04 |
| 1 | SORT ORDER BY | | 16336 | 861K| 1944K| 279 (1)| 00:00:04 |
| 2 | NESTED LOOPS | | 16336 | 861K| | 59 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | STAGING_TABLE | 2 | 104 | | 3 (0)| 00:00:01 |
| 4 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | | | | | |
-----------------------------------------------------------------------------------------------------------------
注意操作--- COLLECTION ITERATOR CONSTRUCTOR FETCH