• SQL Techniques – Columns to Rows, Rows to Columns


    Abstract: 行变列,列变行在生成报表的时候经常遇到,行变列叫做"Pivot”, 反之叫做"Unpivot”。 在Oracle11g之前,一般都是通过很多work around来实现,但是Oracle11g中直接支持PIVOT和UNPIVOT的操作。

    关于PIVOT和UNPIVOT用法可以参见这里

    1. Rows to Columns

    Rows to Columns是比较常见的操作,比如说下面这个例子,有这样一张表,

     

    SQL> desc dept_emp_distribute;
    Name                                      Null?    Type
    ----------------------------------------- -------- ----------------------------
    DEPT                                                  NUMBER
    REGION                                              VARCHAR2(20)
    EMPLOYEE_COUNT                             NUMBER

     

    SQL> select * from dept_emp_distribute;

          DEPT REGION               EMPLOYEE_COUNT
    ---------- -------------------- --------------
             1 SZ                               10
             1 SH                               15
             1 HZ                               25
             2 HZ                                5
             2 SZ                               15
             2 SH                               35

    6 rows selected.

    现在假设我们要生成如下这张报表,

          DEPT EMPLOYEE_IN_SZ EMPLOYEE_IN_SH EMPLOYEE_IN_HZ
    ---------- -------------- -------------- --------------
             1             10             15             25
             2             15             35              5

    在11g之前,最容易想到的方法,就是采用case when 或 decode来做,如下所示,

    SQL> select dept,
      2    max(case region when 'SZ' then employee_count else null end) employee_in_sz,
      3    max(case region when 'SH' then employee_count else null end) employee_in_sh,
      4    max(case region when 'HZ' then employee_count else null end) employee_in_hz
      5  from
      6    dept_emp_distribute
      7  group by
      8    dept;

     

    11g中可以用PIVOT来写SQL,如下,

    SQL>  select * from
      2  ( select dept, region, employee_count from dept_emp_distribute)
      3  pivot
      4  (
      5    max(employee_count) as employee_count
      6    for region in ('SZ' as SZ, 'SH' as SH, 'HZ' as HZ)
      7  );

          DEPT SZ_EMPLOYEE_COUNT SH_EMPLOYEE_COUNT HZ_EMPLOYEE_COUNT
    ---------- ----------------- ----------------- -----------------
             1                10                15                25
             2                15                35                 5

     

    注意Pivot需要一个聚集函数,因此用了max(employee_count),如果去掉这个max(),会得到如下错误:

    ORA-56902: expect aggregate function inside pivot operation

     

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2443834872

    -------------------------------------------------------------------------------------------
    | Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |                     |     6 |   228 |     4  (25)| 00:00:01 |
    |   1 |  HASH GROUP BY PIVOT|                     |     6 |   228 |     4  (25)| 00:00:01 |
    |   2 |   TABLE ACCESS FULL | DEPT_EMP_DISTRIBUTE |     6 |   228 |     3   (0)| 00:00:01 |
    -------------------------------------------------------------------------------------------

     

    对应的Pivot操作符的执行操作是HASH GROUP BY PIVOT.

     

     

    这两种方法虽然能满足需求,但是显然不够灵活,因为Hard code了REGION在SQL语句中,如果REGION的信息不确定怎么办呢?

    对于第一种方法来说,用一条静态的SQL语句是很难办到的,因为需要多少条case when 必须要动态来决定,而Pivot也不支持如下写法…

    SQL> select * from
      2  ( select dept, region, employee_count from dept_emp_distribute)
      3  pivot
      4  ( max(employee_count) as employee_count
      5    for region in ( select distinct region from dept_emp_distribute)
      6  );


      for region in ( select distinct region from dept_emp_distribute)
                      *
    ERROR at line 5:
    ORA-00936: missing expression

     

    虽然如此,但是11g支持Pivot XML来满足这种需求,如下

    SQL> select * from
      2  ( select dept, region, employee_count from dept_emp_distribute)
      3  pivot xml
      4  ( max(employee_count) as employee_count
      5    for region in (select distinct region from dept_emp_distribute)
      6  );

     

          DEPT REGION_XML
    ---------- ----------------------------------------------------------------------------------
             1 <PivotSet><item><column name = "REGION">HZ</column><column name = "EMPLOYEE_COUN
             2 <PivotSet><item><column name = "REGION">HZ</column><column name = "EMPLOYEE_COUN

     

     

     

    2. Columns to Rows.

     

    列转行叫“Unpivot”,相对于“Pivot”来说稍微有点复杂,而且不是那么直观。现在来把上面得到的结果返回去得到原表的结构,该怎么办呢?

    先来看看11g中的UNPIVOT是怎么用的。

    为方便演示,首先用CTAS(create table as select) 来创建一个中间表staging_table, 如下

    SQL> create table staging_table as
      2  select * from
      3  ( select dept, region, employee_count from dept_emp_distribute)
      4  pivot
      5  ( max(employee_count) as employee_count
      6    for region in ('SZ' as SZ,
      7  'SH' as SH, 'HZ' as HZ)
      8  );

    Table created.

    SQL> desc staging_table;
    Name                                      Null?    Type
    ----------------------------------------- -------- ------------------------
    DEPT                                                             NUMBER
    SZ_EMPLOYEE_COUNT                                  NUMBER
    SH_EMPLOYEE_COUNT                                  NUMBER
    HZ_EMPLOYEE_COUNT                                  NUMBER

    SQL> select * from staging_table;

          DEPT SZ_EMPLOYEE_COUNT SH_EMPLOYEE_COUNT HZ_EMPLOYEE_COUNT
    ---------- ----------------- ----------------- -----------------
             1                10                15                25
             2                15                35                 5

    看看如何用UNPIVOT来做,

      1  select * from
      2  (
      3      (   select dept,
      4               sz_employee_count   SZ,
      5               sh_employee_count   SH,
      6               hz_employee_count   HZ
      7          from  staging_table
      8      )
      9     unpivot
    10    (
    11       employee_count
    12       for region in (SZ, SH, HZ)
    13     )
    14* )
    SQL> /

          DEPT RE EMPLOYEE_COUNT
    ---------- -- --------------
             1 SZ             10
             1 SH             15
             1 HZ             25
             2 SZ             15
             2 SH             35
             2 HZ              5

    6 rows selected.

    看看UNPIVOT部分, employee_count 和 region可以随便命名,对应于最后的列名。 for region in (SZ, SH, HZ)中的SZ, SH, HZ 对应于select部分中列的别名。

    unpivot
    (
           employee_count
           for region in (SZ, SH, HZ)
    )

    现在回到11g之前,没有数据库对UNPIVOT的直接支持,该怎么办呢?

    很容易想到的可以用UNION ALL来实现,虽然有点 cumbersome, 不是很灵活,

    SQL> select * from
      2  (
      3     select dept, 'SZ' as region, sz_employee_count as employee_count from staging_table
      4     union all
      5     select dept, 'SH' as region, sh_employee_count as employee_count from staging_table
      6     union all
      7     select dept, 'HZ' as region, hz_employee_count as employee_count from staging_table
      8  )
      9  order by dept;

          DEPT RE EMPLOYEE_COUNT
    ---------- -- --------------
             1 SZ             10
             1 SH             15
             1 HZ             25
             2 SH             35
             2 HZ              5
             2 SZ             15

    6 rows selected.

    另外一种稍微复杂点但是更灵活的做法是借助collection, 如下

    SQL> create type name_value_pair as object
      2  ( name varchar2(10),
      3    value number
      4  );
      5  /

    Type created.

    SQL> create type name_value_varray as
      2  varray(10) of name_value_pair;
      3  /

    Type created.

     

     

    SQL> select dept,
      2   vals.name as region,
      3   vals.value as employee_count
      4  from
      5   staging_table t,
      6   table
      7   ( name_value_array
      8      (name_value_pair('SZ', t.sz_employee_count),
      9       name_value_pair('SH', t.sh_employee_count),
    10       name_value_pair('HZ', t.hz_employee_count))
    11   ) vals
    12  order by
    13    dept, region;

     

          DEPT REGION     EMPLOYEE_COUNT
    ---------- ---------- --------------
             1 HZ                     25
             1 SH                     15
             1 SZ                     10
             2 HZ                      5
             2 SH                     35
             2 SZ                     15

    6 rows selected.

    这里面的亮点部分我觉得在于

    staging_table t,

    table
        ( name_value_array
            (name_value_pair('SZ', t.sz_employee_count),
             name_value_pair('SH', t.sh_employee_count),
            name_value_pair('HZ', t.hz_employee_count))
        ) vals

    对于staging_table每一行(每个dept) 来动态生成一个表(name, value), 然后再取出最后得到的结果。猜测:每个dept跟它相对应的“动态表”做笛卡尔乘积。但是从对应的执行计划可以看出,没有看到笛卡尔乘积,而只是nested loops join。

    执行计划如下,

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 402397393

    -----------------------------------------------------------------------------------------------------------------
    | Id  | Operation                               | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                        |               | 16336 |   861K|       |   279   (1)| 00:00:04 |
    |   1 |  SORT ORDER BY                          |               | 16336 |   861K|  1944K|   279   (1)| 00:00:04 |
    |   2 |   NESTED LOOPS                          |               | 16336 |   861K|       |    59   (0)| 00:00:01 |
    |   3 |    TABLE ACCESS FULL                    | STAGING_TABLE |     2 |   104 |       |     3   (0)| 00:00:01 |
    |   4 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|               |       |       |       |            |          |
    -----------------------------------------------------------------------------------------------------------------

     

    注意操作--- COLLECTION ITERATOR CONSTRUCTOR FETCH

     

  • 相关阅读:
    查询表结构信息 封装为存储过程了
    小代码大BUG,记解决Sqlite3死锁问题
    DICT协议浅解
    重载"=="和"!="运算符
    Path.Combine(string,string)
    [原创]SQL SERVER 2008 函数大全 字符串函数
    Could not find a Direct3D device that has a Direct3D9level driver and supports pixel shader 1.1 or greater.终极解决方法
    (转)SqlServer Management Objects简介,生成建表等SQL语句
    界面设计之颜色搭配
    献给初学者,[winform]中如何设计高效全局的快捷键?[ShortcutKeys]
  • 原文地址:https://www.cnblogs.com/fangwenyu/p/1633316.html
Copyright © 2020-2023  润新知