• Oracle 分析函数


    分析函数官方帮助文档:

    http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions001.htm#SQLRF06174

    一、Top-N Queries,row_number()

    参考网址:http://www.oracle-base.com/articles/misc/top-n-queries.php

    1、row_number()

         The ROW_NUMBER analytic function is similar to the ROWNUM pseudocolumn in that it assigns a unique number for each row returned, but like all analytic functions its action can be limited to a specific window of data in the result set and based on the order of data in that window. 

       row_number分析函数的功能类似于rownum伪列的功能,他为返回的每一行数据分配一个独立的顺序号,类似于其他所有的分析函数,他是被限制在一个窗口中的,窗口中的数据集是建立在这个串口的排序字段之上(partition by x order by y),其中partition是一个开窗函数,即确定分组条件.

    下面的SQL语句没有开窗函数partition,默认是全部的结果集

    1 SELECT deptno,
    2        ename,
    3        row_number() over(ORDER BY deptno DESC) AS val_row_number
    4   FROM emp 

     

    (注意!,在row_number()必须指定一个明确的指明一个排序规则,即在over语句中,必须指明一个order子句,当丢书开窗语句,默认是全部语句)

    结果:

    例如:

    1 SELECT deptno,
    2        ename,
    3        row_number() over(PARTITION BY deptno ORDER BY ename DESC) AS val_row_number
    4   FROM emp

     上述结果表明是在一个窗口中求他的顺序号,即在组内排序。

    下面是row_number的常见用法:求某个节点遍历到根节点的二级节点的记录,如下:

    1 SELECT lpad(' ', LEVEL * 4) || ename,
    2        LEVEL,
    3        e.empno
    4   FROM emp e
    5  START WITH e.empno = 7369
    6 CONNECT BY e.empno = PRIOR e.mgr
    7  ORDER SIBLINGS BY ename

    则现在需要的记录是JONES这行数据:下面是两种处理方式,

    1、使用分析函数

     1 SELECT *
     2   FROM (SELECT lpad(' ', LEVEL * 4) || ename,
     3                LEVEL,
     4                e.empno,
     5                ename,
     6                row_number() over(ORDER BY LEVEL DESC) rn
     7           FROM emp e
     8          START WITH e.empno = 7369
     9         CONNECT BY e.empno = PRIOR e.mgr) cols
    10  WHERE cols.rn = 2

     

    2、使用普通的ROWNUM(需要多层嵌套,很不方便)

     1 SELECT *
     2   FROM (SELECT cols.*,
     3                rownum rn
     4           FROM (SELECT lpad(' ', LEVEL * 4) || ename,
     5                        LEVEL,
     6                        e.empno,
     7                        ename
     8                 
     9                   FROM emp e
    10                  START WITH e.empno = 7369
    11                 CONNECT BY e.empno = PRIOR e.mgr
    12                  ORDER BY LEVEL DESC) cols)
    13  WHERE rn = 2

     

    二、RANK, DENSE_RANK, FIRST and LAST Analytic Function

    参考网址:http://www.oracle-base.com/articles/misc/rank-dense-rank-first-last-analytic-functions.php

    1、RANK

    当我们需要根据某个条件去获取排序号,或者排名的时候,我们可以使用RANK:

    1 SELECT empno,
    2        deptno,
    3        sal,
    4        rank() over(PARTITION BY deptno ORDER BY sal) "rank"
    5   FROM emp;

    结果:

    结果分析:对于薪水相同的,它们的排序号一样,但是下一个,发生了跨越(不连续的),变为了4,而DENSE_RANK()则不会进行跨越)

    2、DENSE_RANK

    1 SELECT empno,
    2        deptno,
    3        sal,
    4        DENSE_rank() over(PARTITION BY deptno ORDER BY sal) "rank"
    5   FROM emp;

    结果:

    结果分析:对于薪水相同的,它们的排序号一样,下一个,也没有发生跨越,是连续的

    3、FIRST AND LAST

        以下是它的官方解释:

         The FIRST/LAST aggregate functions allow you to return the result of an aggregate applied over a set of rows that rank as the first or last with respect to a given order specificationFIRST/LAST lets you order on column A but return an result of an aggregate applied on column B. This is valuable because it avoids the need for a self-join or subquery, thus improving performance. These functions begin with a tiebreaker function, which is a regular aggregate function (MINMAXSUMAVGCOUNTVARIANCESTDDEV) that produces the return value. The tiebreaker(决胜的,最终的) function is performed on the set rows (1 or more rows) that rank as first or last respect to the order specification to return a single value.

        使用FIRST和LAST会根据Order的排序字段返回相同排序的一个结果集(假如使用部门排序,遇到两个人在一个部门,那么它会返回两条记录,根据前面的聚合函数-min,max,sum,avg等,来把这两条记录最终聚合成一条记录)

         To specify the ordering used within each group, the FIRST/LAST functions add a new clause starting with the word KEEP.

     

    1 SELECT empno,
    2        deptno,
    3        sal,
    4        MIN(sal) keep(dense_rank FIRST ORDER BY deptno) over(PARTITION BY deptno) "Lowest",
    5        MAX(sal) keep(dense_rank FIRST ORDER BY deptno) over(PARTITION BY deptno) "Highest"
    6   FROM emp
    7  ORDER BY deptno,
    8           sal;

    结果:

     

         结果分析:首先是根据部门进行分组,根据部门字段进行排序(无意义的排序-为了测试),那么根据FIST和LST都会取出相同的结果集,根据MAX和MIN相当于取根据部门分组后的,根据部门排序后的最大薪水和最小薪水。

    注意:使用FIRST和LAST获取出的是结果集,而不是单条记录,然后,根据聚合函数求在结果集中的结果。

    三:ROLLUP,CUBE,GROUPING,GROUPING_ID,GROUPING SETS的用法

    参考网址:http://www.oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets.php

                  http://www.oracle-dba-online.com/sql/group_by_sub_queries.htm

                 之前在博客中已经存在了关于CUBE,ROLLUP等的使用的文章,请参考:

                 http://www.cnblogs.com/caroline/archive/2011/12/10/2283353.html

    ROLLUP和CUDE的用法:

           ROLLUP:

       The ROLLUP operation in the simple_grouping_clause groups the selected rows based on the values of the first n, n-1, n-2, ... 0 expressions in the GROUP BY specification(从右边向左的集合,对于N个ROLLUP列,他会产生n+1种分组条件), and returns a single row of summary for each group(为每个组返回单独的行). You can use the ROLLUP operation to produce subtotal values by using it with the SUM function. When used with SUMROLLUP generates subtotals from the most detailed level to the grand total. Aggregate functions such as COUNT can be used to produce other kinds of superaggregates.

            For example, given three expressions (n=3) in the ROLLUP clause of the simple_grouping_clause, the operation results in n+1 = 3+1 = 4 groupings.

           假定在ROLLUP中有三列,那么它会产生四种分组条件,即四种结果集:

          如:rollup(deptno,job,ename)则会产生的四种分组条件依次为:

             deptno,job,ename|deptno,job|deptno|全部

             Rows grouped on the values of the first 'n' expressions are called regular rows, and the others are called superaggregate rows.

            只有在第一个分组的条件下(即rollup中的条件全部的时候)的值叫做正常行,其他的都为聚合行。

            CUBE:

            The CUBE operation in the simple_grouping_clause groups the selected rows based on the values of all possible combinations of expressions in the specification, and returns a single row of summary information for each group. You can use the CUBE operation to produce cross-tabulation values.    

             For example, given three expressions (n=3) in the CUBE clause of the simple_grouping_clause, the operation results in 2n = 23 = 8 groupings. Rows grouped on the values of 'n' expressions are  called    regular  rows, and the rest are called superaggregate rows.

    假定在CUBE中有三列,那么它会产生四种分组条件,即四种结果集:

          如:CUBE(deptno,job,ename)则会产生的八种分组条件:

                deptno,job,ename|deptno,job|deptno,ename|deptno,job|job,ename|deptno|job|ename

               000|001|010|011|100|101|110|111八种组合条件,其中0代表该分组条件中该列不存在

               只有在第一个分组的条件下(全部条件生效)(即cube中的条件全部的时候)的值叫做正常行,其他的都为聚合行。

    GROUPING()

             It can be quite easy to visually identify subtotals generated by rollups and cubes, but to do it programatically you really need something more accurate than the presence of null values in the grouping columns. This is where the GROUPING function comes in. It accepts a single column as a parameter and returns "1" if the column contains a null value generated as part of a subtotal by a ROLLUP or CUBE operation or "0" for any other value, including stored null values.

            可以简单的从外表来看结果集是rollup还是cube的结果,但是在程序中,可能需要更加精确的来判断而不是仅仅通过列中的null值来判断。这就是grouping函数出现的原因,它接受一个列作为输入参数,如果这个列的值为null是作为rollup或者cube的结果,那么将返回1,否则返回0,包括空值。

    1 SELECT deptno,
    2        d.job,
    3        SUM(d.sal),
    4        GROUPING(deptno) f1g,
    5        GROUPING(job) f2g
    6   FROM emp d
    7  GROUP BY ROLLUP(deptno, d.job);

    1 SELECT deptno,
    2        d.job,
    3        SUM(d.sal),
    4        GROUPING(deptno) f1g,
    5        GROUPING(job) f2g
    6   FROM emp d
    7  GROUP BY CUBE(deptno, d.job)

     从上述的查询结果中我们可以得到如下的结论:

    • F1G=0,F2G=0 : Represents a row containing regular subtotal we would expect from a GROUP BY operation.(是正常行,是通过普通的group by得到的
    • F1G=0,F2G=1 : Represents a row containing a subtotal for a distinct value of the FACT_1_ID column, as generated by ROLLUP and CUBE operations.(代表不重复的一个deptno列的值,是通过rollup或者cube生成的)
    • F1G=1,F2G=0 : Represents a row containing a subtotal for a distinct value of the FACT_2_ID column, which we would only see in a CUBE operation.(代表不重复的一个job列的值,仅仅是通过cube生成的,因为只有通过cube才会聚合形成这种结果,而rollup不会单独的对job这一列进行聚合)
    • F1G=1,F2G=1 : Represents a row containing a grand total for the query, as generated by ROLLUP and CUBE operations.(针对全部的结果集聚合的结果,是rollup或者cube生成的)

     以下介绍下grouping的用途:

    1、使用grouping进行排序:

    1 SELECT deptno,
    2        d.job,
    3        SUM(d.sal),
    4        GROUPING(deptno) f1g,
    5        GROUPING(job) f2g
    6   FROM emp d
    7  GROUP BY ROLLUP(deptno, d.job)
    8  ORDER BY GROUPING(deptno),
    9           GROUPING(job)

     2、进行过滤数据:

     1 SELECT deptno,
     2        d.job,
     3        SUM(d.sal),
     4        GROUPING(deptno) f1g,
     5        GROUPING(job) f2g
     6   FROM emp d
     7  GROUP BY ROLLUP(deptno, d.job)
     8 HAVING GROUPING(deptno) = 1 OR GROUPING(job) = 1
     9  ORDER BY GROUPING(deptno),
    10           GROUPING(job)

    GROUPING_ID()

    The GROUPING_ID function provides an alternate and more compact way to identify subtotal rows. Passing the dimension columns as arguments, it returns a number indicating the GROUP BY level

     grouping_id指出了一个可选的和更加紧凑的去辨别结果集的方法,传递分组的列作为参数,它返回行所在的聚合层级的值

    如下所示:

     1 SELECT deptno,
     2        d.job,
     3        SUM(d.sal),
     4        GROUPING(deptno) f1g,
     5        GROUPING(job) f2g,
     6        grouping_id(job) x,
     7        grouping_id(deptno) y,
     8        grouping_id(deptno, d.job) z
     9   FROM emp d
    10  GROUP BY CUBE(deptno, d.job)
    11  ORDER BY GROUPING(deptno),
    12           GROUPING(job)

     GROUPING SETS()

         If we only need a few of these levels of subtotaling we can use the GROUPING SETS expression and specify exactly which ones we need, saving us having to calculate the whole cube.

        如果我们只需要一部分level的结果集,我们可以使用grouping set表达式来指定我们需要的聚合结果集,

       例如,上面的例子CUBE(deptno,job,ename)总计共有八种情况,则这时候,我们可以通过grouping sets指定需要聚合的结果集:

      如下所示:

     1 SELECT deptno,
     2        d.job,
     3        SUM(d.sal),
     4        GROUPING(deptno) f1g,
     5        GROUPING(job) f2g,
     6        grouping_id(deptno, job) x
     7   FROM emp d
     8  GROUP BY GROUPING SETS(deptno, job)
     9  ORDER BY GROUPING(deptno),
    10           GROUPING(job)

     组合列的统计结果分析:

    1、rollup

    ROLLUP (a, b, c)
    (a, b, c)
    (a, b)
    (a)
    ()

    2、cube

    1 CUBE (a, b, c)
    2 (a, b, c)
    3 (a, b)
    4 (a, c)
    5 (a)
    6 (b, c)
    7 (b)
    8 (c)
    9 ()

    3、rollup中的组合

    1 ROLLUP ((a, b), c)
    2 (a, b, c)
    3 (a, b)
    4 ()
    5 
    6 Not considered:
    7 (a)

    4、cube中的组合

     1 CUBE ((a, b), c)
     2 (a, b, c)
     3 (a, b)
     4 (c)
     5 ()
     6 
     7 Not considered:
     8 (a, c)
     9 (a)
    10 (b, c)
    11 (b)

    5、grouping set的组合条件

    GROUPING SETS(fact_1_id, fact_2_id) 
    : (fact_1_id) (fact_2_id)
    GROUPING SETS(fact_3_id, fact_4_id)
    : (fact_3_id) (fact_4_id)
    GROUPING SETS(fact_1_id, fact_2_id), GROUPING SETS(fact_3_id, fact_4_id)
    : (fact_1_id, fact_3_id) (fact_1_id, fact_4_id) (fact_2_id, fact_3_id) (fact_2_id, fact_4_id)

    --grouping sets的组合是在两个条件直接进行查询交叉条件

    也就是:

    GROUPING SETS(a, b), GROUPING SETS(c, d) 
    : (a, c) (a, d) (b, c) (b, d)

    三、LAG and LEAD

    LEAD和LAG主要的功能是进行销售统计的时候,需要求出上月的销量或者下月的销量

    参考网址:http://www.oracle-base.com/articles/misc/lag-lead-analytic-functions.php

    语法:

    1 LAG  (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)
    2 LEAD (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)
    • value_expression - Can be a column or a built-in function, except for other analytic functions.
    • offset - The number of rows preceeding/following the current row, from which the data is to be retrieved. The default value is 1.
      在分区窗口内的迁移量,默认是1,
    • default - The value returned if the offset is outside the scope of the window. The default value is NULL.当当前的迁移量超出了该窗口,则返回改制。
    1 SELECT e.ename,
    2        empno,
    3        deptno,
    4        sal,
    5        lag(sal, 1, 0) over(ORDER BY sal) lag1,
    6        lag(sal, 1) over(ORDER BY sal) lag2,
    7        lead(sal) over(ORDER BY sal) lead1
    8   FROM emp e

     上述的SQL语句中,LAG1列向上迁移一列,默认为1,其中0表示当迁移量超出了窗口中记录的范围时,以0为准。

     4、分析函数和聚合函数结合使用的例子:

    (1)、注意下面的SQL语句没有使用ORDER子句,那么他的结果不是连续求值的

    1 SELECT deptno,
    2        e.ename,
    3        empno,
    4        sal,
    5        AVG(sal) over(PARTITION BY deptno) 部门平均工资,
    6        SUM(sal) over(PARTITION BY deptno) 部门工资之和,
    7        MAX(sal) over(PARTITION BY deptno) 部门最高工资,
    8        COUNT(1) over(PARTITION BY deptno) 部门人数
    9   FROM emp e

    下面的SQL语句是追加了ORDER BY的子句,表示连续的求聚合值:

     

    1 SELECT deptno,
    2        e.ename,
    3        empno,
    4        sal,
    5        AVG(sal) over(PARTITION BY deptno ORDER BY ename) 部门平均工资,
    6        SUM(sal) over(PARTITION BY deptno ORDER BY ename) 部门工资之和,
    7        MAX(sal) over(PARTITION BY deptno ORDER BY ename) 部门最高工资
    8   FROM emp e

     

     

    --待续

  • 相关阅读:
    前端学习笔记
    C#学习笔记
    angualr跨域访问配置
    一文把samba相关的都说清楚
    centos6.5上进行crontab操作
    VS2017 EF本地数据库链接
    Laravel服务/服务提供者/容器/契约和门面
    Laravel之目录结构
    Laravel之视图和Blade模板引擎
    Laravel之HTTP相应
  • 原文地址:https://www.cnblogs.com/caroline/p/2770886.html
Copyright © 2020-2023  润新知