• Hive函数:rank()、dense_rank()


    数据准备:

    G1,KING,5000
    G1,BING,5000
    G2,FING,5000
    G1,FORD,3000
    G2,SCOTT,3000
    G1,JONES,2975
    G2,BLAKE,2850
    G1,CLARK,2450
    G1,ALLEN,1600
    G1,CELL1,NULL
    G2,CELL2,NULL
    
    CREATE EXTERNAL TABLE test_data (
    gid string,
    ename STRING,
    sal INT
    ) ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ',' 
    stored as textfile location '/user/jc_rc_ftp/test_data';
    
    select * from test_data l;
    +--------+----------+--------+--+
    | l.gid  | l.ename  | l.sal  |
    +--------+----------+--------+--+
    | G1     | CELL1    | NULL   |
    | G2     | CELL2    | NULL   |
    | G1     | KING     | 5000   |
    | G1     | BING     | 5000   |
    | G2     | FING     | 5000   |
    | G1     | FORD     | 3000   |
    | G2     | SCOTT    | 3000   |
    | G1     | JONES    | 2975   |
    | G2     | BLAKE    | 2850   |
    | G1     | CLARK    | 2450   |
    | G1     | ALLEN    | 1600   |
    +--------+----------+--------+--+

    rank

    ---rank()over(order by 列名排序)的结果是不连续的,如果有4个人,其中有3个是并列第1名,那么最后的排序结果结果如:1 1 1 4

    SQL> select gid,ename,sal,rank() over(order by sal desc) from test_data;
    

    ---rank() over (partition by 分组字段 order by 排序字段 顺序)

    注意:使用rank()over(order by 排序字段 顺序)排序的时候,空值是最大的

    select gid,ename,sal,rank()over(partition by gid order by coalesce(sal,0) desc) from test_data;
    +------+--------+-------+----------------+--+
    | gid | ename | sal | rank_window_0 |
    +------+--------+-------+----------------+--+
    | G1 | BING | 5000 | 1 |
    | G1 | KING | 5000 | 1 |
    | G1 | FORD | 3000 | 3 |
    | G1 | JONES | 2975 | 4 |
    | G1 | CLARK | 2450 | 5 |
    | G1 | ALLEN | 1600 | 6 |
    | G1 | CELL1 | NULL | 7 |
    | G2 | FING | 5000 | 1 |
    | G2 | SCOTT | 3000 | 2 |
    | G2 | BLAKE | 2850 | 3 |
    | G2 | CELL2 | NULL | 4 |
    +------+--------+-------+----------------+--+

    dense_rank

    ---dense_rank()over(order by 列名排序)的结果是连续的,如果有4个人,其中有3个是并列第1名,那么最后的排序结果如:1 1 1 2
    如果排序字段为null,可能造成在排序时将null字段排在最前面,影响排序的正确性。
    所以建议将 dense_rank()over(order by 列名 排序)改为dense_rank()over(order by 列名排序 nulls last)

    ---rank()over(order by 列名排序)的结果是不连续的,如果有4个人,其中有3个是并列第1名,那么最后的排序结果结果如:1 1 1 4
    SQL> select gid,ename,sal,rank() over(order by sal desc) from test_data;
    +------+--------+-------+----------------+--+
    | gid  | ename  |  sal  | rank_window_0  |
    +------+--------+-------+----------------+--+
    | G2   | FING   | 5000  | 1              |
    | G1   | BING   | 5000  | 1              |
    | G1   | KING   | 5000  | 1              |
    | G2   | SCOTT  | 3000  | 4              |
    | G1   | FORD   | 3000  | 4              |
    | G1   | JONES  | 2975  | 6              |
    | G2   | BLAKE  | 2850  | 7              |
    | G1   | CLARK  | 2450  | 8              |
    | G1   | ALLEN  | 1600  | 9              |
    | G1   | CELL1  | NULL  | 10             |
    | G2   | CELL2  | NULL  | 10             |
    +------+--------+-------+----------------+--+
    
    ---rank() over (partition by 分组字段 order by 排序字段 顺序)
    注意:使用rank()over(order by 排序字段 顺序)排序的时候,空值是最大的
    select gid,ename,sal,rank()over(partition by gid order by coalesce(sal,0) desc) from test_data;
    +------+--------+-------+----------------+--+
    | gid  | ename  |  sal  | rank_window_0  |
    +------+--------+-------+----------------+--+
    | G1   | BING   | 5000  | 1              |
    | G1   | KING   | 5000  | 1              |
    | G1   | FORD   | 3000  | 3              |
    | G1   | JONES  | 2975  | 4              |
    | G1   | CLARK  | 2450  | 5              |
    | G1   | ALLEN  | 1600  | 6              |
    | G1   | CELL1  | NULL  | 7              |
    | G2   | FING   | 5000  | 1              |
    | G2   | SCOTT  | 3000  | 2              |
    | G2   | BLAKE  | 2850  | 3              |
    | G2   | CELL2  | NULL  | 4              |
    +------+--------+-------+----------------+--+
    
    ---dense_rank()over(order by 列名排序)的结果是连续的,如果有4个人,其中有3个是并列第1名,那么最后的排序结果如:1 1 1 2
    如果排序字段为null,可能造成在排序时将null字段排在最前面,影响排序的正确性。
    所以建议将 dense_rank()over(order by 列名 排序)改为dense_rank()over(order by 列名排序 nulls last)
    
    select gid,ename,sal,dense_rank()over(order by sal desc) from test_data;
    +------+--------+-------+----------------------+--+
    | gid  | ename  |  sal  | dense_rank_window_0  |
    +------+--------+-------+----------------------+--+
    | G2   | FING   | 5000  | 1                    |
    | G1   | BING   | 5000  | 1                    |
    | G1   | KING   | 5000  | 1                    |
    | G2   | SCOTT  | 3000  | 2                    |
    | G1   | FORD   | 3000  | 2                    |
    | G1   | JONES  | 2975  | 3                    |
    | G2   | BLAKE  | 2850  | 4                    |
    | G1   | CLARK  | 2450  | 5                    |
    | G1   | ALLEN  | 1600  | 6                    |
    | G2   | CELL2  | NULL  | 7                    |
    | G1   | CELL1  | NULL  | 7                    |
    +------+--------+-------+----------------------+--+
    select gid,ename,sal,dense_rank()over(partition by gid order by sal desc) from test_data;
    +------+--------+-------+----------------------+--+
    | gid  | ename  |  sal  | dense_rank_window_0  |
    +------+--------+-------+----------------------+--+
    | G1   | BING   | 5000  | 1                    |
    | G1   | KING   | 5000  | 1                    |
    | G1   | FORD   | 3000  | 2                    |
    | G1   | JONES  | 2975  | 3                    |
    | G1   | CLARK  | 2450  | 4                    |
    | G1   | ALLEN  | 1600  | 5                    |
    | G1   | CELL1  | NULL  | 6                    |
    | G2   | FING   | 5000  | 1                    |
    | G2   | SCOTT  | 3000  | 2                    |
    | G2   | BLAKE  | 2850  | 3                    |
    | G2   | CELL2  | NULL  | 4                    |
    +------+--------+-------+----------------------+--+
  • 相关阅读:
    将台湾与山西的资源进行整合,搭建晋台两地商品营销平台
    范姜锋:致力于协助台湾青年创业,融入“一带一路”建设
    不甘于平凡,他靠借钱入行做电器生意,年销售额竟突破亿元?
    研究生接手父亲的事业,当起“猪妹”每天和猪打交道
    面对找不到工作的困难,小伙选择创业开酒吧,月收入竟达到了6万
    django 直接将数据分配给前台
    Centos7下安装与卸载Jdk1.8
    从“挖光缆”到“剪网线”|蚂蚁金服异地多活的微服务体系
    推进“互联网+政务服务” 加快新型智慧城市建设
    推进“互联网+政务服务” 加快新型智慧城市建设
  • 原文地址:https://www.cnblogs.com/yy3b2007com/p/8582924.html
Copyright © 2020-2023  润新知