• clickhouse-(09)- row_number() over (partition by)的几种实现方法


    ClickHouse row_number() over (partition by)的几种实现方法

    hive中有row_number() over (partition by)函数,可以一句SQL实现想要的排序,在ClickHouse中有很多种实现方式,本篇就介绍一下几种方法。
    

    目录

    1.row_number排序

    2.row_number排序后取出rank=1的结果

    3.特殊场景

    1.row_number排序

    HIVE中写法:

    select number,
           row_number() over (partition by number order by time desc) as rank
      from table a
     GROUP BY number
    

    ClickHouse写法:

    select number,
           groupArray(time) AS arr_val,
           arrayEnumerate(arr_val) as row_number
      from (select distinct orderid as number,
                   toDate(operatetime) as time
              from table
             order by time desc
             ) a
     GROUP BY number
    

    2.row_number排序后取出rank=1的结果

    hive写法:

    select orderid
      from (select orderid,
                   row_number() over(partition by orderid order by datachange_lasttime desc) as row_num
              from table
             where d = '${CurrentDate}'
             ) a
     where row_num = 1;
    

    ClickHouse写法:

    方法1:利用groupArray

    select orderid, 
           groupArray(1)(datachange_lasttime) as dates
      from (select orderid, 
                   datachange_lasttime
              from table
             ORDER BY orderid, datachange_lasttime desc
            ) a
     group by orderid
    

    方法2:利用max函数实现倒序,如果正序使用min函数即可

    select orderid,
           max(datachange_lasttime) as datachange_lasttime
      from table
     group by orderid
    

    方法3:利用rowNumberInAllBlocks函数

    select orderid, status
      from (select orderid, status, rowNumberInAllBlocks() as rank
              from (select orderid, status, datachange_lasttime
                      from table
                     order by orderid, datachange_lasttime desc
                     ) a
           ) b LIMIT 1 BY orderid
    

    方法4:利用arrayEnumerate函数

    select orderid
      from (select orderid,
                   groupArray(datachange_lasttime) AS arr_val,
                   arrayEnumerate(arr_val) as row_number
              from (select orderid, datachange_lasttime
                      from table
                     order by datachange_lasttime desc
                     ) a
             GROUP BY number
             ) b
     where row_number = 1
    

    3.特殊场景
    要求:

    对于以下场景,需要按照orderid分组,按照日期倒序,取最新一条,若日期一致,则随机取一条作为结果即可

    hive写法:

    select orderid  from (select orderid,               status,               row_number() over(partition by orderid order by datachange_lasttime desc) as row_num          from table         where d = '${CurrentDate}'         ) as b where row_num = 1
    

    ClickHouse写法:

    通过上面的案例,我们很容易想到,把上面的结果作为一个子表,与原表进行关联,只是这样关联,随便举一个关联的写法:

    select a.orderid as orderid_a, a.status as status  from olap_htlmaindb.tmp_ord_orders_status_s_pre a inner join (select orderid, groupArray(1)(datachange_lasttime) as dates               from (select orderid, datachange_lasttime                       from table                      ORDER BY orderid, datachange_lasttime desc                     ) a              group by orderid) b    on a.orderid = b.orderid   and cast(a.datachange_lasttime as String) = cast(b.dates [ 1 ] as String) 
    

    这里我们是先把符合要求的orderid和时间取出来,再回去关联,取出需要的列,因为这些函数都有一个缺点是只能有partition by的字段和排序字段,不能有其他字段,所以要返回关联,所以上面四种方法,ininer join原表,都不能解决上面案例的问题。

    这里就想到了LIMIT 1 BY这个方法,这个方法其实是最有效的,如下:

    select orderid,        status,        datachange_lasttime  from table order by orderid, datachange_lasttime desc  LIMIT 1 BY orderid
    
  • 相关阅读:
    78. Subsets
    93. Restore IP Addresses
    71. Simplify Path
    82. Remove Duplicates from Sorted List II
    95. Unique Binary Search Trees II
    96. Unique Binary Search Trees
    312. Burst Balloons
    程序员社交平台
    APP Store开发指南
    iOS框架搭建(MVC,自定义TabBar)--微博搭建为例
  • 原文地址:https://www.cnblogs.com/weijiqian/p/14853264.html
Copyright © 2020-2023  润新知