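Hive window functions
Requirement:
From a batch of data, extract for each key the record with the earliest timestamp.
Partition the rows by serial_number and order them by bus_time (ascending by default), so that each row gets its position within its own key and the earliest row is numbered 1: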
row_number() over (partition by a.serial_number order by a.bus_time) as rank
To meet the requirement, it is then enough to select every record whose rank is 1.
hive -e "
select *
from (
    select a.serial_number,
           a.bus_time,
           a.errcode,
           a.errdesc,
           row_number() over (partition by a.serial_number order by a.bus_time) as rank
    from cmbh_log.test a
    where a.dt = '20171010' and a.errcode = '0000'
) B
where B.rank = 1;" > /data/work/suc_login_num.txt
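To check the logic without a Hive cluster, the same pattern can be tried on a small in-memory table. The sketch below is only an illustration and makes several assumptions: it uses Python's built-in sqlite3 module rather than Hive (window functions need SQLite 3.25 or newer), the sample rows and the helper name earliest_per_key are made up, the dt partition filter is left out, and the alias is rn instead of rank.

```python
import sqlite3

# Hypothetical sample rows: (serial_number, bus_time, errcode, errdesc).
# These values are invented for illustration; they are not from cmbh_log.test.
ROWS = [
    ("A001", "2017-10-10 09:30:00", "0000", "ok"),
    ("A001", "2017-10-10 08:15:00", "0000", "ok"),
    ("A001", "2017-10-10 07:00:00", "1001", "failed"),
    ("B002", "2017-10-10 12:00:00", "0000", "ok"),
    ("B002", "2017-10-10 11:45:00", "0000", "ok"),
]

# Same shape as the Hive query: filter first, number the rows inside each
# serial_number by ascending bus_time, then keep only row number 1.
QUERY = """
SELECT serial_number, bus_time, errcode, errdesc
FROM (
    SELECT a.*,
           row_number() OVER (PARTITION BY a.serial_number
                              ORDER BY a.bus_time) AS rn
    FROM test a
    WHERE a.errcode = '0000'
) b
WHERE b.rn = 1
ORDER BY serial_number
"""


def earliest_per_key(rows):
    """Return the earliest errcode='0000' row for each serial_number."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test ("
            "serial_number TEXT, bus_time TEXT, errcode TEXT, errdesc TEXT)"
        )
        conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in earliest_per_key(ROWS):
        print(row)
```

In this sample, the 07:00:00 row for A001 is dropped by the errcode filter before the rows are numbered, so the row kept for A001 is the 08:15:00 one. If two rows of the same key share the same bus_time, row_number() still assigns them different numbers in an unspecified order; adding a second column to the order by clause would make the choice deterministic.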