Using row_number in Hive
1. Grouping and within-group ordering in Hive: syntax

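Syntax:
row_number() over (partition by column_a order by expr_b desc) rank

rank is the alias given to the generated row number.

partition by: splits the rows into groups by the given column. The name echoes partitioned tables in Hive DDL, but here it only groups rows logically for the window function, and the numbering restarts at 1 in every group.

order by: sets the sort order inside each group. It is ascending by default; add desc for descending.

Here the rows are partitioned by column_a and sorted by expr_b in descending order, so the row with the largest expr_b in each partition gets row number 1.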
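
To make the syntax concrete, here is a minimal sketch that can be pasted into a Hive session. The inline `sales` rows and the column names `brand`, `channel`, and `amount` are made up for illustration, and the alias is `rn` rather than `rank` only to avoid confusion with the rank() function.

```sql
-- Hypothetical inline data standing in for a real table
with sales as (
  select 'A' as brand, 'web' as channel, 30 as amount
  union all select 'A', 'store', 50
  union all select 'A', 'app', 20
  union all select 'B', 'web', 70
  union all select 'B', 'app', 10
)
select brand,
       channel,
       amount,
       -- numbering restarts at 1 for every brand; largest amount first
       row_number() over (partition by brand order by amount desc) as rn
from sales;
```

Within brand A, store (50) gets rn 1, web (30) gets rn 2, and app (20) gets rn 3; within brand B, web (70) gets rn 1 and app (10) gets rn 2. The order in which the result rows are printed is not guaranteed. When two rows tie on the sort key, row_number still assigns them different numbers in an arbitrary order; rank() or dense_rank() can be used instead if ties should share a number.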

2. Grouping and within-group ordering in Hive: example

Goal: find the top 10 brands, the top 10 channels for each of those brands, and the top 10 periods (档期) for each of those channels.

1. Top 10 brands

-- sum(metric) is a placeholder: use sum, count, or whichever aggregate is needed
select brand, sum(metric) as num from table_name group by brand order by num desc limit 10;

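2. Top 10 channels for each of the top 10 brands

-- <brand filter> stands for the condition that keeps only the top 10 brands from step 1
-- the aggregation runs in the inner query so that the window function can sort by num
select a.* from (select t.*, row_number() over (partition by brand order by num desc) rank from (select brand, channel, sum(metric) as num from table_name where <brand filter> group by brand, channel) t) a where a.rank <= 10;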
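
The top-N-per-group pattern in this step can be sketched end to end as follows. This is an illustration under assumptions, not the author's exact query: the inline `orders` rows, the `amount` column, and the CTE names are hypothetical, and the cutoff is 2 instead of 10 so the small sample has something to filter.

```sql
-- Hypothetical inline rows standing in for table_name
with orders as (
  select 'A' as brand, 'web' as channel, 30 as amount
  union all select 'A', 'web', 25
  union all select 'A', 'store', 50
  union all select 'A', 'app', 20
  union all select 'B', 'web', 70
  union all select 'B', 'app', 10
  union all select 'B', 'store', 40
),
-- aggregate once per (brand, channel)
channel_totals as (
  select brand, channel, sum(amount) as num
  from orders
  group by brand, channel
),
-- number the channels inside each brand, largest total first
ranked as (
  select brand, channel, num,
         row_number() over (partition by brand order by num desc) as rn
  from channel_totals
)
select brand, channel, num
from ranked
where rn <= 2;   -- top 2 here; the article uses 10
```

With this sample, brand A keeps web (55) and store (50), and brand B keeps web (70) and store (40). The filter on rn has to sit in an outer query because the row number does not exist yet when the WHERE clause of the same SELECT is evaluated.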

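3. Top 10 periods for each of the top 10 channels of each top 10 brand

-- <brand/channel filter> stands for the condition that keeps only the brands and channels selected in steps 1 and 2
select a.* from (select t.*, row_number() over (partition by brand, channel order by num desc) rank from (select brand, channel, period, sum(metric) as num from table_name where <brand/channel filter> group by brand, channel, period) t) a where a.rank <= 10;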
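
One way to chain the three steps without hand-writing the filter conditions is to join each level to the ranked result of the level above. The sketch below assumes a hypothetical `orders` data set with `brand`, `channel`, `period`, and `amount` columns, uses a cutoff of 2 instead of 10, and is only one possible arrangement.

```sql
-- Hypothetical inline rows standing in for table_name
with orders as (
  select 'A' as brand, 'web' as channel, 'p1' as period, 30 as amount
  union all select 'A', 'web',   'p2', 25
  union all select 'A', 'web',   'p3', 5
  union all select 'A', 'store', 'p1', 50
  union all select 'A', 'app',   'p1', 20
  union all select 'B', 'web',   'p1', 70
  union all select 'B', 'store', 'p2', 40
  union all select 'C', 'web',   'p1', 1
),
-- level 1: rank brands by total amount (no partition: one global ranking)
brand_rank as (
  select b.brand,
         row_number() over (order by b.num desc) as rn
  from (select brand, sum(amount) as num from orders group by brand) b
),
-- level 2: rank channels inside each brand that survived level 1
channel_rank as (
  select c.brand, c.channel,
         row_number() over (partition by c.brand order by c.num desc) as rn
  from (select brand, channel, sum(amount) as num
        from orders group by brand, channel) c
  join brand_rank br on c.brand = br.brand
  where br.rn <= 2
),
-- level 3: rank periods inside each (brand, channel) that survived level 2
period_rank as (
  select p.brand, p.channel, p.period, p.num,
         row_number() over (partition by p.brand, p.channel
                            order by p.num desc) as rn
  from (select brand, channel, period, sum(amount) as num
        from orders group by brand, channel, period) p
  join channel_rank cr on p.brand = cr.brand and p.channel = cr.channel
  where cr.rn <= 2
)
select brand, channel, period, num
from period_rank
where rn <= 2;
```

Each level filters through a join plus WHERE before its own window function runs, so the row numbers at that level are computed only over the rows that survived the level above.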

row_number comes up constantly in Hive queries and Spark jobs whenever rows have to be ordered within a partition, so it is worth learning to use it well.
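Author: 跨界师
Link: https://www.jianshu.com/p/51599bab0c00
Source: 简书 (Jianshu)
Copyright belongs to the author. For commercial reuse, contact the author for permission; for non-commercial reuse, credit the source.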
Original article: https://www.cnblogs.com/sidesky/p/12877081.html
