Reposted from: http://lxw1234.com/archives/2015/04/185.htm
Data preparation:
- d1,user1,1000
- d1,user2,2000
- d1,user3,3000
- d2,user4,4000
- d2,user5,5000
- CREATE EXTERNAL TABLE lxw1234 (
- dept STRING,
- userid STRING,
- sal INT
- ) ROW FORMAT DELIMITED
- FIELDS TERMINATED BY ','
- STORED AS TEXTFILE LOCATION '/tmp/lxw11/';
- hive> select * from lxw1234;
- OK
- d1 user1 1000
- d1 user2 2000
- d1 user3 3000
- d2 user4 4000
- d2 user5 5000
CUME_DIST
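-- CUME_DIST: (number of rows whose value is less than or equal to the current row's value) / (total number of rows in the partition)
-- For example, the share of all employees whose salary is less than or equal to the current employee's salary.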
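The definition above can be tried against the sample table with a query along the following lines. This is a minimal sketch, assuming the lxw1234 table from the data preparation step; the column aliases cd_all and cd_dept are illustrative names chosen here.

```sql
-- Sketch: cumulative distribution of salary.
-- cd_all  : no PARTITION BY, so all five rows form a single partition.
-- cd_dept : computed separately inside each department.
SELECT
  dept,
  userid,
  sal,
  CUME_DIST() OVER (ORDER BY sal) AS cd_all,
  CUME_DIST() OVER (PARTITION BY dept ORDER BY sal) AS cd_dept
FROM lxw1234;
```

Applying the definition to the sample data by hand: for user3 (sal 3000), three of the five rows have a salary less than or equal to 3000, so cd_all should be 3/5 = 0.6; within d1, all three rows qualify, so cd_dept should be 3/3 = 1.0. For user4 (sal 4000), cd_all should be 4/5 = 0.8 and cd_dept should be 1/2 = 0.5.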
PERCENT_RANK
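-- PERCENT_RANK: (RANK of the current row within the partition - 1) / (total number of rows in the partition - 1)
I am not sure what the typical use cases are; it may come in handy when implementing certain specialized algorithms.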
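To see the formula in action, a query like the following could be run against the sample table. This is a minimal sketch, assuming the lxw1234 table from the data preparation step; the aliases rk_all, pr_all and pr_dept are illustrative names chosen here.

```sql
-- Sketch: percent rank of salary.
-- rk_all  : plain RANK over all rows, shown so the formula can be checked by hand.
-- pr_all  : (rk_all - 1) / (total rows - 1), with all rows as one partition.
-- pr_dept : the same formula applied inside each department.
SELECT
  dept,
  userid,
  sal,
  RANK() OVER (ORDER BY sal) AS rk_all,
  PERCENT_RANK() OVER (ORDER BY sal) AS pr_all,
  PERCENT_RANK() OVER (PARTITION BY dept ORDER BY sal) AS pr_dept
FROM lxw1234;
```

Working through the formula on the sample data: user3 (sal 3000) has rank 3 among the five rows, so pr_all should be (3 - 1) / (5 - 1) = 0.5; within d1 it has rank 3 of 3, so pr_dept should be (3 - 1) / (3 - 1) = 1.0. user4 (sal 4000) has rank 4 overall, giving (4 - 1) / (5 - 1) = 0.75, and rank 1 within d2, giving (1 - 1) / (2 - 1) = 0.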