今天帮同事测试,发现代码里有个好用的hive 函数:
1. collect_set 可以输出未包含在groupby里的字段。条件是,这个字段值对应于主键是唯一的。
select a, collect_set(b)[0], count(*) -- 同时想输出每个主键对应的b字段 from ( select 'a' a, 'b' b from test.dual )a group by a; -- 根据a group by
2. concat_ws 和collect_set 一起可以把group by的结果集,合并成一条记录。
对表
col_1 | col_0 |
hell0 | 1 |
hello | 2 |
合并成一条
hello | 1,2 |
select col_1, concat_ws(',',collect_set(cast(col_0 as string))) as col_0s from ( select 1 col_0,col_1 from test.dual union all select 2 col_0,col_1 from test.dual )a group by col_1;