• Hive functions


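    collect_set(x)       aggregates the values of x from many rows into a single array, with duplicates removed
    collect_list(x)      same as collect_set, but duplicates are kept
    concat_ws(sep, ...)  concatenation function: joins strings (or an array of strings, e.g. the result of collect_set/collect_list) into one field, separated by sep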
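
    The following is a plain-Python model of how these three functions combine, not Hive's implementation. The rows and the helper names below are hypothetical. collect_list keeps every value in a group, collect_set drops repeats, and concat_ws joins the resulting strings with a separator. Hive promises no element order for either collect function. The sketch keeps first-seen order only so that its output is predictable.

```python
from collections import defaultdict

# Hypothetical (dept, name) rows standing in for a Hive table.
rows = [("a", "wang"), ("a", "li"), ("a", "wang"), ("b", "ma")]

def collect_list(values):
    """Model of collect_list: every value, duplicates kept."""
    return list(values)

def collect_set(values):
    """Model of collect_set: duplicates removed (first-seen order kept here only for predictability)."""
    return list(dict.fromkeys(values))

def concat_ws(sep, values):
    """Model of concat_ws(sep, array<string>): join the elements with the separator."""
    return sep.join(values)

# Roughly: SELECT dept, concat_ws(',', collect_set(name)) FROM t GROUP BY dept
groups = defaultdict(list)
for dept, name in rows:
    groups[dept].append(name)

as_list = {d: collect_list(v) for d, v in groups.items()}
as_set = {d: concat_ws(",", collect_set(v)) for d, v in groups.items()}
print(as_list)  # {'a': ['wang', 'li', 'wang'], 'b': ['ma']}
print(as_set)   # {'a': 'wang,li', 'b': 'ma'}
```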

    UDF (User-Defined Function): an ordinary user-defined function that works on a single row: one row in, one value out;

    UDAF (User-Defined Aggregation Function): a user-defined aggregate function that works on many rows and returns one value per group, just like the built-in aggregates SUM() and AVG();

    UDTF (User-Defined Table-Generating Function): handles the one-row-in, many-rows-out (one-to-many mapping) case.
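
    Real Hive extensions are Java classes built on types such as GenericUDF, GenericUDAFEvaluator and GenericUDTF. The sketch below does not reproduce that API. It only models the input/output cardinality of each kind as a plain Python function. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List

# UDF: one input row -> one output value (like upper()).
def udf_upper(value: str) -> str:
    return value.upper()

# UDAF: many input rows -> one value per group (like SUM()).
def udaf_sum(values: Iterable[float]) -> float:
    total = 0.0
    for v in values:
        total += v
    return total

# UDTF: one input row -> zero or more output rows (like explode()).
def udtf_explode(values: List[str]) -> Iterator[str]:
    for v in values:
        yield v

column = ["wang", "ZHANG", "LIU"]
print([udf_upper(v) for v in column])  # 3 rows in, 3 values out
print(udaf_sum([10.0, 5.0, 8.0]))      # 3 rows in, 1 value out
print(list(udtf_explode(column)))      # 1 row (an array) in, 3 rows out
```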

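    lateral view is used together with a UDTF such as explode (often fed by split, a regular UDF that returns an array) to turn one row into several rows, which can then be aggregated further. lateral view first calls the UDTF on every row of the source table. The UDTF turns that row into zero or more rows, and lateral view joins them back to the source row, producing a virtual table that can be given an alias. In the example below, lateral view explode(subdinates) adTable as aa, adTable is the alias of the virtual table and aa is the alias of the column it produces. Without lateral view, a UDTF must be the only expression in the SELECT list: select name, explode(subdinates) from employees is rejected.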
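
    The following is a minimal plain-Python model of the join just described, not Hive's implementation. The lateral_view helper is hypothetical. So is the extra "nobody" row, which is added to show that a source row whose UDTF returns nothing is dropped. Hive's LATERAL VIEW OUTER would keep that row, with NULLs in the new columns.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]

def lateral_view(table: List[Row], udtf: Callable[[Row], Iterable[Tuple]],
                 col_aliases: List[str]) -> List[Row]:
    """Model of: SELECT ... FROM table LATERAL VIEW udtf(...) tbl AS col_aliases.

    Each source row is joined with every row its UDTF call produces; a source
    row whose UDTF produces nothing does not appear in the result.
    """
    out = []
    for row in table:
        for produced in udtf(row):
            out.append({**row, **dict(zip(col_aliases, produced))})
    return out

employees = [
    {"name": "tianyongtao", "subdinates": ["wang", "ZHANG", "LIU"]},
    {"name": "wangyangming", "subdinates": ["ma", "zhong"]},
    {"name": "nobody", "subdinates": []},  # hypothetical row with an empty array
]

# LATERAL VIEW explode(subdinates) adTable AS aa
rows = lateral_view(employees, lambda r: ((s,) for s in r["subdinates"]), ["aa"])
for r in rows:
    print(r["name"], r["aa"])
```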

    explode(ARRAY): generates one row for each element of the array

    explode(MAP): generates one row for each key-value pair of the map, with the key in one column and the value in another
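
    The two forms of explode can be modeled as plain-Python generators. This is a sketch of the row shapes only. The sample values come from the employees table used in this post.

```python
from typing import Dict, Iterator, List, Tuple

def explode_array(arr: List[str]) -> Iterator[Tuple[str]]:
    """explode(ARRAY): one output row (a single column) per element."""
    for item in arr:
        yield (item,)

def explode_map(m: Dict[str, float]) -> Iterator[Tuple[str, float]]:
    """explode(MAP): one output row per entry, key and value as two columns."""
    for key, value in m.items():
        yield (key, value)

print(list(explode_array(["ma", "zhong"])))        # [('ma',), ('zhong',)]
print(list(explode_map({"aaa": 6.0, "bb": 12.0})))  # [('aaa', 6.0), ('bb', 12.0)]
```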

    The employees table used in the examples below:

    CREATE TABLE `employees`(
      `name` string,
      `salary` float,
      `subdinates` array<string>,
      `deducation` map<string,float>,
      `address` struct<street:string,city:string,state:string,zip:int>)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://localhost:9000/user/hive/warehouse/gamedw.db/employees'
    TBLPROPERTIES (
      'creator'='tianyongtao',
      'last_modified_by'='root',
      'last_modified_time'='1521447397',
      'numFiles'='0',
      'numRows'='0',
      'rawDataSize'='0',
      'totalSize'='0',
      'transient_lastDdlTime'='1521447397')

    Handling ARRAY columns

    0: jdbc:hive2://192.168.53.122:10000/default> select name,subdinates  from employees;
    +---------------+-------------------------+--+
    |     name      |       subdinates        |
    +---------------+-------------------------+--+
    | tianyongtao   | ["wang","ZHANG","LIU"]  |
    | wangyangming  | ["ma","zhong"]          |
    +---------------+-------------------------+--+
    2 rows selected (0.301 seconds)

    0: jdbc:hive2://192.168.53.122:10000/default> select name,aa  from employees lateral view explode(subdinates) adTable  as aa;
    +---------------+--------+--+
    |     name      |   aa   |
    +---------------+--------+--+
    | tianyongtao   | wang   |
    | tianyongtao   | ZHANG  |
    | tianyongtao   | LIU    |
    | wangyangming  | ma     |
    | wangyangming  | zhong  |
    +---------------+--------+--+
    5 rows selected (0.312 seconds)

    Handling MAP columns

    0: jdbc:hive2://192.168.53.122:10000/default> select deducation  from employees;
    +---------------------------------+--+
    |           deducation            |
    +---------------------------------+--+
    | {"aaa":10.0,"bb":5.0,"CC":8.0}  |
    | {"aaa":6.0,"bb":12.0}           |
    +---------------------------------+--+
    2 rows selected (0.315 seconds)
    0: jdbc:hive2://192.168.53.122:10000/default> select explode(deducation) as (aa,bb)  from employees;
    +------+-------+--+
    |  aa  |  bb   |
    +------+-------+--+
    | aaa  | 10.0  |
    | bb   | 5.0   |
    | CC   | 8.0   |
    | aaa  | 6.0   |
    | bb   | 12.0  |
    +------+-------+--+
    5 rows selected (0.314 seconds)
    0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb  from employees lateral view explode(deducation) mtable as aa,bb;
    +---------------+------+-------+--+
    |     name      |  aa  |  bb   |
    +---------------+------+-------+--+
    | tianyongtao   | aaa  | 10.0  |
    | tianyongtao   | bb   | 5.0   |
    | tianyongtao   | CC   | 8.0   |
    | wangyangming  | aaa  | 6.0   |
    | wangyangming  | bb   | 12.0  |
    +---------------+------+-------+--+
    5 rows selected (0.347 seconds)

    0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb,cc  from employees lateral view explode(deducation) mtable as aa,bb lateral view explode(subdinates) adTable  as cc;
    +---------------+------+-------+--------+--+
    |     name      |  aa  |  bb   |   cc   |
    +---------------+------+-------+--------+--+
    | tianyongtao   | aaa  | 10.0  | wang   |
    | tianyongtao   | aaa  | 10.0  | ZHANG  |
    | tianyongtao   | aaa  | 10.0  | LIU    |
    | tianyongtao   | bb   | 5.0   | wang   |
    | tianyongtao   | bb   | 5.0   | ZHANG  |
    | tianyongtao   | bb   | 5.0   | LIU    |
    | tianyongtao   | CC   | 8.0   | wang   |
    | tianyongtao   | CC   | 8.0   | ZHANG  |
    | tianyongtao   | CC   | 8.0   | LIU    |
    | wangyangming  | aaa  | 6.0   | ma     |
    | wangyangming  | aaa  | 6.0   | zhong  |
    | wangyangming  | bb   | 12.0  | ma     |
    | wangyangming  | bb   | 12.0  | zhong  |
    +---------------+------+-------+--------+--+
    13 rows selected (0.305 seconds)
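
    Chaining two lateral views multiplies each source row by the number of rows produced by each UDTF. The counts therefore multiply within an employee and add across employees: 3 × 3 + 2 × 2 = 13. Rows are never paired across employees. The plain-Python check below uses the data shown above and reproduces the per-row cross product in the same order.

```python
from itertools import product

# (name, deducation, subdinates), copied from the employees output above.
employees = [
    ("tianyongtao", {"aaa": 10.0, "bb": 5.0, "CC": 8.0}, ["wang", "ZHANG", "LIU"]),
    ("wangyangming", {"aaa": 6.0, "bb": 12.0}, ["ma", "zhong"]),
]

# For each source row, every exploded map entry is paired with every exploded
# array element: a cross product inside the row, never across rows.
rows = [
    (name, key, value, sub)
    for name, deducation, subdinates in employees
    for (key, value), sub in product(deducation.items(), subdinates)
]

print(len(rows))  # 13
```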

    STRUCT columns (fields are accessed with dot notation, column.field):

    0: jdbc:hive2://192.168.53.122:10000/default> select name,address.street,address.city,address.state  from employees;
    +---------------+---------+-----------+----------+--+
    |     name      | street  |   city    |  state   |
    +---------------+---------+-----------+----------+--+
    | tianyongtao   | HENAN   | LUOHE     | LINYING  |
    | wangyangming  | hunan   | changsha  | NULL     |
    +---------------+---------+-----------+----------+--+
    2 rows selected (0.309 seconds)

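    collect_set(): aggregates the values of a column into an ARRAY with duplicates removed. The second query below runs the aggregation from spark-shell, whose show() truncates cells longer than 20 characters. That is why the sex=1 array ends in "...".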

    0: jdbc:hive2://192.168.53.122:10000/default> select * from cust;
    +------------------+-----------+----------------+--+
    |  cust.custname   | cust.sex  | cust.nianling  |
    +------------------+-----------+----------------+--+
    | tianyt_touch100  | 1         | 50             |
    | wangwu           | 1         | 85             |
    | zhangsan         | 1         | 20             |
    | liuqin           | 0         | 56             |
    | wangwu           | 0         | 47             |
    | liuyang          | 1         | 32             |
    | hello            | 0         | 100            |
    | mahuateng        | 1         | 1001           |
    | tianyt_touch100  | 1         | 50             |
    | wangwu           | 1         | 85             |
    | zhangsan         | 1         | 20             |
    | liuqin           | 0         | 56             |
    | wangwu           | 0         | 47             |
    | nihao            | 1         | 5              |
    | liuyang          | 1         | 32             |
    | hello            | 0         | 100            |
    | mahuateng        | 1         | 1001           |
    | nihao            | 1         | 5              |
    +------------------+-----------+----------------+--+


    scala> hcon.sql("select sex,collect_set(nianling) from gamedw.cust group by sex").show
    +---+---------------------+
    |sex|collect_set(nianling)|
    +---+---------------------+
    |  1| [85, 5, 20, 50, 3...|
    |  0|        [100, 56, 47]|
    +---+---------------------+
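
    Because show() cut off the first array, a plain-Python model of the same GROUP BY over the cust rows listed above shows the complete sets. Hive and Spark do not guarantee element order inside a collect_set result, so the sketch sorts only for display.

```python
from collections import defaultdict

# (custname, sex, nianling) rows copied from the cust table above.
cust = [
    ("tianyt_touch100", 1, 50), ("wangwu", 1, 85), ("zhangsan", 1, 20),
    ("liuqin", 0, 56), ("wangwu", 0, 47), ("liuyang", 1, 32),
    ("hello", 0, 100), ("mahuateng", 1, 1001), ("tianyt_touch100", 1, 50),
    ("wangwu", 1, 85), ("zhangsan", 1, 20), ("liuqin", 0, 56),
    ("wangwu", 0, 47), ("nihao", 1, 5), ("liuyang", 1, 32),
    ("hello", 0, 100), ("mahuateng", 1, 1001), ("nihao", 1, 5),
]

# Model of: SELECT sex, collect_set(nianling) FROM cust GROUP BY sex
collected = defaultdict(set)
for _name, sex, age in cust:
    collected[sex].add(age)

for sex in sorted(collected):
    print(sex, sorted(collected[sex]))
# 0 [47, 56, 100]
# 1 [5, 20, 32, 50, 85, 1001]
```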

    Splitting a comma-separated string column into rows with lateral view explode(split(...)):

    0: jdbc:hive2://192.168.53.122:10000/default> select * from cityinfo;
    +----------------+---------------------------------------------------------------+--+
    | cityinfo.city  |                      cityinfo.districts                       |
    +----------------+---------------------------------------------------------------+--+
    | shenzhen       | longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu  |
    | qingdao        | shinan,lichang,jimo,jiaozhou,huangdao,laoshan                 |
    +----------------+---------------------------------------------------------------+--+

    0: jdbc:hive2://192.168.53.122:10000/default> select city,area from cityinfo lateral view explode(split(districts,",")) areatable as area;
    +-----------+------------+--+
    |   city    |    area    |
    +-----------+------------+--+
    | shenzhen  | longhua    |
    | shenzhen  | futian     |
    | shenzhen  | baoan      |
    | shenzhen  | longgang   |
    | shenzhen  | dapeng     |
    | shenzhen  | guangming  |
    | shenzhen  | nanshan    |
    | shenzhen  | luohu      |
    | qingdao   | shinan     |
    | qingdao   | lichang    |
    | qingdao   | jimo       |
    | qingdao   | jiaozhou   |
    | qingdao   | huangdao   |
    | qingdao   | laoshan    |
    +-----------+------------+--+
    14 rows selected (0.479 seconds)
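
    Here split produces an array and explode turns it into rows. The plain-Python model below uses the cityinfo data above. One caveat: Hive's split treats its second argument as a regular expression, while Python's str.split takes a literal separator. The two behave the same only for simple delimiters like ",".

```python
# (city, districts) rows copied from the cityinfo table above.
cityinfo = [
    ("shenzhen", "longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu"),
    ("qingdao", "shinan,lichang,jimo,jiaozhou,huangdao,laoshan"),
]

# Model of: SELECT city, area FROM cityinfo
#           LATERAL VIEW explode(split(districts, ",")) areatable AS area
rows = [(city, area) for city, districts in cityinfo for area in districts.split(",")]

print(len(rows))  # 8 + 6 = 14
```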

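    Given the monthly visit data, compute for each customer the running maximum and running sum of times up to and including each month: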

    scala> hcon.sql("select * from gamedw.visists order by custid,monthid").show
    +------+-------+-----+
    |custid|monthid|times|
    +------+-------+-----+
    |     1| 201801|   25|
    |     1| 201801|   10|
    |     1| 201802|   35|
    |     1| 201802|    7|
    |     1| 201803|   52|
    |     1| 201805|    6|
    |     2| 201801|   32|
    |     2| 201801|    1|
    |     2| 201802|   10|
    |     2| 201802|   18|
    |     2| 201803|   91|
    |     2| 201804|    6|
    |     2| 201804|    4|
    |     2| 201805|   31|
    +------+-------+-----+

    scala> hcon.sql("select custid,b.monthid,sum(times),max(times) from gamedw.visists a inner join (select distinct monthid from gamedw.visists) b on a.monthid<=b.monthid group by custid,b.monthid order by custid,b.monthid").show
    +------+-------+----------+----------+
    |custid|monthid|sum(times)|max(times)|
    +------+-------+----------+----------+
    |     1| 201801|        35|        25|
    |     1| 201802|        77|        35|
    |     1| 201803|       129|        52|
    |     1| 201804|       129|        52|
    |     1| 201805|       135|        52|
    |     2| 201801|        33|        32|
    |     2| 201802|        61|        32|
    |     2| 201803|       152|        91|
    |     2| 201804|       162|        91|
    |     2| 201805|       193|        91|
    +------+-------+----------+----------+
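
    The query joins every visit row to its own month and to every later month, taken from a DISTINCT list of all customers' months, and then aggregates per (custid, month). That is why customer 1 gets a 201804 row with no visits that month. The plain-Python model below uses the data above and reproduces all ten result rows. Older Hive releases accept only equality conditions in ON, so a non-equi join like this one may need Spark SQL, as used here, or a newer Hive.

```python
from collections import defaultdict

# (custid, monthid, times) rows copied from gamedw.visists above.
visits = [
    (1, 201801, 25), (1, 201801, 10), (1, 201802, 35), (1, 201802, 7),
    (1, 201803, 52), (1, 201805, 6),
    (2, 201801, 32), (2, 201801, 1), (2, 201802, 10), (2, 201802, 18),
    (2, 201803, 91), (2, 201804, 6), (2, 201804, 4), (2, 201805, 31),
]

# b: SELECT DISTINCT monthid FROM visists (months of all customers).
months = sorted({m for _, m, _ in visits})

# a JOIN b ON a.monthid <= b.monthid, then GROUP BY custid, b.monthid.
groups = defaultdict(list)
for custid, monthid, times in visits:
    for b_month in months:
        if monthid <= b_month:
            groups[(custid, b_month)].append(times)

# (sum(times), max(times)) per (custid, b.monthid), ordered like the query.
result = {key: (sum(v), max(v)) for key, v in sorted(groups.items())}
for (custid, month), (total, peak) in result.items():
    print(custid, month, total, peak)
```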

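    When joining, put the smaller table(s) on the left and the largest table last. In a reduce-side join, Hive buffers the rows of the earlier tables in memory and streams the last table through the reducers. The /*+ STREAMTABLE(alias) */ hint names the streamed table explicitly.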

  • Original post: https://www.cnblogs.com/playforever/p/9605229.html