• Summary of Commonly Used HiveQL


    1. Creating Tables

    Create a table from plain-text data:

    create table default.calendar_table 
    (
    day_cal date
    ,week_cal string
    ,month_cal string
    ,year_cal string
    )
    row format delimited fields terminated by ','
    stored as textfile;
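    The Python sketch below is a rough illustration of what `row format delimited fields terminated by ','` means for each line of the file. It approximates Hive's behavior and is not Hive's actual SerDe code: it splits a line on the delimiter and maps fields to columns by position. The NULL handling it models reflects Hive's default text SerDe as I understand it. Missing trailing fields and unparseable values become NULL, and extra fields are ignored. This row format does not understand quoted CSV fields. The sample lines are made up.

```python
from datetime import date
from typing import Dict, List, Optional

# Column names/types mirror default.calendar_table; only day_cal is a DATE.
SCHEMA = [("day_cal", "date"), ("week_cal", "string"),
          ("month_cal", "string"), ("year_cal", "string")]

def parse_delimited_line(line: str, sep: str = ",") -> Dict[str, Optional[object]]:
    """Approximate mapping of one delimited text line onto the table columns."""
    fields: List[str] = line.rstrip("\n").split(sep)  # plain split: no CSV quote handling
    row: Dict[str, Optional[object]] = {}
    for i, (name, typ) in enumerate(SCHEMA):
        raw: Optional[object] = fields[i] if i < len(fields) else None  # missing field -> NULL
        if raw is not None and typ == "date":
            try:
                raw = date.fromisoformat(str(raw))  # DATE values are expected as YYYY-MM-DD
            except ValueError:
                raw = None                          # unparseable value -> NULL
        row[name] = raw
    return row  # fields beyond the schema are simply ignored

full = parse_delimited_line("2016-05-20,Friday,May,2016\n")
short = parse_delimited_line("2016-13-45,Friday")
print(full)
print(short)
```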

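    A table that is not declared EXTERNAL is a managed table by default. The two kinds differ in how LOAD and DROP behave. Data is loaded into a managed table with LOAD DATA [LOCAL] INPATH: use LOCAL for a path on the local file system and omit it for an HDFS path. Hive copies a local file (or moves an HDFS file) into the table's directory under the warehouse, /user/hive/warehouse/ by default. The data of an external table stays wherever its LOCATION points, and external tables are usually combined with partitions that describe how the data was produced (for example, by date). Dropping a managed table deletes both the metadata and the data under /user/hive/warehouse/; dropping an external table deletes only the metadata. To load a local file into a managed table: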

    load data local inpath 'cal.csv' overwrite into table default.calendar_table;

    Create an external table backed by ORC files:

    create external table default.ad_base
    (
    uid string
    ,adx string
    ,exposure string
    ,click string
    )
    partitioned by (day_time date)
    stored as orc
    location '/<hdfs>/<path>';

    2. Partition

    Add a partition and specify its location:

    alter table DEFAULT.ad_base
    add if not exists partition (day_time=date '2016-05-20')
    location '2016-05-20/xxx';

    Change the location of an existing partition:

    alter table DEFAULT.ad_base
    partition (day_time=date '2016-05-20')
    set location 'hdfs://<path>/<to>/';  -- must be an absolute path

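    Drop a partition (the IGNORE PROTECTION clause only exists in Hive versions before 2.0.0, where it was removed):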

    alter table DEFAULT.ad_base
    drop if exists partition (day_time=date '2016-05-20')
    ignore protection;

    List all partitions, and show detailed information about a single partition:

    show partitions ad_base;
    
    describe formatted ad_base partition(day_time = '2016-05-20');

    3. UDF

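    Hive ships with a rich set of built-in UDFs that cover most everyday needs.

    Extract part of a string with a regular expression (the last argument is the index of the capture group to return):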

    regexp_extract(b.dvc, '(.*)_(.*)', 2) as imei
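    The first `(.*)` in the pattern above is greedy, so the split happens at the last underscore. The Python sketch below uses Python's `re` module, which behaves like Java's regex engine (used by Hive) for this pattern. It assumes that regexp_extract searches anywhere in the string and returns an empty string when nothing matches. The sample device strings are hypothetical.

```python
import re

def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Approximation of Hive's regexp_extract: search for the pattern anywhere in
    the subject and return capture group `index` (0 = whole match).
    Returning "" on no match is an assumption about Hive's behavior."""
    m = re.search(pattern, subject)
    if m is None:
        return ""
    group = m.group(index)
    return group if group is not None else ""

print(regexp_extract("android_358240051111110", r"(.*)_(.*)", 2))  # 358240051111110
print(regexp_extract("os_ver_358240051111110", r"(.*)_(.*)", 1))   # os_ver (greedy group 1)
print(repr(regexp_extract("no-underscore", r"(.*)_(.*)", 2)))      # '' (no match)
```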

    The complex data types map, struct, struct with named fields, array, and union are constructed as follows:

    map(key1, value1, key2, value2, ...)
    struct(val1, val2, val3, ...)
    named_struct(name1, val1, name2, val2, ...)
    array(val1, val2, ...)
    create_union(tag, val1, val2, ...)

    Access an element or field of a complex type:

    array: A[n]   (indexes start at 0)
    map: M[key]
    struct: S.x
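    The Python sketch below shows how the constructor arguments are laid out and how the access forms map onto Python. It uses dicts and lists as stand-ins for Hive's map, struct, and array, and it skips create_union. It assumes struct() names its fields positionally col1, col2, ..., as the Hive UDF documentation describes. A missing map key yields NULL in Hive, which `dict.get` mimics with None. All helper names are hypothetical.

```python
from typing import Any, Dict, List

def hive_map(*kv: Any) -> Dict[Any, Any]:
    """map(k1, v1, k2, v2, ...): arguments alternate key and value."""
    if len(kv) % 2:
        raise ValueError("map() expects key/value pairs")
    return dict(zip(kv[0::2], kv[1::2]))

def hive_struct(*vals: Any) -> Dict[str, Any]:
    """struct(v1, v2, ...): fields are named positionally col1, col2, ..."""
    return {f"col{i}": v for i, v in enumerate(vals, start=1)}

def hive_named_struct(*nv: Any) -> Dict[str, Any]:
    """named_struct(n1, v1, n2, v2, ...): arguments alternate field name and value."""
    if len(nv) % 2:
        raise ValueError("named_struct() expects name/value pairs")
    return dict(zip(nv[0::2], nv[1::2]))

def hive_array(*vals: Any) -> List[Any]:
    """array(v1, v2, ...)"""
    return list(vals)

tags = hive_array(hive_named_struct("tag", "sports", "label", "1"),
                  hive_named_struct("tag", "news", "label", "0"))
m = hive_map("k1", 10, "k2", 20)
s = hive_struct("x", "y")

print(tags[1]["tag"])  # A[1].tag -> news (indexes start at 0)
print(m.get("k2"))     # M['k2']  -> 20
print(m.get("k3"))     # missing key -> None, like NULL in Hive
print(s["col2"])       # S.col2   -> y
```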

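    Conditional logic with CASE WHEN, for example to supply a default value for rows that find no match in a LEFT JOIN:

    select a.uid, a.media,
        case when b.tags is NULL then array(named_struct('tag','EMPTY', 'label','EMPTY')) else b.tags end as tags
    from ad_base a
    left outer join ad_tag b on (a.uid = regexp_extract(b.dvc, '(.*)_(.*)', 2))
    -- filter the left table in WHERE; in the ON clause of a LEFT JOIN it would not remove any rows of a
    where a.exposure = '1';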
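    The Python sketch below models the join and the CASE WHEN default from the query above. It leaves out the exposure condition and the media column. It assumes the right-hand column is `tags`, the array<struct<tag,label>> column that ad_tag uses in the UDTF section. An unmatched left row survives the LEFT OUTER JOIN with NULLs on the right side, and the CASE WHEN then substitutes the default array. The table contents are hypothetical.

```python
import re
from typing import Dict, List

# Value used by the CASE WHEN branch when b.tags IS NULL.
DEFAULT_TAGS = [{"tag": "EMPTY", "label": "EMPTY"}]

def join_key(dvc: str) -> str:
    """regexp_extract(b.dvc, '(.*)_(.*)', 2)"""
    m = re.search(r"(.*)_(.*)", dvc)
    return m.group(2) if m else ""

def left_join_with_default(ad_base: List[Dict], ad_tag: List[Dict]) -> List[Dict]:
    # Index the right side by its join key.
    index: Dict[str, List[Dict]] = {}
    for b in ad_tag:
        index.setdefault(join_key(b["dvc"]), []).append(b)
    out: List[Dict] = []
    for a in ad_base:
        matches = index.get(a["uid"], [])
        if not matches:
            # No match: LEFT OUTER JOIN keeps the row, b.tags is NULL -> default.
            out.append({"uid": a["uid"], "tags": DEFAULT_TAGS})
        for b in matches:  # one output row per matching right-side row
            tags = b["tags"] if b["tags"] is not None else DEFAULT_TAGS
            out.append({"uid": a["uid"], "tags": tags})
    return out

ad_base = [{"uid": "111"}, {"uid": "222"}]
ad_tag = [{"dvc": "android_111", "tags": [{"tag": "sports", "label": "1"}]}]
result = left_join_with_default(ad_base, ad_tag)
print(result)
```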

    4. UDTF

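    UDTFs are mainly used to flatten complex data types: explode flattens an array or a map, and inline flattens an array<struct>. To select these built-in UDTFs alongside other columns, combine them with LATERAL VIEW: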

    select myCol1, col2 FROM baseTable
    lateral view explode(col1) myTable1 AS myCol1;
    
    select uid, tag, label
    from ad_tag
    lateral view inline(tags) tag_tb;
    -- tags: array<struct<tag:string,label:string>>
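    The Python sketch below models the row-level behavior of the two queries above. Each array element or struct becomes its own output row, joined back to the columns of the source row. A row whose array is NULL or empty produces no output unless LATERAL VIEW OUTER is used (available since Hive 0.12). The helper names and sample data are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

def lateral_view_explode(rows: Iterable[Dict[str, Any]], col: str, alias: str,
                         outer: bool = False) -> Iterator[Dict[str, Any]]:
    """LATERAL VIEW [OUTER] explode(col) t AS alias: one row per array element."""
    for row in rows:
        items = row.get(col) or []       # NULL and empty arrays yield no rows...
        if not items and outer:
            yield {**row, alias: None}   # ...unless OUTER keeps the row with a NULL
        for item in items:
            yield {**row, alias: item}

def lateral_view_inline(rows: Iterable[Dict[str, Any]], col: str) -> Iterator[Dict[str, Any]]:
    """LATERAL VIEW inline(col): each struct becomes a row, its fields become columns."""
    for row in rows:
        for st in row.get(col) or []:
            yield {**row, **st}

ad_tag = [
    {"uid": "u1", "tags": [{"tag": "sports", "label": "1"}, {"tag": "news", "label": "0"}]},
    {"uid": "u2", "tags": []},
]
flat = [(r["uid"], r["tag"], r["label"]) for r in lateral_view_inline(ad_tag, "tags")]
print(flat)  # [('u1', 'sports', '1'), ('u1', 'news', '0')]; u2 disappears

kept = list(lateral_view_explode(ad_tag, "tags", "t", outer=True))
print(len(kept))  # 3: two rows for u1, one NULL row for u2
```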

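    5. Multidimensional Analysis

    Hive provides the GROUPING SETS, ROLLUP, and CUBE keywords for multidimensional analysis. They cover, respectively, custom dimension combinations, roll-up combinations along the dimension hierarchy (n+1 combinations for n dimensions), and all dimension combinations (2^n combinations). For example: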

    SELECT a, b, SUM( c ) 
    FROM tab1 
    GROUP BY a, b GROUPING SETS ( (a, b), a, b, ( ) )
    
    -- equivalent query written as a UNION ALL of plain GROUP BYs
    SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b
    UNION ALL
    SELECT a, null, SUM( c ) FROM tab1 GROUP BY a, null
    UNION ALL
    SELECT null, b, SUM( c ) FROM tab1 GROUP BY null, b
    UNION ALL
    SELECT null, null, SUM( c ) FROM tab1
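    The Python sketch below spells out that equivalence. GROUPING SETS is modeled as one GROUP BY per set, with the results concatenated (a UNION ALL), and every column outside the current set is output as None (NULL). The tab1 rows are made up. Because a rolled-up column and a genuine NULL both appear as None here, Hive offers GROUPING__ID to tell them apart.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

def grouping_sets_sum(rows: Iterable[Dict], group_cols: Sequence[str],
                      sets: Sequence[Tuple[str, ...]], measure: str) -> List[Tuple]:
    """SUM(measure) ... GROUP BY group_cols GROUPING SETS (sets), modeled as a
    UNION ALL of one GROUP BY per grouping set."""
    rows = list(rows)
    out: List[Tuple] = []
    for gs in sets:
        acc: Dict[Tuple, int] = defaultdict(int)
        for r in rows:
            # Columns not in this grouping set are replaced by NULL (None).
            key = tuple(r[c] if c in gs else None for c in group_cols)
            acc[key] += r[measure]
        out.extend(key + (total,) for key, total in acc.items())
    return out

tab1 = [{"a": 1, "b": "x", "c": 10}, {"a": 1, "b": "y", "c": 5}, {"a": 2, "b": "x", "c": 1}]
result = grouping_sets_sum(tab1, ["a", "b"], [("a", "b"), ("a",), ("b",), ()], "c")
for row in result:
    print(row)
```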
    
    
    GROUP BY a, b, c WITH ROLLUP
    -- is equivalent to 
    GROUP BY a, b, c GROUPING SETS ( (a, b, c), (a, b), (a), ( ))
    
    
    GROUP BY a, b, c WITH CUBE 
    -- is equivalent to 
    GROUP BY a, b, c GROUPING SETS ( (a, b, c), (a, b), (b, c), (a, c), (a), (b), (c), ( ))
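    The two expansions above follow a simple rule. ROLLUP takes every prefix of the GROUP BY list (n+1 sets), and CUBE takes every subset (2^n sets). The Python sketch below generates the grouping sets that way. The function names are illustrative only, and the order of the CUBE sets does not matter.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def rollup_sets(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    """WITH ROLLUP: every prefix of the GROUP BY list, longest first (n+1 sets)."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    """WITH CUBE: every subset of the GROUP BY list, largest first (2^n sets)."""
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]

print(rollup_sets(["a", "b", "c"]))     # [('a', 'b', 'c'), ('a', 'b'), ('a',), ()]
print(len(cube_sets(["a", "b", "c"])))  # 8
```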

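    Hive also provides the GROUPING__ID function (note the double underscore), which assigns a number to each dimension combination so that you can tell which combination a result row belongs to. For example: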

    select tag, media, grouping__id, count(*) as pv
    from ad_base
    group by tag, media with rollup;
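    GROUPING__ID is a bitmask with one bit per GROUP BY column, but the bit convention depends on the Hive version. To my understanding, Hive 2.3.0 changed it to match other SQL engines: the first column became the most significant bit, and a bit is 1 when that column is rolled up. Earlier versions used the first column as the least significant bit and set a bit when that column is present. The Python sketch below computes both conventions for the ROLLUP query above. Treat both as assumptions and check them against the output of your own Hive version.

```python
from typing import Sequence, Tuple

def grouping_id_new(group_cols: Sequence[str], grouping_set: Tuple[str, ...]) -> int:
    """Assumed Hive 2.3.0+ convention: first column = most significant bit,
    bit = 1 when the column is rolled up (absent from the grouping set)."""
    gid = 0
    for col in group_cols:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid

def grouping_id_old(group_cols: Sequence[str], grouping_set: Tuple[str, ...]) -> int:
    """Assumed pre-2.3.0 convention: first column = least significant bit,
    bit = 1 when the column is present in the grouping set."""
    return sum(1 << i for i, col in enumerate(group_cols) if col in grouping_set)

cols = ["tag", "media"]
for gs in [("tag", "media"), ("tag",), ()]:  # the sets produced by WITH ROLLUP
    print(gs, grouping_id_new(cols, gs), grouping_id_old(cols, gs))
# ('tag', 'media') 0 3
# ('tag',) 1 1
# () 3 0
```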

    Save query results to a local directory with a specified field delimiter:

    -- no EXPLAIN here: EXPLAIN would only print the query plan instead of writing the files
    INSERT OVERWRITE LOCAL DIRECTORY '/home/<path>/<to>'
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    select media, count(distinct uid) as uv
    from ad_base 
    where day_time = '2016-05-20' and exposure = '1'
    group by media;
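    The Python sketch below reads the exported files back into a dict of uv per media. It assumes every non-hidden file in the directory is a plain-text data file with one `media<TAB>uv` line per row. It skips hidden files such as `.crc` checksum files, which the local file system may write alongside the data. The demo file name 000000_0 and its contents are made up. Hive writes NULL as the literal `\N` in this text output, and the sketch keeps that value as an ordinary string.

```python
import os
import tempfile
from typing import Dict

def read_hive_output(dir_path: str, sep: str = "\t") -> Dict[str, int]:
    """Collect media -> uv from every data file in a Hive output directory."""
    result: Dict[str, int] = {}
    for name in sorted(os.listdir(dir_path)):
        path = os.path.join(dir_path, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue  # skip hidden files (e.g. .crc checksums) and subdirectories
        with open(path, encoding="utf-8") as f:
            for line in f:
                media, uv = line.rstrip("\n").split(sep)
                result[media] = int(uv)
    return result

# Demo with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "000000_0"), "w", encoding="utf-8") as f:
        f.write("app_a\t120\napp_b\t45\n")
    open(os.path.join(d, ".000000_0.crc"), "w").close()
    uv_by_media = read_hive_output(d)
print(uv_by_media)  # {'app_a': 120, 'app_b': 45}
```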
    If you repost this article, please credit the author and the source.
    Author: Treant
  • Original article: https://www.cnblogs.com/ilinuxer/p/6804882.html