• Hive lateral view and explode explained


     

    1. explode

    The Hive wiki explains explode as follows:

    explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.

    As an example of using explode() in the SELECT expression list, consider a table named myTable that has a single column (myCol) and two rows:

    [Image not preserved: the contents of myTable]

    Then running the query:

    SELECT explode(myCol) AS myNewCol FROM myTable;

    will produce:

    [Image not preserved: the resulting rows of myNewCol]

    The usage with Maps is similar:

    SELECT explode(myMap) AS (myMapKey, myMapValue) FROM myMapTable;

    In one sentence: explode splits a complex array or map value held in a single Hive row into multiple rows.
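
    To make the row-multiplying behavior concrete, here is a minimal Python sketch that models what explode does to an array and to a map. It is not Hive code: the function name `explode` is borrowed for readability, and the table contents (`my_table`, `my_map_table`) are invented for illustration.

```python
from collections.abc import Mapping


def explode(value):
    """Model of Hive's explode(): one output row per array element or map entry."""
    if isinstance(value, Mapping):
        # A map yields one two-column (key, value) row per entry.
        for key, val in value.items():
            yield (key, val)
    else:
        # An array yields one single-column row per element.
        for element in value:
            yield (element,)


# Hypothetical table contents, invented for illustration.
my_table = [[100, 200, 300], [400, 500, 600]]
rows = [row for my_col in my_table for row in explode(my_col)]
print(rows)

my_map_table = [{"a": 1, "b": 2}]
map_rows = [row for my_map in my_map_table for row in explode(my_map)]
print(map_rows)
```

    Two input rows become six output rows for the array column, and the single map row becomes one row per key/value pair.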

    Usage example:
    The table xxx has a column mvt of type string, with data in the following format:

    [{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"},{"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"},{"eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"},{"eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"},{"eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"},{"eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"},{"eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"},{"eid":"47","ex":"ihotelDetail_Android","val":"0","vid":"38","vr":"var1"}]

    Let's try explode on it:

    select explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) from ods_mvt_hourly where day=20160710 limit 10;

    The result is as follows:
    {"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"
    "eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"
    "eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"
    "eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"
    "eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"
    "eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"
    "eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"
    "eid":"47","ex":"ihotelDetail_Android","val":"0","vid":"38","vr":"var1"}
    {"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"
    "eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"
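
    The missing braces in the output above follow directly from the two string operations in the query. The Python sketch below reproduces them with the `re` module on a shortened, made-up sample string. The helper name `split_json_array` is hypothetical, and Python's regex engine stands in for the Java one that Hive uses.

```python
import re


def split_json_array(mvt: str) -> list:
    """Mimic regexp_replace(mvt, '\\[|\\]', '') followed by split(..., '\\},\\{')."""
    stripped = re.sub(r"\[|\]", "", mvt)   # drop every '[' and ']'
    return re.split(r"\},\{", stripped)    # the separator '},{' is consumed by the split


# Shortened, made-up sample in the same shape as the mvt column.
mvt = '[{"eid":"38","val":"1"},{"eid":"42","val":"1"},{"eid":"40","val":"1"}]'
pieces = split_json_array(mvt)
for piece in pieces:
    print(piece)
```

    Because the separator itself contains the braces, only the first fragment keeps its opening `{` and only the last keeps its closing `}`.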

    2. lateral view

    The Hive wiki explains it as follows:

    Lateral View Syntax

    lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
    fromClause: FROM baseTable (lateralView)*

    Description

    Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

    Example

    Consider the following base table named pageAds. It has two columns: pageid (name of the page) and adid_list (an array of ads appearing on the page):

    [Image not preserved: the pageAds column definitions]

    An example table with two rows:

    [Image not preserved: the two example rows of pageAds]

    and the user would like to count the total number of times an ad appears across all pages. 
    A lateral view with explode() can be used to convert adid_list into separate rows using the query:

    SELECT pageid, adid
    FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

    The resulting output will be:

    [Image not preserved: the exploded (pageid, adid) rows]

    Then in order to count the number of times a particular ad appears, count/group by can be used:

    SELECT adid, count(1)
    FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
    GROUP BY adid;

    The resulting output will be:

    [Image not preserved: the count for each adid]

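    As this shows, lateral view and UDTFs such as explode are natural partners: explode splits one row holding a complex structure into multiple rows, and lateral view joins those rows back to the source row so that they can be selected and aggregated together with its other columns.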
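
    The join-then-aggregate flow can be modeled in a few lines of Python. This is only a sketch of the semantics, not of how Hive executes the query. The helper `lateral_view_explode` and the two sample rows of `page_ads` are assumptions made up for illustration.

```python
from collections import Counter


def lateral_view_explode(rows, array_col, alias):
    """Join each input row with the rows produced by exploding one of its array columns."""
    for row in rows:
        for element in row[array_col]:
            out = dict(row)       # keep every column of the source row
            out[alias] = element  # add the exploded element under the column alias
            yield out


# Hypothetical sample data, invented for illustration.
page_ads = [
    {"pageid": "front_page", "adid_list": [1, 2, 3]},
    {"pageid": "contact_page", "adid_list": [3, 4, 5]},
]

joined = list(lateral_view_explode(page_ads, "adid_list", "adid"))

# SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
pairs = [(r["pageid"], r["adid"]) for r in joined]
print(pairs)

# SELECT adid, count(1) ... GROUP BY adid
counts = Counter(r["adid"] for r in joined)
print(sorted(counts.items()))
```

    The second query is just the first one with a GROUP BY applied to the joined rows, which is why the ad that appears on both sample pages is counted twice.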

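    3. Example

    Continuing the example from section 1: the rows produced by explode above are not well-formed JSON. By combining lateral view with explode, we can turn them into well-formed JSON: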

    SELECT ecrd,
           CASE
             WHEN instr(mvtstr,'{')=0 AND instr(mvtstr,'}')=0 THEN concat('{',mvtstr,'}')
             WHEN instr(mvtstr,'{')=0 AND instr(mvtstr,'}')>0 THEN concat('{',mvtstr)
             WHEN instr(mvtstr,'}')=0 AND instr(mvtstr,'{')>0 THEN concat(mvtstr,'}')
             ELSE mvtstr END AS mvt
    FROM ods.ods_mvt_hourly
    LATERAL VIEW explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) addTable AS mvtstr
    WHERE DAY='20160710' AND ecrd IS NOT NULL LIMIT 10;

    The query returns:
    xxx
    {"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"}
    xxx
    {"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"}
    xxx
    {"eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"}
    xxx
    {"eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"}
    xxx
    {"eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"}
    xxx
    {"eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"}
    xxx
    {"eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"}
    xxx
    {"eid":"47","ex":"ihotelDetail_Android","val":"1","vid":"38","vr":"var1"}
    xxx
    {"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"}
    xxx
    {"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"}
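
    The brace-repair logic of the CASE expression is easy to check outside Hive. The sketch below is a Python rendering of the same four branches, applied to a shortened, made-up sample and verified with `json.loads`. The names `split_fragments` and `restore_braces` are hypothetical helpers, not anything Hive provides.

```python
import json
import re


def split_fragments(mvt: str) -> list:
    """Same steps as the explode query: strip '[' and ']', then split on '},{'."""
    return re.split(r"\},\{", re.sub(r"\[|\]", "", mvt))


def restore_braces(fragment: str) -> str:
    """Python rendering of the CASE expression: add whichever brace is missing."""
    has_open = "{" in fragment    # corresponds to instr(mvtstr, '{') > 0
    has_close = "}" in fragment   # corresponds to instr(mvtstr, '}') > 0
    if not has_open and not has_close:
        return "{" + fragment + "}"
    if not has_open:
        return "{" + fragment
    if not has_close:
        return fragment + "}"
    return fragment


# Shortened, made-up sample in the same shape as the mvt column.
mvt = '[{"eid":"38","val":"1"},{"eid":"42","val":"1"},{"eid":"40","val":"0"}]'
repaired = [restore_braces(f) for f in split_fragments(mvt)]
parsed = [json.loads(r) for r in repaired]
print(parsed)
```

    Note that the check only looks for the presence of a brace anywhere in the fragment, so it relies on the objects being flat and on no value containing `{` or `}`. That holds for the sample data shown here.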

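    4. Ending

    Lateral View usually appears together with a UDTF; it works around the restriction that a UDTF cannot be combined with other columns in the SELECT list.
    Multiple Lateral Views can produce something similar to a Cartesian product.
    The OUTER keyword makes a UDTF that returns no rows emit NULL instead, so that the source row is not lost.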

    References:
    1. http://blog.csdn.net/oopsoom/article/details/26001307 (usage examples of lateral view)
    2. https://my.oschina.net/leejun2005/blog/120463 (usage of compound functions, fairly detailed)
    3. http://blog.csdn.net/zhaoli081223/article/details/46637517 (an introduction to UDTFs)

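    Lateral View usage and the Hive UDTF explode

    Lateral View is the construct Hive provides for use in conjunction with UDTFs. It solves the problem that a UDTF cannot be accompanied by additional SELECT columns.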

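    1. Why do we need Lateral View?

    Suppose we want to split a column of a Hive table and turn the result into a 1-to-N shape, that is, one row into multiple rows.
    Hive does not allow any other expressions in the SELECT list alongside a UDTF.
    In the example below, the IDs of the users who logged in to a game are stored in a single field, user_ids, and we want to apply a UDTF to each row so that it outputs multiple rows.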
    select game_id, explode(split(user_ids,'\\[\\[\\[')) as user_id from login_game_log where dt='2014-05-15'
    FAILED: Error in semantic analysis: UDTF's are not supported outside the SELECT clause, nor nested in expressions

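    Hive reports a semantic analysis error: a UDTF cannot be used together with other expressions in the SELECT list, which is frustrating.

    So how can we make this work? This is where Lateral View comes in.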

    2. Lateral View explained

    2.1 A single Lateral View

    Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

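    In other words:

    Lateral view is meant to be used together with UDTFs such as explode. It puts the rows generated by the UDTF into a virtual table, and this virtual table is then joined with the input row (here, each game_id), so that columns outside the UDTF can be selected alongside its output.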

    Lateral View Syntax

    lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
    fromClause: FROM baseTable (lateralView)*
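    As the grammar shows, a LATERAL VIEW clause is placed:

    1. directly before the UDTF call

    2. after the base table in the FROM clause

    An example:

    1. First create a file with two delimited columns, game_id and user_ids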

    hive> create table test_lateral_view_shengli(game_id string,userl_ids string) row format delimited fields terminated by ' ' stored as textfile;
    OK
    Time taken: 2.451 seconds
    hive> load data local inpath '/home/hadoop/test_lateral' into table test_lateral_view_shengli;
    Copying data from file:/home/hadoop/test_lateral
    Copying file: file:/home/hadoop/test_lateral
    Loading data to table dw.test_lateral_view_shengli
    OK
    Time taken: 6.716 seconds
    hive> select * from test_lateral_view_shengli;
    OK
    game101 15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110
    game66 winning1ren[[[13810537508
    game101 hu330602003[[[hu330602004[[[hu330602005[[[15967506560

    Now use lateral view:
    hive> select game_id, user_id
        > from test_lateral_view_shengli lateral view explode(split(userl_ids,'\\[\\[\\[')) snTable as user_id
        > ;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201403301416_445839, Tracking URL = http://10.1.9.10:50030/jobdetails.jsp?jobid=job_201403301416_445839
    Kill Command = /app/home/hadoop/src/hadoop-0.20.2-cdh3u5/bin/../bin/hadoop job -Dmapred.job.tracker=10.1.9.10:9001 -kill job_201403301416_445839
    2014-05-16 17:39:19,108 Stage-1 map = 0%, reduce = 0%
    2014-05-16 17:39:28,157 Stage-1 map = 100%, reduce = 0%
    2014-05-16 17:39:38,830 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201403301416_445839
    OK
    game101 hu330602003
    game101 hu330602004
    game101 hu330602005
    game101 15967506560
    game101 15358083654
    game101 ab33787873
    game101 zjy18052480603
    game101 shlg1881826
    game101 lxqab110
    game66 winning1ren
    game66 13810537508
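
    The same split-and-join can be traced by hand with a short Python model. It uses the three rows loaded above. The function name `explode_user_ids` is a hypothetical helper, and Python's `re.split` stands in for Hive's `split`. Since `[` is a regex metacharacter, the separator `[[[` has to be escaped in the pattern.

```python
import re

# The three rows loaded into test_lateral_view_shengli above.
rows = [
    ("game101", "15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110"),
    ("game66", "winning1ren[[[13810537508"),
    ("game101", "hu330602003[[[hu330602004[[[hu330602005[[[15967506560"),
]


def explode_user_ids(rows):
    """Model of: lateral view explode(split(userl_ids, <escaped '[[['>)) snTable as user_id."""
    for game_id, user_ids in rows:
        # Each piece of the split becomes its own row, joined with the row's game_id.
        for user_id in re.split(r"\[\[\[", user_ids):
            yield (game_id, user_id)


result = list(explode_user_ids(rows))
for game_id, user_id in result:
    print(game_id, user_id)
```

    The model yields the same eleven (game_id, user_id) pairs as the Hive job. The row order may differ, because Hive makes no ordering promise without an ORDER BY.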

    2.2 Multiple Lateral Views

    The Hive wiki puts it this way:
    A FROM clause can have multiple LATERAL VIEW clauses. Subsequent LATERAL VIEWS can reference columns from any of the tables appearing to the left of the LATERAL VIEW.
    Given the following data:

    Array<int> col1    Array<string> col2
    [1, 2]             ["a", "b", "c"]
    [3, 4]             ["d", "e", "f"]

    Goal:
    Explode both the first and the second column at the same time, similar to taking a Cartesian product.

    int myCol1    string myCol2
    1             "a"
    1             "b"
    1             "c"
    2             "a"
    2             "b"
    2             "c"
    3             "d"
    3             "e"
    3             "f"
    4             "d"
    4             "e"
    4             "f"

    We can write it like this:
    SELECT myCol1, myCol2 FROM baseTable
    LATERAL VIEW explode(col1) myTable1 AS myCol1
    LATERAL VIEW explode(col2) myTable2 AS myCol2;
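
    Chaining two lateral views behaves like two nested loops over each input row. The sketch below models that in Python with the two sample rows given above. The function name `multi_lateral_view` is a hypothetical helper.

```python
def multi_lateral_view(rows):
    """Model of two chained LATERAL VIEW explode() clauses over (col1, col2) rows."""
    for col1, col2 in rows:
        for my_col1 in col1:        # LATERAL VIEW explode(col1) myTable1 AS myCol1
            for my_col2 in col2:    # LATERAL VIEW explode(col2) myTable2 AS myCol2
                yield (my_col1, my_col2)


# The two rows of baseTable from the example above.
base_table = [
    ([1, 2], ["a", "b", "c"]),
    ([3, 4], ["d", "e", "f"]),
]

result = list(multi_lateral_view(base_table))
for my_col1, my_col2 in result:
    print(my_col1, my_col2)
```

    The product is taken within each input row, not across the whole table, which is why 1 and 2 are never paired with "d", "e", or "f".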

    3. Outer Lateral View

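    There is one more case: what if the array passed to the UDTF is empty?
    Hive 0.12 adds support for the OUTER keyword. By default, if the UDTF produces no rows, the source row is dropped from the output.
    With the OUTER keyword, the behavior is like a left outer join: the selected columns of the source row are still output, and the UDTF's output columns are NULL.
    hive> select * FROM test_lateral_view_shengli LATERAL VIEW explode(array()) C AS a ;
    This outputs nothing.

    With the OUTER keyword added: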
    SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;

    238 val_238 NULL
    86 val_86 NULL
    311 val_311 NULL
    27 val_27 NULL
    165 val_165 NULL
    409 val_409 NULL
    255 val_255 NULL
    278 val_278 NULL
    98 val_98 NULL
    ...
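
    The difference between the two forms can be modeled in Python as an inner versus a left outer join against the UDTF output. This is a sketch of the semantics only. The helper `lateral_view` and its `outer` flag are made up for illustration, and the sample rows reuse the first three rows of the output above.

```python
def lateral_view(rows, make_array, outer=False):
    """Model of LATERAL VIEW [OUTER] explode(...): join each row with its exploded elements."""
    for row in rows:
        elements = make_array(row)
        if elements:
            for element in elements:
                yield row + (element,)
        elif outer:
            # OUTER keeps the source row and fills the UDTF column with NULL (None here).
            yield row + (None,)
        # Without OUTER, a row whose UDTF output is empty disappears from the result.


# First three rows of src as shown in the output above.
src = [(238, "val_238"), (86, "val_86"), (311, "val_311")]

inner_rows = list(lateral_view(src, lambda row: []))
outer_rows = list(lateral_view(src, lambda row: [], outer=True))
print(inner_rows)
print(outer_rows)
```

    With an empty array, the plain form returns no rows at all, while the OUTER form returns every source row with None in the extra column.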

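    4. Summary:

    Lateral View usually appears together with a UDTF; it works around the restriction that a UDTF cannot be combined with other columns in the SELECT list.
    Multiple Lateral Views can produce something similar to a Cartesian product.
    The OUTER keyword makes a UDTF that returns no rows emit NULL instead, so that the source row is not lost.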
     
    Original article; when reposting, please credit: http://blog.csdn.net/oopsoom/article/details/26001307
  • Original article: https://www.cnblogs.com/pejsidney/p/9564532.html