• Hive学习小记-(12)横表与纵表的互相转换***


    需求说明:这是一个横表转纵表与纵表转横表的故事,有点类似行列转换 行转列:一个字段的多行数据合进一个列,通常可用collect_set+concat_ws;列转行:一个字段的一列数据拆到多个行,通常用explode

    横表转纵表:

    1.原横表数据:
    
    cust_id1,jijin_bal,baoxian_bal,cunkuan_bal
    
    转成纵表目标数据:
    
    cust_id1,基金,bal
    
    cust_id1,保险,bal
    
    cust_id1,存款,bal

    方法:concat_ws+lateral view explode +split --算是列转行??其实是相当于把横表变成纵表 参考:https://www.cnblogs.com/foolangirl/p/14145147.html

    纵表转横表

    2.原纵表数据:
    
    cust_id1,基金,bal
    
    cust_id1,保险,bal
    
    cust_id1,存款,bal
    
    转成目标横表数据:
    
    cust_id1,jijin_bal,baoxian_bal,cunkuan_bal

    方法一:case when

    方法二:先转map:cust1,基金:bal,保险:bal,存款:bal ;再inline

    转map参考:https://www.jianshu.com/p/02c2b8906893 

    explode inline参考:https://blog.csdn.net/huobumingbai1234/article/details/80559944) !!!注意explode和inline的map类型

    import pyspark
    from pyspark.sql import SparkSession
    ​
    sc=SparkSession.builder.master("local")
        .appName('hive_col_row')
        .config('spark.executor.memory','2g')
        .config('spark.driver.memory','2g')
        .enableHiveSupport()
        .getOrCreate()
    sc.sql(''' create table test_youhua.zongbiao(id int,prod_nm string,bal float) ''')
    sc.sql(''' insert overwrite table test_youhua.zongbiao values(1,'jijin',1.1),(1,'baoxian',1.2),(1,'cunkuan',1.3),(2,'jijin',2.67),(2,'baoxian',2.34),(2,'cunkuan',2.1) ''')
    sc.sql(''' select * from test_youhua.zongbiao ''').show()
    +---+-------+----+
    | id|prod_nm| bal|
    +---+-------+----+
    |  1|  jijin| 1.1|
    |  1|baoxian| 1.2|
    |  1|cunkuan| 1.3|
    |  2|  jijin|2.67|
    |  2|baoxian|2.34|
    |  2|cunkuan| 2.1|
    +---+-------+----+

    方法一:case when纵表转横表

    sc.sql(''' select id
               ,max(case when prod_nm='jijin' then bal else 0 end) as jijin_bal
               ,max(case when prod_nm='baoxian' then bal else 0 end) as baoxian_bal
               ,max(case when prod_nm='cunkuan' then bal else 0 end) as cunkuan_bal 
               from test_youhua.zongbiao group by id ''').show()
    +---+---------+-----------+-----------+
    | id|jijin_bal|baoxian_bal|cunkuan_bal|
    +---+---------+-----------+-----------+
    |  1|      1.1|        1.2|        1.3|
    |  2|     2.67|       2.34|        2.1|
    +---+---------+-----------+-----------+

    方法二:先转map:cust1,基金:bal,保险:bal,存款:bal ;再inline

    转map参考:https://blog.csdn.net/huobumingbai1234/article/details/80559944
    inline参考:https://blog.csdn.net/weixin_42003671/article/details/88132666
    inline不支持map:https://www.jianshu.com/p/02c2b8906893   
    lateral view inline与 lateral view explode功能类似
    !!!注意explode和inline的map类型
    sc.sql(''' select id
                ,str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string)))))
            from test_youhua.zongbiao 
            group by id ''').show()
    运行结果,已经转换成了map格式:
    1   {"jijin":"1.1","baoxian":"1.2","cunkuan":"1.3"}
    2   {"jijin":"2.67","baoxian":"2.34","cunkuan":"2.1"}
    # 这个不行啊,首先inline作用于struct这里map操作不了,explode和inline都是列转行函数,都是将map字段打散开的,相当于把map又做成纵表了
    sc.sql(''' select map_tmp_tbl.id,c1,c2 from  (
        select id
              ,str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string))))) as map_col
        from test_youhua.zongbiao 
            group by id
    ) as map_tmp_tbl  lateral view explode(map_tmp_tbl.map_col) t1 as c1,c2  ''').show()
    +---+-------+----+
    | id|     c1|  c2|
    +---+-------+----+
    |  1|  jijin| 1.1|
    |  1|cunkuan| 1.3|
    |  1|baoxian| 1.2|
    |  2|cunkuan| 2.1|
    |  2|baoxian|2.34|
    |  2|  jijin|2.67|
    +---+-------+----+

    直接select map_col不就OK了??转成横表成功!!

    sc.sql(''' select map_tmp_tbl.id
                     ,map_col['jijin'] as jijin_bal
                     ,map_col['baoxian'] as baoxian_bal
                     ,map_col['cunkuan'] as cunkuan_bal from  (
        select id
              ,str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string))))) as map_col
        from test_youhua.zongbiao 
            group by id
    ) as map_tmp_tbl ''').show()
    +---+---------+-----------+-----------+
    | id|jijin_bal|baoxian_bal|cunkuan_bal|
    +---+---------+-----------+-----------+
    |  1|      1.1|        1.2|        1.3|
    |  2|     2.67|       2.34|        2.1|
    +---+---------+-----------+-----------+

    原来最好存map类型是这个意思,后面想转横表区直接取map的key对应的value可以转横表,想转纵表可以用explode转纵表 eg: 这里的map_tmp_tbl就是存成了map类型,可以看出由explode可以转纵表,直接取map对应key的value值可以转横表

    tips

    前面转map用了: str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string))))) 这里需要注意:collect_ws 这个函数会对元素数组去重,如果不去重用 collect_list 这个函数 str_to_map 函数也会去重,如果传入的键值对有重复,只保留一个。如果还是要传入重复的键值对,可以用下面的函数: regexp_replace(concat('{"',cast( concat_ws(',',collect_list(concat_ws(':',prod_nm,cast(bal as string)))) as string),'"}'),',','","') as map_col

     但这时候是一个json串,而不是map!!!
    sc.sql(''' select id
              ,regexp_replace(regexp_replace(concat('{"',concat_ws(',',collect_list(concat_ws(':',prod_nm,cast(bal as string)))),'"}'),',','","'),':','":"') as map_col
        from test_youhua.zongbiao group by id ''').show()
    -- 跑出来的结果,但是这时候不是map格式,要用json串的形式来读:
    1   {"jijin":"1.1","baoxian":"1.2","cunkuan":"1.3"}
    2   {"jijin":"2.67","baoxian":"2.34","cunkuan":"2.1"}
    sc.sql(''' select map_tmp_tbl.id
                     ,get_json_object(map_col,'$.jijin')
                     ,get_json_object(map_col,'$.baoxian')
                     ,get_json_object(map_col,'$.cunkuan') from  (
        select id
              ,regexp_replace(regexp_replace(concat('{"',concat_ws(',',collect_list(concat_ws(':',prod_nm,cast(bal as string)))),'"}'),',','","'),':','":"') as map_col
        from test_youhua.zongbiao 
                group by id
    ) as map_tmp_tbl ''').show()
    +---+---------------------------------+-----------------------------------+-----------------------------------+
    | id|get_json_object(map_col, $.jijin)|get_json_object(map_col, $.baoxian)|get_json_object(map_col, $.cunkuan)|
    +---+---------------------------------+-----------------------------------+-----------------------------------+
    |  1|                              1.1|                                1.2|                                1.3|
    |  2|                             2.67|                               2.34|                                2.1|
    +---+---------------------------------+-----------------------------------+-----------------------------------+

    通过解析json串的形式也可以!

    参考:Hive学习小记-(5)表字段变动频繁时用json格式 

    那其实提前存成map格式的好处是比较多的,一是加字段方便,而是横表纵表转换方便。

  • 相关阅读:
    telnet命令测试端口连接是否正常, telnet不是内部或外部命令的方案
    Linux常用命令
    nginx的反向代理的优势,特点于原理(一)
    linux操作系统中的常用命令以及快捷键(一)
    Centos网卡名称命名
    Centos第一次使用配置IP地址
    Linux环境下交叉编译器安装及运行
    jupyter更换路径
    python3实现在二叉树中找出和为某一值的所有路径
    使用 SQL 服务器时,"评估期已过期"错误消息
  • 原文地址:https://www.cnblogs.com/foolangirl/p/14264709.html
Copyright © 2020-2023  润新知