• Hive 11: Embedding Python in Hive


    Embedding Python in Hive

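    The Python script's input and output must both be tab-delimited ('\t'); any other separator causes errors. The script reads rows from standard input and prints rows in the required format to standard output.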

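    Usage: first register the script with add file, then call it with TRANSFORM (name, items) USING 'python test.py' AS (name string, item1 string, item2 string, item3 string). The fields listed after AS correspond, in order, to the tab-separated values the Python script prints.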
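The stdin/stdout contract described above can be sketched in a few lines. This is only an illustration, assuming Python 3; the upper-casing step and the names `transform` and `main` are hypothetical and not part of the original example. The input is simulated with an in-memory stream so the sketch runs without Hive.

```python
import io
import sys


def transform(lines):
    """Yield one tab-separated output row per tab-separated input row."""
    for line in lines:
        # Hive sends the selected columns joined by tabs, one row per line.
        fields = line.rstrip("\n").split("\t")
        # Hypothetical transformation: upper-case the first column.
        fields[0] = fields[0].upper()
        yield "\t".join(fields)


def main(stream_in, stream_out):
    # Every row written here becomes one row of the TRANSFORM result.
    for row in transform(stream_in):
        stream_out.write(row + "\n")


# Simulate what Hive would pipe to the script for two (name, items) rows.
fake_stdin = io.StringIO("tom\tshu fa,wei qi\njack\tgame\n")
fake_stdout = io.StringIO()
main(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

When the script is actually run by TRANSFORM, the last four lines would be replaced by a call to `main(sys.stdin, sys.stdout)`.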

    Below is a small example that splits one column into multiple columns:

    create table test (name string, items string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';
    

      

    LOAD DATA local INPATH '/opt/data/tt.txt' OVERWRITE INTO TABLE test ;

    Contents of tt.txt:

    tom	shu fa,wei qi,chang ge
    jack	game,kan shu,shang wang
    lusi	lv you,guang jie,gou wu
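Because the table is declared with a tab as the field terminator, the name and the items in tt.txt must be separated by a real tab character, not by spaces. One way to be sure of this is to generate the file from a script. The sketch below assumes Python 3 and writes to a temporary directory (a hypothetical location) rather than /opt/data/tt.txt; `write_sample` is an illustrative helper name.

```python
import os
import tempfile

# The three sample rows from tt.txt: a name plus a list of hobbies.
rows = [
    ("tom", ["shu fa", "wei qi", "chang ge"]),
    ("jack", ["game", "kan shu", "shang wang"]),
    ("lusi", ["lv you", "guang jie", "gou wu"]),
]


def write_sample(path, rows):
    """Write one 'name<TAB>item1,item2,...' line per row."""
    with open(path, "w") as f:
        for name, items in rows:
            f.write(name + "\t" + ",".join(items) + "\n")


path = os.path.join(tempfile.mkdtemp(), "tt.txt")
write_sample(path, rows)

# Read the file back to confirm each line has exactly one tab.
with open(path) as f:
    lines = f.read().splitlines()
for line in lines:
    print(repr(line))
```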
    

    Table 2:

    create table test2 (name string, item1 string, item2 string, item3 string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';
    

      

    -- Register the Python script with Hive
    hive> add file /root/test.py;
    

      

    -- Write the result into test2
    INSERT OVERWRITE TABLE test2  
    
    SELECT  TRANSFORM (name, items)  
    USING 'python test.py'  
    AS (name string, item1 string,item2 string,item3 string)  
    FROM test;
    

      

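    #!/usr/bin/python
    # test.py: split the comma-separated items column into three columns

    import sys

    for line in sys.stdin:
        # Hive passes each row as tab-separated text: name<TAB>items
        line = line.strip()
        name, it = line.split('\t')
        # Keep the first three items; pad short rows with the literal
        # string 'NULL' (stored as text, not as a Hive NULL)
        result = it.split(',')[0:3]
        result += ['NULL'] * (3 - len(result))
        # Emit one tab-separated row: name, item1, item2, item3
        print('%s\t%s' % (name, '\t'.join(result)))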
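Before running the script inside Hive, the splitting logic can be checked locally, which is roughly what piping tt.txt into the script would do. The sketch below assumes Python 3 and wraps the loop body in a function with the hypothetical name `split_items`; the third sample row is invented to exercise the padding branch and is not part of tt.txt.

```python
def split_items(line, width=3):
    """Turn 'name<TAB>a,b,c' into 'name<TAB>a<TAB>b<TAB>c'."""
    name, items = line.strip().split("\t")
    parts = items.split(",")[:width]
    # Pad short rows with the literal string 'NULL', as the script does.
    parts += ["NULL"] * (width - len(parts))
    return "\t".join([name] + parts)


sample = [
    "tom\tshu fa,wei qi,chang ge\n",
    "jack\tgame,kan shu,shang wang\n",
    "lusi\tlv you\n",  # hypothetical short row, not in tt.txt
]
for row in sample:
    print(split_items(row))
```

Every printed row has exactly four tab-separated fields, matching the four columns declared in the AS clause.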
    

      

    Result:
    -- Table 1
    hive> select * from test;
    OK
    tom    shu fa,wei qi,chang ge
    jack    game,kan shu,shang wang
    lusi    lv you,guang jie,gou wu
    Time taken: 0.07 seconds, Fetched: 3 row(s)
    
    
     hive> desc test2;
     OK
     name                	string              	                    
     item1               	string              	                    
     item2               	string              	                    
     item3               	string              	                    
     Time taken: 0.141 seconds, Fetched: 4 row(s)
    -- Table 2
    hive> select * from test2;
    OK
    tom    shu fa    wei qi    chang ge
    jack    game    kan shu    shang wang
    lusi    lv you    guang jie    gou wu
    Time taken: 1.368 seconds, Fetched: 3 row(s)
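Every row in this sample has exactly three items, so the padding branch of the script never runs. For shorter rows the script would write the text NULL, which Hive stores as an ordinary string. According to the Hive TRANSFORM documentation, an output cell containing only \N is read back as a real NULL. A possible variant that uses this is sketched below, assuming Python 3; `split_items_null` and `HIVE_NULL` are hypothetical names, and the input row is invented for illustration.

```python
HIVE_NULL = "\\N"  # a backslash followed by N, Hive's text marker for NULL


def split_items_null(line, width=3):
    """Split the items column, padding missing items with Hive's NULL marker."""
    name, items = line.rstrip("\n").split("\t", 1)
    parts = items.split(",")[:width]
    # Pad with \N instead of the string 'NULL' so Hive sees real NULLs.
    parts += [HIVE_NULL] * (width - len(parts))
    return "\t".join([name] + parts)


# Hypothetical row with a single hobby.
print(split_items_null("lusi\tlv you\n"))
```

With this variant, a query such as `select * from test2 where item2 is null` should match the padded rows, which it would not do for the string 'NULL'.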
    

      

  • Original article: https://www.cnblogs.com/tesla-turing/p/11509344.html