• Hadoop_UDTF Example


    UDTF: one row in, multiple rows out

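    A UDTF (User-Defined Table-Generating Function) takes one input row and can produce multiple output rows.
    It is typically used for parsing, for example parsing a URL and extracting the pieces of information it contains.
    Implementation: extend GenericUDTF and implement initialize (declares the names and types of the output columns),
       process (the per-row processing logic, which usually calls the parent class's forward method to emit rows),
       and close (called once at the end to release resources, much like cleanup() in an MR program).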
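To make the "one in, many out" idea concrete for the URL case, here is a minimal plain-Java sketch of the parsing logic such a UDTF could wrap: one URL string goes in, and several (key, value) rows come out. The class UrlRowsSketch and the method explodeUrl are hypothetical names used only for illustration; they are not part of Hive, and the sketch does not depend on Hive at all. (In practice Hive already ships a built-in parse_url_tuple UDTF for this job.)

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class UrlRowsSketch {

    /**
     * Hypothetical helper: turns ONE URL into SEVERAL (key, value) rows,
     * the same one-in/many-out shape a UDTF produces via forward().
     */
    public static List<String[]> explodeUrl(String url) {
        List<String[]> rows = new ArrayList<>();
        URI uri = URI.create(url);
        rows.add(new String[]{"host", uri.getHost()});
        rows.add(new String[]{"path", uri.getPath()});

        // Raw query string: values stay percent-encoded in this sketch.
        String query = uri.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.isEmpty()) {
                    continue; // tolerate "a=1&&b=2"
                }
                int eq = pair.indexOf('=');
                String key = eq >= 0 ? pair.substring(0, eq) : pair;
                String value = eq >= 0 ? pair.substring(eq + 1) : "";
                rows.add(new String[]{key, value});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : explodeUrl("http://example.com/search?q=hive&page=2")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

For the sample URL in main this yields four rows: host/example.com, path//search, q/hive and page/2. Inside a real UDTF, each of those rows would be passed to forward() from process().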
    

    A simple example that turns one input column into two output columns: name --> name, name + email

    package com.hive.udtf;

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    // The lowercase class name is kept because the CREATE FUNCTION statement below refers to it.
    public class myudtf extends GenericUDTF {

        // Called once: validate the arguments and declare the output columns (name, email).
        @Override
        public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
            if (argOIs.getAllStructFieldRefs().size() != 1) {
                throw new UDFArgumentException("myudtf takes exactly one argument");
            }

            List<String> fieldNames = new ArrayList<String>();
            fieldNames.add("name");
            fieldNames.add("email");

            List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
        }

        // Called once per input row; every forward() call emits one output row.
        @Override
        public void process(Object[] args) throws HiveException {
            // Skip NULL input values instead of failing with a NullPointerException.
            if (args.length == 1 && args[0] != null) {
                String name = args[0].toString();
                String email = name + "@foxmail.com";
                forward(new Object[]{name, email});
            }
        }

        // Called once when the task finishes. Rows forwarded here are still emitted,
        // which is where the trailing "complete finish" row in the result comes from.
        @Override
        public void close() throws HiveException {
            forward(new Object[]{"complete", "finish"});
        }
    }
    

     Test

    hive (workdb)> add jar /home/liuwl/opt/datas/myudtf.jar;  
    hive (workdb)> create temporary function myudtf as 'com.hive.udtf.myudtf';
    hive (workdb)> select myudtf(ename) as (name,email) from emp;
    Result:
        name      email
        SMITH     SMITH@foxmail.com
        ALLEN     ALLEN@foxmail.com
        WARD      WARD@foxmail.com
        JONES     JONES@foxmail.com
        MARTIN    MARTIN@foxmail.com
        BLAKE     BLAKE@foxmail.com
        CLARK     CLARK@foxmail.com
        SCOTT     SCOTT@foxmail.com
        KING      KING@foxmail.com
        TURNER    TURNER@foxmail.com
        ADAMS     ADAMS@foxmail.com
        JAMES     JAMES@foxmail.com
        FORD      FORD@foxmail.com
        MILLER    MILLER@foxmail.com
        complete  finish
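The extra "complete finish" row at the end of the result is produced by close(). The dependency-free sketch below mimics the call order Hive uses for a UDTF: process() once per input row, then close() once, with forward() collecting the emitted rows. SimpleUdtf, EmailUdtf and UdtfLifecycleSketch are hypothetical stand-ins, not Hive classes, so the logic can run without Hive on the classpath. In a real job each task runs its own UDTF instance, so with several map tasks you would likely see one "complete finish" row per task rather than exactly one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the way Hive drives a UDTF; NOT Hive's GenericUDTF. */
public class UdtfLifecycleSketch {

    /** Simplified GenericUDTF: forward() just appends the row to an output list. */
    public abstract static class SimpleUdtf {
        final List<Object[]> output = new ArrayList<>();

        protected void forward(Object[] row) {
            output.add(row);
        }

        abstract void process(Object[] args);

        abstract void close();
    }

    /** Same per-row logic as myudtf above. */
    public static class EmailUdtf extends SimpleUdtf {
        @Override
        void process(Object[] args) {
            if (args.length == 1 && args[0] != null) {
                String name = args[0].toString();
                forward(new Object[]{name, name + "@foxmail.com"});
            }
        }

        @Override
        void close() {
            // Runs once after the last input row, hence the single trailing row.
            forward(new Object[]{"complete", "finish"});
        }
    }

    /** Plays the role of the Hive operator: feed every input row, then close. */
    public static List<Object[]> run(SimpleUdtf udtf, List<String> column) {
        for (String value : column) {
            udtf.process(new Object[]{value});
        }
        udtf.close();
        return udtf.output;
    }

    public static void main(String[] args) {
        for (Object[] row : run(new EmailUdtf(), List.of("SMITH", "ALLEN", "WARD"))) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Running main prints the three name/email rows followed by a single "complete finish" row, matching the shape of the Hive output above.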
    
  • Original article: https://www.cnblogs.com/eRrsr/p/6097034.html