• Hadoop_UDTF Example


    UDTF: one input, multiple outputs

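    A UDTF (User-Defined Table-Generating Function) takes one input and can produce multiple outputs.
    It is typically used for parsing work, for example parsing a URL and extracting the information it contains.
    Implementation: extend GenericUDTF and implement these methods: initialize (returns the types of the output columns),
       process (the actual processing logic, which normally calls the parent class's forward method to write out data),
       and finally close, which, like cleanup in an MR program, releases resources.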
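
    The URL-parsing use case mentioned above can be sketched in plain Java, without any Hive dependency. This is only an illustration of the "one input, multiple outputs" idea: UrlExplodeSketch, RowCollector and explode are hypothetical names, RowCollector merely stands in for the collector behind GenericUDTF's forward method, and the (key, value) row layout is an assumption rather than anything Hive prescribes.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class UrlExplodeSketch {

    /** Hypothetical stand-in for the collector behind GenericUDTF.forward(). */
    public interface RowCollector {
        void forward(String[] row);
    }

    /** Takes one URL and forwards one (key, value) row per piece of information in it. */
    public static void process(String url, RowCollector out) {
        URI uri = URI.create(url);
        out.forward(new String[] {"host", uri.getHost()});
        out.forward(new String[] {"path", uri.getPath()});

        String query = uri.getQuery();
        if (query == null || query.isEmpty()) {
            return; // no query string, so nothing more to emit
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            out.forward(new String[] {key, value});
        }
    }

    /** Convenience wrapper: collects every row forwarded for a single URL. */
    public static List<String[]> explode(String url) {
        List<String[]> rows = new ArrayList<>();
        process(url, rows::add);
        return rows;
    }

    public static void main(String[] args) {
        // example.com and the query parameters are made-up sample data.
        for (String[] row : explode("http://example.com/item/list?id=7&sort=asc")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

    In a real UDTF the body of process would sit inside GenericUDTF.process, and the two output columns would be declared in initialize.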
    

    Simple example: split one column into two output columns, name --> name, email (where email is name + "@foxmail.com")

    package com.hive.udtf;
    
    import java.util.ArrayList;
    
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    
    public class myudtf extends GenericUDTF {

      // Validates the input and declares the output schema: two string columns, name and email.
      @Override
      public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {

        if (argOIs.getAllStructFieldRefs().size() != 1) {
          throw new UDFArgumentException("myudtf takes exactly one argument");
        }

        ArrayList<String> fieldname = new ArrayList<String>();
        fieldname.add("name");
        fieldname.add("email");
        ArrayList<ObjectInspector> fieldoi = new ArrayList<ObjectInspector>();
        fieldoi.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldoi.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldname, fieldoi);
      }

      // Called once per input row; forwards one (name, email) output row.
      @Override
      public void process(Object[] args) throws HiveException {

        // Skip NULL input instead of failing on args[0].toString().
        if (args.length == 1 && args[0] != null) {
          String name = args[0].toString();
          String email = name + "@foxmail.com";
          super.forward(new String[] {name, email});
        }
      }

      // Called when there are no more rows; forwards one final marker row.
      @Override
      public void close() throws HiveException {

        super.forward(new String[] {"complete", "finish"});
      }
    }
    

     Test

    hive (workdb)> add jar /home/liuwl/opt/datas/myudtf.jar;  
    hive (workdb)> create temporary function myudtf as 'com.hive.udtf.myudtf';
    hive (workdb)> select myudtf(ename) as (name,email) from emp;
    Result:
        name      email
        SMITH     SMITH@foxmail.com
        ALLEN     ALLEN@foxmail.com
        WARD      WARD@foxmail.com
        JONES     JONES@foxmail.com
        MARTIN    MARTIN@foxmail.com
        BLAKE     BLAKE@foxmail.com
        CLARK     CLARK@foxmail.com
        SCOTT     SCOTT@foxmail.com
        KING      KING@foxmail.com
        TURNER    TURNER@foxmail.com
        ADAMS     ADAMS@foxmail.com
        JAMES     JAMES@foxmail.com
        FORD      FORD@foxmail.com
        MILLER    MILLER@foxmail.com
        complete  finish
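
    The last row of the result (complete, finish) does not come from the emp table: it is the row forwarded in close(). The plain-Java sketch below imitates that call order without Hive. UdtfLifecycleSketch and run are hypothetical names, and the driver loop is an assumption about how the methods are invoked (process once per row, then close once), not Hive's actual operator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UdtfLifecycleSketch {

    // Collected rows; stands in for whatever Hive does with forwarded rows.
    private final List<String[]> output = new ArrayList<>();

    // Hypothetical replacement for GenericUDTF.forward().
    private void forward(String[] row) {
        output.add(row);
    }

    // Same logic as myudtf.process(): one input value becomes one (name, email) row.
    void process(Object[] args) {
        if (args.length == 1 && args[0] != null) {
            String name = args[0].toString();
            forward(new String[] {name, name + "@foxmail.com"});
        }
    }

    // Same logic as myudtf.close(): one extra marker row after all input rows.
    void close() {
        forward(new String[] {"complete", "finish"});
    }

    /** Assumed call order: process() for each input row, then close() once. */
    public static List<String[]> run(List<String> names) {
        UdtfLifecycleSketch udtf = new UdtfLifecycleSketch();
        for (String name : names) {
            udtf.process(new Object[] {name});
        }
        udtf.close();
        return udtf.output;
    }

    public static void main(String[] args) {
        for (String[] row : run(Arrays.asList("SMITH", "ALLEN"))) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

    Because close() runs once per UDTF instance, a query that Hive splits across several tasks would presumably produce one such marker row per task rather than exactly one overall.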
    
  • Original article: https://www.cnblogs.com/eRrsr/p/6097034.html