• UDFs in Hive


    LanguageManual UDF

    I. Categories

    UDF: User-Defined Function
    	One row in, one value out
    UDAF: User-Defined Aggregation Function
    	Aggregate function: many rows in, one value out
    	e.g. max, min, count
    UDTF: User-Defined Table-Generating Function
    	One row in, many rows out
    	e.g. lateral view explode
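
To make the three input/output shapes concrete, here is a plain-Java analogy. It is only a sketch: the class and method names (FunctionKindsSketch, lower, max, explode) are made up for illustration and none of them use or mimic the Hive API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy only; nothing here touches the Hive API.
final class FunctionKindsSketch {

    // UDF shape: one value in, one value out (think lower()).
    static String lower(String value) {
        return value == null ? null : value.toLowerCase();
    }

    // UDAF shape: many rows in, one value out (think max()).
    // Returns Integer.MIN_VALUE for an empty list.
    static int max(List<Integer> rows) {
        int best = Integer.MIN_VALUE;
        for (int row : rows) {
            best = Math.max(best, row);
        }
        return best;
    }

    // UDTF shape: one value in, many rows out (think explode() over a split string).
    static List<String> explode(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lower("Hive"));                // hive
        System.out.println(max(Arrays.asList(3, 9, 4)));  // 9
        System.out.println(explode("a, b, c"));           // [a, b, c]
    }
}
```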
    

    II. Hands-on

    1. Create a Maven project and edit pom.xml

    hive-pom.xml

    2. First, you need to create a new class that extends UDF, with one or more methods named evaluate.

    package com.cenzhongman.hive.udf;
    
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;
    
    public class LowerUDF extends UDF{
    
    	// Implement one or more methods named evaluate; Hive calls them. (How Hive
    	// resolves which method to call can be configured with a custom UDFMethodResolver.)
    	// Examples:
    	//   public int evaluate();
    	//   public int evaluate(int a);
    	//   public double evaluate(int a, double b);
    	//   public String evaluate(String a, int b, Text c);
    	//   public Text evaluate(String a);
    	//   public String evaluate(List<Integer> a);
    	//     (Hive arrays are represented as Lists in Java, so an ARRAY<int>
    	//     column is passed in as a List<Integer>.)
    	// evaluate must never be a void method, but it may return null if needed.
    	// Return types and method arguments can be either Java primitives or the
    	// corresponding Writable class.
    	// Recommendation: prefer the Hadoop (MapReduce) Writable types for parameters.

    	public Text evaluate(Text str) {
    		// A NULL value arrives as a null reference; check it before calling toString().
    		if (str == null) {
    			return null;
    		}
    		// Convert to lower case.
    		return new Text(str.toString().toLowerCase());
    	}
    	
    	// For local testing only. Hive's entry point is evaluate, so main has no effect there.
    	public static void main(String[] args) {
    		System.out.println(new LowerUDF().evaluate(new Text("Hive")));
    	}
    }
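
Hive finds the evaluate method by reflection, which is why a class may declare several overloads. The sketch below is a deliberately simplified illustration of that idea in plain Java. It is not Hive's resolver: EvaluateLookupSketch, Sample, findEvaluate and call are hypothetical names, Sample does not extend Hive's UDF class, and no type conversion is attempted (only boxed types are matched).

```java
import java.lang.reflect.Method;

final class EvaluateLookupSketch {

    // Hypothetical stand-in for a UDF class with two evaluate overloads.
    static final class Sample {
        public String evaluate(String a) {
            return a == null ? null : a.toLowerCase();
        }

        public Integer evaluate(Integer a, Integer b) {
            return a + b;
        }
    }

    // Returns the first public method named "evaluate" whose parameter count
    // and parameter types accept the given arguments.
    static Method findEvaluate(Class<?> udfClass, Object... args) {
        for (Method m : udfClass.getMethods()) {
            if (!m.getName().equals("evaluate")) {
                continue;
            }
            Class<?>[] params = m.getParameterTypes();
            if (params.length != args.length) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < params.length; i++) {
                if (args[i] != null && !params[i].isInstance(args[i])) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                return m;
            }
        }
        throw new IllegalArgumentException("no matching evaluate method");
    }

    // Looks up the matching overload and invokes it.
    static Object call(Object udf, Object... args) {
        try {
            return findEvaluate(udf.getClass(), args).invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new Sample(), "Hive")); // hive
        System.out.println(call(new Sample(), 2, 3));   // 5
    }
}
```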
    

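    3. Use the custom function in Hive

    -- Add the jar to the session's resources
    add jar /opt/datas/filename.jar;
    
    -- Create a temporary function
    create temporary function my_lower as "com.cenzhongman.hive.udf.LowerUDF";
    
    -- List the functions to confirm it was registered
    show functions;
    
    -- Use the function
    select my_lower(job) lower_job from emp;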
    

    As of Hive 0.13, UDFs also have the option of being able to specify required jars in the CREATE FUNCTION statement:

    For newer versions, there is another way to register the function (the jar file must be on the HDFS file system):

    CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';
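
If the function is registered from a Java program rather than the Hive shell, the same statement could be assembled and sent over JDBC. The following is a minimal sketch under assumptions that are not in the original: a HiveServer2 endpoint and the Hive JDBC driver are available, and the caller opens the Connection. CreateFunctionSketch, createFunctionSql and register are hypothetical names.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class CreateFunctionSketch {

    // Builds: CREATE FUNCTION <name> AS '<class>' USING JAR '<jarUri>'
    // Values containing a quote or semicolon are rejected rather than escaped.
    static String createFunctionSql(String functionName, String className, String jarUri) {
        for (String part : new String[] {functionName, className, jarUri}) {
            if (part == null || part.isEmpty() || part.contains("'") || part.contains(";")) {
                throw new IllegalArgumentException("unsafe or empty value: " + part);
            }
        }
        return "CREATE FUNCTION " + functionName
                + " AS '" + className + "'"
                + " USING JAR '" + jarUri + "'";
    }

    // Executes the statement on a connection the caller has already opened
    // (assumed to point at HiveServer2; not part of the original article).
    static void register(Connection conn, String functionName, String className, String jarUri)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(createFunctionSql(functionName, className, jarUri));
        }
    }

    public static void main(String[] args) {
        // Prints: CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar'
        System.out.println(createFunctionSql("myfunc", "myclass", "hdfs:///path/to/jar"));
    }
}
```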
  • Original article: https://www.cnblogs.com/cenzhongman/p/7182725.html