I. Categories
UDF: User-Defined Function
One row in, one value out
UDAF: User-Defined Aggregate Function
Aggregate function: many rows in, one value out
Examples: max, min, count
UDTF: User-Defined Table-Generating Function
One row in, many rows out
Example: lateral view explode
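The three shapes can be pictured with ordinary Java methods. The sketch below is only an analogy: the class FunctionShapes and its methods are hypothetical names made up for illustration, and none of it uses the Hive API.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy only; these names are hypothetical and not part of Hive.
class FunctionShapes {
    // UDF shape: one value in, one value out.
    static String lower(String s) {
        return s == null ? null : s.toLowerCase();
    }

    // UDAF shape: many values in, one value out (like max).
    static int max(List<Integer> values) {
        int best = Integer.MIN_VALUE;
        for (int v : values) {
            if (v > best) {
                best = v;
            }
        }
        return best;
    }

    // UDTF shape: one value in, many rows out (like explode on an array).
    static List<String> explode(String csv) {
        return Arrays.asList(csv.split(","));
    }

    public static void main(String[] args) {
        System.out.println(lower("Hive"));
        System.out.println(max(Arrays.asList(3, 9, 4)));
        System.out.println(explode("a,b,c"));
    }
}
```

In Hive itself, each shape corresponds to a different base class or interface; only the plain UDF shape is implemented in the hands-on part of these notes.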
II. Hands-on
1. Create a Maven project and edit pom.xml
hive-pom.xml
2. Create a new class that extends UDF and implements one or more methods named evaluate.
package com.cenzhongman.hive.udf;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
public class LowerUDF extends UDF{
// Implement one or more methods named evaluate; Hive calls them.
// (How Hive resolves which method to call can be configured by setting a custom UDFMethodResolver.)
// Examples:
//   public int evaluate();
//   public int evaluate(int a);
//   public double evaluate(int a, double b);
//   public String evaluate(String a, int b, Text c);
//   public Text evaluate(String a);
//   public String evaluate(List<Integer> a);
//   (Hive arrays are represented as Lists, so an ARRAY<int> column is passed in as a List<Integer>.)
// evaluate must never be a void method, but it may return null when needed.
// Return types and method arguments can be Java primitives or the corresponding Writable classes.
// Recommended: use the Hadoop MapReduce (Writable) types, such as Text, for parameters.
public Text evaluate(Text str) {
// null input: check the reference itself, since calling toString() on a null Text throws a NullPointerException
if (str == null) {
return null;
}
// convert to lowercase
return new Text(str.toString().toLowerCase());
}
// For local testing only. Hive's entry point is evaluate, so this main method does not affect the UDF.
public static void main(String[] args) {
System.out.println(new LowerUDF().evaluate(new Text("Hive")));
}
}
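The rules in the comments above (methods named evaluate, any number of overloads, never void) can be checked with plain Java reflection. The sketch below is an illustration under stated assumptions: EvaluateLookup and SampleUdf are hypothetical names, SampleUdf does not extend Hive's UDF class, and the lookup is not Hive's own UDFMethodResolver.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is not Hive's method resolver.
class EvaluateLookup {
    // Stand-in for a UDF class. It does not extend Hive's UDF.
    static class SampleUdf {
        public String evaluate(String a) {
            return a == null ? null : a.toLowerCase();
        }

        public int evaluate(int a, int b) {
            return a + b;
        }

        // Not named evaluate, so the lookup below ignores it.
        public void helper() {
        }
    }

    // Collect the public methods named "evaluate" that do not return void.
    static List<Method> findEvaluateMethods(Class<?> cls) {
        List<Method> found = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals("evaluate") && m.getReturnType() != void.class) {
                found.add(m);
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        List<Method> methods = findEvaluateMethods(SampleUdf.class);
        System.out.println(methods.size() + " evaluate overloads found");

        // Select one overload by its parameter types and call it reflectively.
        Method m = SampleUdf.class.getMethod("evaluate", String.class);
        System.out.println(m.invoke(new SampleUdf(), "Hive"));
    }
}
```

Running a check like this against a compiled UDF class before packaging the jar is one way to catch a misspelled or void evaluate method early.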
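3. Use the custom function in Hive
-- Add the jar to the session's resources
add jar /opt/datas/filename.jar;
-- Create a temporary function
create temporary function my_lower as "com.cenzhongman.hive.udf.LowerUDF";
-- List the functions to confirm it was registered
show functions;
-- Use the function
select my_lower(job) lower_job from emp;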
As of Hive 0.13, a UDF can also specify the jars it requires directly in the CREATE FUNCTION statement. With this newer form, the jar file must be on the HDFS file system:
CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';