Introduction to Custom Functions (UDFs) in Hive
(1) Write the custom function as a class. A custom UDF must extend 'org.apache.hadoop.hive.ql.exec.UDF' and implement an evaluate method; evaluate can be overloaded (see the sketch after this list).
(2) Export the package containing the class as a jar and place it in a directory on the Linux machine.
(3) In the Hive client, remove the old jar (if one was added previously):
hive> delete jar /dir/.jar;
(4) Add the new jar:
hive> add jar /dir/.jar;
(5) Create a temporary function that points to the class inside the jar:
hive> create temporary function <function name> as 'fully qualified Java class name';
(6) Use the temporary function, and drop it when it is no longer needed:
hive> select <function name>(arguments);
hive> drop temporary function <function name>;
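As a rough sketch of the overloading mentioned in step (1), the class below (the class name MyConcat and the separator logic are made up for illustration, not taken from the example later in this section) defines two evaluate methods with different parameter lists; Hive resolves the call to the matching overload based on the arguments passed in the query.

package demo.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF showing evaluate overloading.
public class MyConcat extends UDF {
    // Two-argument form: join a and b with a fixed dash.
    public Text evaluate(Text a, Text b) {
        if (a == null || b == null) {
            return null;
        }
        return new Text(a.toString() + "-" + b.toString());
    }

    // Three-argument overload: the caller supplies the separator.
    public Text evaluate(Text a, Text b, Text sep) {
        if (a == null || b == null || sep == null) {
            return null;
        }
        return new Text(a.toString() + sep.toString() + b.toString());
    }
}

In a query, select myconcat(col1, col2) would use the first method, while select myconcat(col1, col2, '|') would use the second.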
A Hive Custom Function Example
package demo.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class ConcatString extends UDF {
    // java.lang.String is not a Hadoop writable type, so use Text for parameters and the return value
    public Text evaluate(Text a, Text b) {
        return new Text(a.toString() + "*******" + b.toString());
    }
}
hive> delete jar /root/pl62716/hive/contactString.jar;
Deleted [/root/pl62716/hive/contactString.jar] from class path
hive> add jar /root/pl62716/hive/contactString.jar;
Added [/root/pl62716/hive/contactString.jar] to class path
Added resources: [/root/pl62716/hive/contactString.jar]
hive> create temporary function myconcat as 'demo.udf.ConcatString';
OK
Time taken: 2.747 seconds
hive> select myconcat('HELLO','world');
OK
HELLO*******world
Time taken: 0.598 seconds, Fetched: 1 row(s)