Hive的内置函数

Hive的内置函数
定义：

UDF(User-Defined-Function)，用户自定义函数对数据进行处理。

UDTF(User-Defined Table-Generating Functions) 用来解决输入一行输出多行(On-to-many maping) 的需求。

UDAF(User Defined Aggregation Function)用户自定义聚合函数，操作多个数据行，产生一个数据行。

用法：

　　1、UDF函数可以直接应用于select语句，对查询结构做格式化处理后，再输出内容。

　　2、编写UDF函数的时候需要注意一下几点：

a）自定义UDF需要继承org.apache.hadoop.hive.ql.UDF。

b）需要实现evaluate函。

c）evaluate函数支持重载。

hive的本地模式：

　　大多数的Hadoop job是需要hadoop提供的完整的可扩展性来处理大数据的。不过，有时hive的输入数据量是非常小的。在这种情况下，为查询出发执行任务的时间消耗可能会比实际job的执行时间要多的多。对于大多数这种情况，hive可以通过本地模式在单台机器上处理所有的任务。对于小数据集，执行时间会明显被缩短。

　　如此一来，对数据量比较小的操作，就可以在本地执行，这样要比提交任务到集群执行效率要快很多。

　　配置如下参数，可以开启Hive的本地模式：
```
hive> set hive.exec.mode.local.auto=true;(默认为false)
```
　　当一个job满足如下条件才能真正使用本地模式：
　　　　1.job的输入数据大小必须小于参数：hive.exec.mode.local.auto.inputbytes.max(默认128MB)
　　　　2.job的map数必须小于参数：hive.exec.mode.local.auto.tasks.max(默认4)
　　　　3.job的reduce数必须为0或者1

hive 中窗口函数row_number,rank,dense_ran,ntile分析函数的用法

　　示例数据：

1　　 a 10

2   a   12

3   b   13

4   b   12

5   a   14

6   a   15

7   a   13

8   b   11

9   a   16

10 b   17

11 a   14

　　

　　sql语句：
```
select id,
name,
sal,
rank()over(partition by name order by sal desc ) rp,
dense_rank() over(partition by name order by sal desc ) drp,
row_number()over(partition by name order by sal desc) rmp
from f_test
```
　　结果：
```
10    b    17    1    1    1
3    b    13    2    2    2
4    b    12    3    3    3
8    b    11    4    4    4
9    a    16    1    1    1
6    a    15    2    2    2
11    a    14    3    3    3
5    a    14    3    3    4
7    a    13    5    4    5
2    a    12    6    5    6
1    a    10    7    6    7
```
　　ntile

　　　　ntile(n)，用于将分组数据按照顺序切分成n片，返回当前切片值
　　　　ntile不支持rows between，比如 ntile(2) over(partition by cookieid order by createtime rows between 3 preceding and current row)
　　　　如果切片不均匀，默认增加第一个切片的分布

　　比如需求为:求sal前50%的人
```
select * from (
select id,
name,
sal,
NTILE(2) over(partition by name order by sal desc ) rn
from f_test
) t where t.rn=1
```
Hive已定义函数介绍：

　　1、字符串长度函数：length

　　　　语法: length(string A)
　　　　返回值: int
　　举例：
hive> select length(‘abcedfg’) from dual; 7
　　2、字符串反转函数：reverse

　　语法: reverse(string A)
　　返回值: string
　　说明：返回字符串A的反转结果

　　举例：
hive> select reverse(‘abcedfg’) from dual; gfdecba
　　3、字符串连接函数：concat

　　　　语法: concat(string A, string B…)
　　　　返回值: string
　　　　说明：返回输入字符串连接后的结果，支持任意个输入字符串

　　举例：
hive> select concat(‘abc’,'def’,'gh’) from dual; abcdefgh
　　4、带分隔符字符串连接函数：concat_ws

　　　　语法: concat_ws(string SEP, string A, string B…)
　　　　返回值: string
　　　　说明：返回输入字符串连接后的结果，SEP表示各个字符串间的分隔符
　　举例：
hive> select concat_ws(‘,’,'abc’,'def’,'gh’) from dual; abc,def,gh
　　5、字符串截取函数：substr,substring

　　　　语法: substr(string A, int start),substring(string A, int start)
　　　　返回值: string
　　　　说明：返回字符串A从start位置到结尾的字符串
　　举例：
hive> select substr(‘abcde’,3) from dual; cde hive> select substring(‘abcde’,3) from dual; cde hive> select substr(‘abcde’,-1) from dual; （和ORACLE相同） e
　　6、类型转换

　　　　类型转换：case
```
select cast(1 as float); --1.0  
select cast('2016-05-22' as date); --2016-05-22 
```
　　　　字符串转大写函数：upper,ucase

　　　　字符串转小写函数：lower,lcase

　　　　语法: lower(string A) lcase(string A)
　　　　返回值: string
　　　　说明：返回字符串A的小写格式
　　举例：
hive> select lower(‘abSEd’) from dual; absed hive> select lcase(‘abSEd’) from dual; absed
　　7、左右去除空格函数

　　　　左边去空格函数：ltrim

　　　　右边去空格函数：rtrim

　　8、正则表达式替换函数：regexp_replace

　　　　语法: regexp_replace(string A, string B, string C)
　　　　返回值: string
　　　　说明：将字符串A中的符合java正则表达式B的部分替换为C。注意，在有些情况下要使用转义字符
　　举例：
hive> select regexp_replace(‘foobar’, ‘oo|ar’, ”) from dual; fb
　　9、正则表达式解析函数：regexp_extract

　　　　语法: regexp_extract(string subject, string pattern, int index)
　　　　返回值: string
　　　　说明：将字符串subject按照pattern正则表达式的规则拆分，返回index指定的字符。注意，在有些情况下要使用转义字符
　　举例：
hive> select regexp_extract(‘foothebar’, ‘foo(.*?)(bar)’, 1) from dual; the hive> select regexp_extract(‘foothebar’, ‘foo(.*?)(bar)’, 2) from dual; bar hive> select regexp_extract(‘foothebar’, ‘foo(.*?)(bar)’, 0) from dual; foothebar
　　10、URL解析函数：parse_url，parse_url_tuple（UDTF）

　　　　语法: parse_url(string urlString, string partToExtract [, string keyToExtract])，parse_url_tuple功能类似parse_url()，但它可以同时提取多个部分并返回
　　　　返回值: string
　　　　说明：返回URL中指定的部分。partToExtract的有效值为：HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.
　　举例：
hive> select parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1′, ‘HOST’) from dual; facebook.com hive> select parse_url_tuple('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY:k1', 'QUERY:k2'); v1 v2
　　11、json解析函数：get_json_object

　　　　语法: get_json_object(string json_string, string path)
　　　　返回值: string
　　　　说明：解析json的字符串json_string,返回path指定的内容。如果输入的json字符串无效，那么返回NULL。
　　举例：
hive> select get_json_object(‘{“store”: > {“fruit”:[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], > “bicycle”:{“price”:19.95,”color”:”red”} > }, > “email”:”amy@only_for_json_udf_test.net”, > “owner”:”amy” > } > ‘,’$.owner’) from dual; amy
　　12、集合查找函数: find_in_set

　　　　语法: find_in_set(string str, string strList)
　　　　返回值: int
　　　　说明: 返回str在strlist第一次出现的位置，strlist是用逗号分割的字符串。如果没有找该str字符，则返回0（只能是逗号分隔，不然返回0）
　　举例：
hive> select find_in_set(‘ab’,'ef,ab,de’) from dual; 2 hive> select find_in_set(‘at’,'ef,ab,de’) from dual; 0
　　13、行转列：explode （posexplode Available as of Hive 0.13.0）

　　　　说明：将输入的一行数组或者map转换成列输出
　　　　语法：explode(array (or map))
　　举例：
hive> select explode(split(concat_ws(',','1','2','3','4','5','6','7','8','9'),',')) from test.dual; 1 2 3 4 5 6 7 8 9
　　14、多行转换：lateral view

　　　　说明：lateral view用于和json_tuple，parse_url_tuple，split, explode等UDTF一起使用，它能够将一行数据拆成多行数据，在此基础上可以对拆分后的数据进行聚合。
　　举例：
　　　　假设我们有一张表pageAds，它有两列数据，第一列是pageid string，第二列是adid_list，即用逗号分隔的广告ID集合：

string pageid Array<int> adid_list

"front_page" [1, 2, 3]

"contact_page" [3, 4, 5]

　　　　要统计所有广告ID在所有页面中出现的次数。

　　　　首先分拆广告ID：
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
　　　　　　　　　　　　　　　　　　　　　　执行结果如下：
string pageid int adid

"front_page" 1

"front_page" 2

"front_page" 3

"contact_page" 3

"contact_page" 4

"contact_page" 5

　　解释一下，from后面是你的表名，在表名后面加lateral view explode。。。（你的行转列sql），还必须要起一个别名，我这个字段的别名为sp。然后再看看select后面的 s.*，就是原表的字段，我这里面只有一个字段，且为X

　　多个lateral view的sql类如：
```
select * from exampletable lateral view explode(col1) mytable1 as mycol1 lateral view explode(mycol1) mytable2 as mycol2;
```
　　15、union结果集合并

　　　　union将多个select语句的结果集合并为一个独立的结果集
```
create table dw_oute_numbs as 
select 'step1' as step,count(distinct remote_addr) as numbs from ods_click_pageviews where datestr='2013-09-20' and request like '/item%'
union
select 'step2' as step,count(distinct remote_addr) as numbs from ods_click_pageviews where datestr='2013-09-20' and request like '/category%'
union
select 'step3' as step,count(distinct remote_addr) as numbs from ods_click_pageviews where datestr='2013-09-20' and request like '/order%'
union
select 'step4' as step,count(distinct remote_addr) as numbs from ods_click_pageviews where datestr='2013-09-20' and request like '/index%';
```
```
+---------------------+----------------------+--+
| dw_oute_numbs.step | dw_oute_numbs.numbs |
+---------------------+----------------------+--+
| step1 　　　　　　　　| 1029 　　　　　　　　　|
| step2 　　　　　　　　| 1029 　　　　　　　　　|
| step3 　　　　　　　　| 1028 　　　　　　　　　|
| step4 　　　　　　　　| 1018 　　　　　　　　　|
+---------------------+----------------------+--+
```
　　抽取一行数据转换到新表的多列样例：

　　　　http_referer是获取的带参数请求路径，其中非法字符用做了转义，根据路径解析出地址，查询条件等存入新表中，
drop table if exists t_ods_tmp_referurl; create table t_ ods _tmp_referurl as SELECT a.*,b.* FROM ods_origin_weblog a LATERAL VIEW parse_url_tuple(regexp_replace(http_referer, """, ""), 'HOST', 'PATH','QUERY', 'QUERY:id') b as host, path, query, query_id;
　　复制表，并将时间截取到日：
drop table if exists t_ods_tmp_detail; create table t_ods_tmp_detail as select b.*,substring(time_local,0,10) as daystr, substring(time_local,11) as tmstr, substring(time_local,5,2) as month, substring(time_local,8,2) as day, substring(time_local,11,2) as hour From t_ ods _tmp_referurl b;
相关阅读:
CSS 基础语法
 标签
 HDU 5487 Difference of Languages BFS
HDU 5473 There was a kingdom 凸包 DP
HDU 5468 Puzzled Elena 莫比乌斯反演
 BNU 3692 I18n 模拟
 补题列表
 POJ 3241 曼哈顿距离最小生成树 Object Clustering
UVa 1309 DLX Sudoku
CodeForces Round #320 Div2
原文地址：https://www.cnblogs.com/jifengblog/p/9286800.html

最新文章
servlet相关 jar包位置 BAE上部署web应用
 path 与classpath针对JAVA来说
 Rocket
Rocket
Rocket
Rocket
Rocket
Rocket
Rocket
Rocket

热门文章
Rocket
Rocket
CSS 后代选择器
 CSS 分组
 CSS 派生选择器
 如何创建 CSS
CSS 属性选择器
 CSS 类选择器
 CSS id 选择器
 CSS 高级语法

string pageid	Array<int> adid_list
"front_page"	[1, 2, 3]
"contact_page"	[3, 4, 5]

string pageid	int adid
"front_page"	1
"front_page"	2
"front_page"	3
"contact_page"	3
"contact_page"	4
"contact_page"	5

Hive的内置函数

定义：

用法：

hive的本地模式：

hive 中窗口函数row_number,rank,dense_ran,ntile分析函数的用法

ntile

Hive已定义函数介绍：

1、字符串长度函数：length

2、字符串反转函数：reverse

3、字符串连接函数：concat

4、带分隔符字符串连接函数：concat_ws

5、字符串截取函数：substr,substring

6、类型转换

7、左右去除空格函数

8、正则表达式替换函数：regexp_replace

9、正则表达式解析函数：regexp_extract

10、URL解析函数：parse_url，parse_url_tuple（UDTF）

11、json解析函数：get_json_object

12、集合查找函数: find_in_set

13、行转列：explode （posexplode Available as of Hive 0.13.0）

14、多行转换：lateral view

15、union结果集合并

抽取一行数据转换到新表的多列样例：

　　ntile

　　1、字符串长度函数：length

　　2、字符串反转函数：reverse

　　3、字符串连接函数：concat

　　4、带分隔符字符串连接函数：concat_ws

　　5、字符串截取函数：substr,substring

　　6、类型转换

　　7、左右去除空格函数

　　8、正则表达式替换函数：regexp_replace

　　9、正则表达式解析函数：regexp_extract

　　10、URL解析函数：parse_url，parse_url_tuple（UDTF）

　　11、json解析函数：get_json_object

　　12、集合查找函数: find_in_set

　　13、行转列：explode （posexplode Available as of Hive 0.13.0）

　　14、多行转换：lateral view

　　15、union结果集合并

　　抽取一行数据转换到新表的多列样例：