hive函数

hive函数
日期函数

(1)时间戳转换

日期格式转换时间戳

unix_timestamp(date,dateformat)不加参数获取当前的时间戳
```
select unix_timestamp() as time_stamp,unix_timestamp('2018-09-26 9:13:26','yyyy-MM-ddHH:mm:ss') as time_stamp1 from dual;

time_stamp      time_stamp1

1538054001      1537924406
```
时间戳转换日期格式

from_unixtime(timestamp,dateformat)
```
select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss') asdate_time,from_unixtime(1537924406,'yyyy-MM-dd') as date_time1 from dual;

date_time       date_time1

2018-09-27 21:16:13     2018-09-26
```
(2)日期格式化

date_format(string/date,dateformat)把字符串或者日期转成指定格式的日期
```
select date_format('2018-09-12','yyyy-MM-dd HH:mm:ss') as date_time,date_format('2018-09-12','yyyyMMdd') as date_time1 from dual;

date_time       date_time1

2018-09-12 00:00:00     20180912
```
to_date(date)返回yyyy-MM-dd日期格式

3、日期加减/月份加减（位移）

对日期进行加减法，date_add函数参数n是正数往后推n天，如果是负数往前推n天，date_sub函数相反

对月份进行加减，add_months 函数参数m是正数往后推m个月，如果是负数往前推m个月
```
date_add(date,n)

date_sub(date,n)

select date_add('2018-09-27',2) as date1,date_add('2018-09-27',-2) as date2 from dual;

date1   date2

2018-09-29      2018-09-25



add_months(date,m)

select add_months('2018-09-27',1) as month1,add_months('2018-09-27',-1) as month2 from dual;

month1  month2

2018-10-27      2018-08-27
```
4、日期差/月份差

日期差计算两个日期之间相差多少天，前面的日期减去后面的日期

月份差计算前面的日期减去后面的日期计算月份差，这个结果是按照天数占月分的比值计算得到的，是一个小数，可以自己取整
```
datediff(date,date1)

select datediff('2018-09-22','2018-08-02') as day_diff,datediff('2018-08-02','2018-09-22') as day_diff1 from dual;

day_diff        day_diff1

51      -51

months_between(date,date1)

select months_between('2018-09-22','2018-08-02') as mon_diff,round(months_between('2018-09-22','2018-08-02')) as mon_diff1 from dual;

mon_diff        mon_diff1

1.64516129      2.0
```
5、获取月末、月初、年初、当前日期下个星期X的日期

获取月末最后一天
```
last_day(date)

select last_day('2018-09-30') as date_time,last_day('2018-09-27 21:16:13') as date_time1 from dual;

date_time       date_time1

2018-09-30      2018-09-30
```
获取月初、年初
```
trunc(date,format)  format:MONTH/MON/MM, YEAR/YYYY/YY

select trunc('2018-09-27','YY') as date_time,trunc('2018-09-27 21:16:13','MM') as date_time1 from dual;

date_time       date_time1

2018-01-01      2018-09-01
```
当前日期下个星期X的日期
```
next_day(date,formate) format:英文星期几的缩写或者全拼

select next_day('2018-09-27','TH') as date_time,next_day('2018-09-27 21:16:13','TU') as date_time1 from dual;

date_time       date_time1

2018-10-04      2018-10-02
```
hive条件函数

(1)if函数:if(条件表达式,结果1,结果2)
```
hive> select if(1=2,100,200) from lxw_dual;
200
hive> select if(1=1,100,200) from lxw_dual;
100
```
(2)coalesce函数:返回参数中的第一个非空值;如果所有值都为null,那么返回null
```
hive> select COALESCE(null,'100','50′) from lxw_dual;
100
```
(3)case when函数:
- CASE a WHEN b THEN c [WHENd THEN e]* [ELSE f] END
- 如果a等于b，那么返回c；如果a等于d，那么返回e；否则返回f
```
hive> Select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from lxw_dual;
mary
hive> Select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end from lxw_dual;
tim
```
- CASE WHEN a THEN b [WHEN cTHEN d]* [ELSE e] END
- 如果a为TRUE,则返回b；如果c为TRUE，则返回d；否则返回e
```
hive> select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end from lxw_dual;
mary
hive> select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end from lxw_dual;
tom
```
count函数

count(*):所有行进行统计,包括null行

count(1):所有行进行统计,包括null行

count(columc)与count(if(条件)):都是对column中非null进行统计,来一条就统计一条,不关心怎样进行分类

count(distinct column):去重统计也会排除字段为null的值
```
使用group by代替distinct
```
count(case when plat=1 then u else null end)

count(distinct case when plat=1 then u else null end)

count(case when (type=2 OR type=6) then u else null end)

count(distinct case when (type=2 OR type=6) then u else null end)
```
create table t1( a int);
insert into t1 (a) values (1);
insert into t1 (a) values (2);
insert into t1 (a) values (3);
insert into t1 (a) values (NULL);


select count(*) from t1;
4
select count(1) from t1;
4
select count(a) from t1;
3
select count(a>1) from t1;
3
select count(distinct a) from t1;
3
select count(distinct case when a>1 then a else NULL end)  from t1
2
```
count(if)与count(distinct if)小案例:

表名: user_active_day (用户日活表)

表内容:

user_id(用户id) user_is_new(是否新用户 1:新增用户 0:老用户) location_city(用户所在地区) partition_date(日期分区)

需求:

　　找出20180901至今的xxx地区的用户日活量以及新增用户量

思路:

　　筛选日期分区和地区,统计user_id的数量为用户日活量,统计user_is_new = 1的数量为新增用户量.

最开始写的hql语句
```
select partition_date,count(user_id),
         count(if(user_is_new = 1, user_id, 0))  --注意新增用户量的统计
from dw.nice_live_dw_user_active_day
where location_city like '%xxx%' and partition_date >= 20180901
group by partition_date;
```
我们使用count(if())来进行筛选统计,但是效果并没有达到,出现的结果如下
```
20180901    16737  16737
```
根本就没有达到筛选的目的,为什么?

这就要从count的机制说起

首先count()是对数据进行计数,说白了就是你来一条数据我计数一条,我不关心你怎么分类,我只对数据计数

每条数据从if()函数出来,还是一条数据,所以count+1

所以count(user_id)跟count(if(user_id))没有任何的区别.

我们稍做修改
```
select partition_date,count(user_id),
         count(distinct if(user_is_new = 1, user_id, 0))  --注意新增用户量的统计,加了distinct去重
from dw.nice_live_dw_user_active_day
where location_city like '%xxx%' and partition_date >= 20180901
group by partition_date;
```
结果如下
```
20180901    16737  261
```
这次看着就像是对了吧,我们加了distinct进行去重

每次来一条数据先过if()然后再进行去重最后统计.但是实际上结果依旧是错误的.

我们来模拟一下筛选统计的过程

我们有这样四条数据

user_id user_is_new

1 1

2 0

3 1

4 0

表中的数据是一条一条遍历的,

(1)当user_id = 1的数据过来的时候,我们先过if函数 user_is_new = 1 ==> count(distinct user_id = 1),

然后我们把user_id = 1进行重复判断,我们用一个模拟容器来模拟去重,

从容器里找user_id = 1的数据,发现没有,不重复,所以通过我们把count+1,然后把user_id = 1的数据放入,用于下条去重

(2)当user_id = 2的数据过来的时候,我们先过if函数 user_is_new = 0 ==> count(distinct 0),

然后我们把0进行重复判断,

从容器里找0的数据,发现没有,不重复,所以通过我们把count+1,然后把0的数据放入,用于下条去重

(3)当user_id = 3的数据过来的时候,我们先过if函数 user_is_new = 1 ==> count(distinct user_id = 3),

然后我们把user_id = 3进行重复判断,

从容器里找user_id = 3的数据,发现没有,不重复,所以通过我们把count+1,然后把user_id = 3的数据放入,用于下条去重

(4)当user_id = 4的数据过来的时候,我们先过if函数 user_is_new = 0 ==> count(distinct 0),

然后我们把0进行重复判断,
从容器里找0的数据,发现重复,是之前user_id = 2的时候过if()转化成0的那条数据,所以count不执行

我们通过模拟count(distinct if)过程发现,在count的时候我们把不符合条件的最开始的那条语句也count进去了一次

导致最终结果比正确结果多了1.

我们在原基础语句上再减去1就是正确的hql语句

其实在日常中我们做分类筛选统计的时候一般是用sum来完成的,符合条件sum+1,不符合条件sum+0
```
select partition_date,count(user_id),
         sum(if(user_is_new = 1, 1, 0))  --用sum进行筛选统计
from dw.nice_live_dw_user_active_day
where location_city like '%xxx%' and partition_date >= 20180901
group by partition_date;
```
结果如下
```
20180901    16737  260
```
sum(if)只试用于单个条件判断,如果筛选条件很多,我们可以用sum(case when then else end)来进行多条件筛选

注意,hive中并没有sum(distinct col1)这种使用方式,我们可以使用sum(col) group by col来达到相同效果.
相关阅读:
GitHub上如何创建组织？
windows中使用Git如何解决文件冲突？
GitHub上如何删除代码仓库？
并发编程模型和访问控制
 大数据征信的应用和启示：ZestFinance的基于大数据的信用评估技术
 Hadoop的Python框架指南
 Redis+Django(Session,Cookie、Cache)的用户系统
 如何使你的Ajax应用内容可让搜索引擎爬行
 Web运营手记
 全屏slider--swiper
原文地址：https://www.cnblogs.com/tjp0210/p/11980158.html

日期函数

hive条件函数

count函数