• Hive functions


    1. Window functions
    Customers who made a purchase in April 2015, and the total number of such customers

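    -- Window functions are evaluated before DISTINCT, so cnt here is the
    -- number of matching order rows, not the number of customers.
    select distinct name,count(1) over() as cnt from test_window_yf
    where substr(orderdate,1,7)='2015-04';

    -- GROUP BY is evaluated before the window function, so cnt here is the
    -- number of distinct customers.
    select name,count(1) over() as cnt from test_window_yf
    where substr(orderdate,1,7)='2015-04' group by name;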
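
The difference between the two queries comes from evaluation order. The following is a minimal sketch in plain Python (not HiveQL), assuming the April 2015 rows are the five that appear in the sample rows shown later in this section; the variable names are illustrative only.

```python
# April 2015 rows (name, orderdate), taken from the sample data in this section.
april_rows = [
    ("jack", "2015-04-06"),
    ("mart", "2015-04-08"),
    ("mart", "2015-04-09"),
    ("mart", "2015-04-11"),
    ("mart", "2015-04-13"),
]

# Query 1: the window count covers every row that passed WHERE;
# DISTINCT only collapses duplicate (name, cnt) pairs afterwards.
cnt_rows = len(april_rows)
query1 = sorted({(name, cnt_rows) for name, _ in april_rows})

# Query 2: GROUP BY runs first, so the window sees one row per name.
names = sorted({name for name, _ in april_rows})
query2 = [(name, len(names)) for name in names]

print(query1)  # [('jack', 5), ('mart', 5)]
print(query2)  # [('jack', 2), ('mart', 2)]
```

Only the second form answers "how many customers bought in April".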

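    Purchase details per customer, together with the monthly purchase total
    Accumulate cost within each month
    -- With ORDER BY and no explicit frame, the window runs from the first row of the partition to the current row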
    select name,orderdate,cost,sum(cost) over(partition by month(orderdate) order by orderdate) from test_window_yf;

    select name,orderdate,cost,
    sum(cost) over() as sample1, -- sum over all rows
    sum(cost) over(partition by name) as sample2, -- partition by name, sum within each partition
    sum(cost) over(partition by name order by orderdate) as sample3, -- partition by name, running sum within each partition
    sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row) as sample4, -- same result as sample3: from the start of the partition to the current row
    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, -- current row plus the previous row
    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING) as sample6, -- current row plus one row before and one row after
    sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING) as sample7 -- current row and all following rows
    from test_window_yf;

    name orderdate cost sample1 sample2 sample3 sample4 sample5 sample6 sample7
    jack 2015-01-01 10 661 176 10 10 10 56 176
    jack 2015-01-05 46 661 176 56 56 56 111 166
    jack 2015-01-08 55 661 176 111 111 101 124 120
    jack 2015-02-03 23 661 176 134 134 78 120 65
    jack 2015-04-06 42 661 176 176 176 65 65 42
    mart 2015-04-08 62 661 299 62 62 62 130 299
    mart 2015-04-09 68 661 299 130 130 130 205 237
    mart 2015-04-11 75 661 299 205 205 143 237 169
    mart 2015-04-13 94 661 299 299 299 169 169 94
    neil 2015-05-10 12 661 92 12 12 12 92 92
    neil 2015-06-12 80 661 92 92 92 92 92 80
    tony 2015-01-02 15 661 94 15 15 15 44 94
    tony 2015-01-04 29 661 94 44 44 44 94 79
    tony 2015-01-07 50 661 94 94 94 79 79 50
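
To make the ROWS BETWEEN frames concrete, here is a small Python sketch (not HiveQL) that recomputes sample4 through sample7 for jack's partition. The helper `frame_sum` is a hypothetical name introduced only for this illustration.

```python
def frame_sum(values, i, preceding, following):
    """Sum the values inside a ROWS frame around index i.

    preceding / following are row counts; None stands for UNBOUNDED.
    """
    start = 0 if preceding is None else max(0, i - preceding)
    end = len(values) if following is None else min(len(values), i + following + 1)
    return sum(values[start:end])

# jack's costs, already ordered by orderdate (taken from the result above).
jack = [10, 46, 55, 23, 42]
idx = range(len(jack))

sample4 = [frame_sum(jack, i, None, 0) for i in idx]  # UNBOUNDED PRECEDING .. CURRENT ROW
sample5 = [frame_sum(jack, i, 1, 0) for i in idx]     # 1 PRECEDING .. CURRENT ROW
sample6 = [frame_sum(jack, i, 1, 1) for i in idx]     # 1 PRECEDING .. 1 FOLLOWING
sample7 = [frame_sum(jack, i, 0, None) for i in idx]  # CURRENT ROW .. UNBOUNDED FOLLOWING

print(sample4)  # [10, 56, 111, 134, 176]
print(sample5)  # [10, 56, 101, 78, 65]
print(sample6)  # [56, 111, 124, 120, 65]
print(sample7)  # [176, 166, 120, 65, 42]
```

The printed lists match jack's rows in the sample4 to sample7 columns.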


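    select name,orderdate,cost,
    ntile(4) over() as sample1, -- split all rows into 4 buckets
    ntile(4) over(partition by name) as sample2, -- partition by name, split each partition into 4 buckets
    ntile(4) over(order by cost) as sample3, -- order all rows by cost ascending, then split into 4 buckets
    ntile(4) over(partition by name order by cost) as sample4 -- partition by name, order by cost ascending within each partition, then split into 4 buckets
    from test_window_yf;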
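
How ntile assigns bucket numbers can be sketched in Python (not HiveQL). This assumes the usual SQL rule that when the rows do not divide evenly, the earlier buckets each receive one extra row; the function below is a hypothetical helper, not a Hive API.

```python
def ntile(n_rows, buckets):
    """Return the 1-based bucket number for each of n_rows rows, in row order."""
    base, extra = divmod(n_rows, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets get one more row than the rest.
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result

print(ntile(14, 4))  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4]
print(ntile(5, 4))   # [1, 1, 2, 3, 4]
print(ntile(2, 4))   # [1, 2]
```

The three calls correspond to the whole table (14 rows), jack's partition (5 rows), and neil's partition (2 rows). Without an ORDER BY inside over(), which rows land in which bucket is not guaranteed.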


    2. Advanced aggregation
    grouping sets / cube / rollup
    grouping__id

    Sample data (month,day,cookieid):
    2015-03,2015-03-10,cookie1
    2015-03,2015-03-10,cookie5
    2015-03,2015-03-12,cookie7
    2015-04,2015-04-12,cookie3
    2015-04,2015-04-13,cookie2
    2015-04,2015-04-13,cookie4
    2015-04,2015-04-16,cookie4
    2015-03,2015-03-10,cookie2
    2015-03,2015-03-10,cookie3
    2015-04,2015-04-12,cookie5
    2015-04,2015-04-13,cookie6
    2015-04,2015-04-15,cookie3
    2015-04,2015-04-15,cookie2
    2015-04,2015-04-16,cookie1

    CREATE TABLE sospdm.test_function_yf (
    month STRING,
    day STRING,
    cookieid STRING
    ) ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    stored as textfile;

    GROUPING SETS
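    Aggregates by several dimension combinations within a single GROUP BY query; equivalent to running a separate GROUP BY for each combination and combining the results with UNION ALL.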

    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM sospdm.test_function_yf
    GROUP BY month,day
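    GROUPING SETS (month,day)
    ORDER BY GROUPING__ID;
    -- GROUPING__ID identifies which grouping set each result row belongs to.
    -- The IDs shown here follow the encoding of older Hive releases; Hive 2.3.0 and later number the sets differently, so the values may not match on a current cluster.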
    month day uv GROUPING__ID
    2015-03 NULL 5 1
    2015-04 NULL 6 1
    NULL 2015-03-10 4 2
    NULL 2015-03-12 1 2
    NULL 2015-04-12 2 2
    NULL 2015-04-13 3 2
    NULL 2015-04-15 2 2
    NULL 2015-04-16 2 2
    Equivalent to:
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM sospdm.test_function_yf GROUP BY month
    UNION ALL
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM sospdm.test_function_yf GROUP BY day
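
The UNION ALL equivalence can be checked by hand. Below is a minimal Python sketch (not HiveQL) that recomputes the uv column from the sample data; `count_distinct` is a hypothetical helper, and the IDs 1 and 2 are simply copied from the result shown above.

```python
DATA = """\
2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1
"""
rows = [line.split(",") for line in DATA.splitlines()]

def count_distinct(rows, column):
    """COUNT(DISTINCT cookieid) grouped by one column (0 = month, 1 = day)."""
    seen = {}
    for row in rows:
        seen.setdefault(row[column], set()).add(row[2])
    return {key: len(cookies) for key, cookies in seen.items()}

by_month = count_distinct(rows, 0)
by_day = count_distinct(rows, 1)

# One GROUP BY per grouping set, concatenated like UNION ALL.
union_all = [(m, None, uv, 1) for m, uv in by_month.items()] + \
            [(None, d, uv, 2) for d, uv in by_day.items()]

print(by_month)  # {'2015-03': 5, '2015-04': 6}
print(by_day["2015-03-10"], by_day["2015-04-13"])  # 4 3
print(len(union_all))  # 8
```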


    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM sospdm.test_function_yf
    GROUP BY month,day
    GROUPING SETS (month,day,(month,day))
    ORDER BY GROUPING__ID;

    2015-04 NULL 6 1
    2015-03 NULL 5 1
    NULL 2015-03-10 4 2
    NULL 2015-04-16 2 2
    NULL 2015-04-15 2 2
    NULL 2015-04-13 3 2
    NULL 2015-04-12 2 2
    NULL 2015-03-12 1 2
    2015-04 2015-04-16 2 3
    2015-04 2015-04-12 2 3
    2015-04 2015-04-13 3 3
    2015-03 2015-03-12 1 3
    2015-03 2015-03-10 4 3
    2015-04 2015-04-15 2 3

    CUBE
    Aggregates over every combination of the GROUP BY dimensions.

    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM sospdm.test_function_yf
    GROUP BY month,day
    WITH CUBE
    ORDER BY GROUPING__ID;

    Equivalent to:

    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM sospdm.test_function_yf
    GROUP BY month,day
    grouping sets((month,day),month,day,())
    ORDER BY GROUPING__ID;


    NULL NULL 7 0 -- the extra row compared with the previous result: the total over all rows
    2015-03 NULL 5 1
    2015-04 NULL 6 1
    NULL 2015-04-16 2 2
    NULL 2015-04-15 2 2
    NULL 2015-04-13 3 2
    NULL 2015-04-12 2 2
    NULL 2015-03-12 1 2
    NULL 2015-03-10 4 2
    2015-04 2015-04-12 2 3
    2015-04 2015-04-16 2 3
    2015-03 2015-03-12 1 3
    2015-03 2015-03-10 4 3
    2015-04 2015-04-15 2 3
    2015-04 2015-04-13 3 3


    ROLLUP
    A subset of CUBE: aggregates hierarchically, starting from the leftmost dimension.

    For example, hierarchical aggregation led by the month dimension:
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM sospdm.test_function_yf
    GROUP BY month,day
    WITH ROLLUP
    ORDER BY GROUPING__ID;

    month day uv GROUPING__ID
    ---------------------------------------------------
    NULL NULL 7 0
    2015-03 NULL 5 1
    2015-04 NULL 6 1
    2015-03 2015-03-10 4 3
    2015-03 2015-03-12 1 3
    2015-04 2015-04-12 2 3
    2015-04 2015-04-13 3 3
    2015-04 2015-04-15 2 3
    2015-04 2015-04-16 2 3

    -- Swapping month and day makes day the leading dimension of the hierarchy:

    SELECT
    day,
    month,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM sospdm.test_function_yf
    GROUP BY day,month
    WITH ROLLUP
    ORDER BY GROUPING__ID;


    day month uv GROUPING__ID
    ------------------------------------
    NULL NULL 7 0
    2015-04-13 NULL 3 1
    2015-03-12 NULL 1 1
    2015-04-15 NULL 2 1
    2015-03-10 NULL 4 1
    2015-04-16 NULL 2 1
    2015-04-12 NULL 2 1
    2015-04-12 2015-04 2 3
    2015-03-10 2015-03 4 3
    2015-03-12 2015-03 1 3
    2015-04-13 2015-04 3 3
    2015-04-15 2015-04 2 3
    2015-04-16 2015-04 2 3
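
The relationship between CUBE, ROLLUP, and explicit grouping sets can be sketched in Python (not HiveQL). The two helper functions are hypothetical names used only to enumerate the grouping sets each clause expands to.

```python
from itertools import combinations

def cube_sets(dims):
    """Every subset of dims, largest first: the sets WITH CUBE expands to."""
    return [combo
            for size in range(len(dims), -1, -1)
            for combo in combinations(dims, size)]

def rollup_sets(dims):
    """Every prefix of dims, longest first: the sets WITH ROLLUP expands to."""
    return [tuple(dims[:size]) for size in range(len(dims), -1, -1)]

print(cube_sets(["month", "day"]))
# [('month', 'day'), ('month',), ('day',), ()]
print(rollup_sets(["month", "day"]))
# [('month', 'day'), ('month',), ()]
print(rollup_sets(["day", "month"]))
# [('day', 'month'), ('day',), ()]
```

ROLLUP never produces the set that contains only the second dimension, which is why the column order in GROUP BY changes the result.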


    ------
    II. Date functions

    1. Date function to_date(string expr)

    Return type: string

    Description: returns the date part of a timestamp string

    to_date(expr) - Extracts the date part of the date or datetime expression expr

    Example:

    hive> select to_date('2014-09-16 15:50:08.119') from default.dual;

    2014-09-16


    2. Year function year(string expr)

    Return type: int

    Description: returns the year of a timestamp string as a number

    year(date) - Returns the year of date

    Example:

    hive> select year('2014-09-16 15:50:08.119') from default.dual;

    2014


    3. Month function month(string expr)

    Return type: int

    Description: returns the month of a timestamp string as a number

    month(date) - Returns the month of date

    Example:

    hive> select month('2014-09-16 15:50:08.119') from default.dual;

    9


    4. Day function day(string expr)

    Return type: int

    Description: returns the day of the month of a timestamp string

    day(date) - Returns the date of the month of date

    Example:

    hive> select day('2014-09-16 15:50:08.119') from default.dual;

    16


    5. Hour function hour(string expr)

    Return type: int

    Description: returns the hour of a timestamp string as a number

    hour(date) - Returns the hour of date

    Example:

    hive> select hour('2014-09-16 15:50:08.119') from default.dual;

    15


    6. Minute function minute(string expr)

    Return type: int

    Description: returns the minute of a timestamp string as a number

    minute(date) - Returns the minute of date

    Example:

    hive> select minute('2014-09-16 15:50:08.119') from default.dual;

    50

    7. Second function second(string expr)

    Return type: int

    Description: returns the second of a timestamp string as a number

    second(date) - Returns the second of date

    Example:

    hive> select second('2014-09-16 15:50:08.119') from default.dual;

    8
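
As a cross-check of the extraction functions above, the same fields can be pulled out with Python's standard datetime module. This is only an analogy in Python, not Hive itself; it shows that each part is an integer, so month and second carry no leading zero.

```python
from datetime import datetime

# Parse the timestamp string used in the examples above.
ts = datetime.strptime("2014-09-16 15:50:08.119", "%Y-%m-%d %H:%M:%S.%f")

# Counterparts of year(), month(), day(), hour(), minute(), second().
parts = (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)
print(parts)  # (2014, 9, 16, 15, 50, 8)

# Counterpart of to_date(): keep only the date part.
date_part = ts.date().isoformat()
print(date_part)  # 2014-09-16
```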

    8. Date addition function date_add(start_date, num_days)

    Return type: string

    Description: returns the date num_days days after start_date (a negative value moves backwards)

    date_add(start_date, num_days) - Returns the date that is num_days after start_date.

    Example:

    hive>select date_add('2014-09-16 15:50:08.119',10) from default.dual;

    2014-09-26

    hive>select date_add('2014-09-16 15:50:08.119',-10) from default.dual;

    2014-09-06

    9. Date subtraction function date_sub(start_date, num_days)

    Return type: string

    Description: returns the date num_days days before start_date (a negative value moves forwards)

    date_sub(start_date, num_days) - Returns the date that is num_days before start_date.

    Example:

    hive>select date_sub('2014-09-16 15:50:08.119',10) from default.dual;

    2014-09-06

    hive>select date_sub('2014-09-16 15:50:08.119',-10) from default.dual;

    2014-09-26

    10. Week-of-year function weekofyear(string date)

    Return type: int

    Description: returns the week of the year that the given date falls in

    weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.

    Example:

    hive>select weekofyear('2014-09-16 15:50:08.119') from default.dual;

    38

    11. Date difference function datediff(date1, date2)

    Return type: int

    Description: returns the number of days between two dates

    datediff(date1, date2) - Returns the number of days between date1 and date2

    The result is date1 minus date2.

    Example:

    hive>select datediff('2014-09-16 15:50:08.119','2014-09-15') from default.dual;

    1
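
The date arithmetic above can be reproduced with Python's standard library as a sanity check. This is a sketch in Python, not Hive; the function names mirror the Hive ones for readability, and it assumes the time-of-day part of the input is ignored, as the examples above suggest.

```python
from datetime import date, timedelta

def to_day(text):
    """Keep only the yyyy-MM-dd prefix, dropping any time-of-day part."""
    return date.fromisoformat(text[:10])

def date_add(start, num_days):
    return (to_day(start) + timedelta(days=num_days)).isoformat()

def date_sub(start, num_days):
    return date_add(start, -num_days)

def datediff(date1, date2):
    """Whole days from date2 to date1 (date1 minus date2)."""
    return (to_day(date1) - to_day(date2)).days

def weekofyear(text):
    """ISO week number: weeks start on Monday, week 1 has more than 3 days."""
    return to_day(text).isocalendar()[1]

ts = "2014-09-16 15:50:08.119"
print(date_add(ts, 10), date_add(ts, -10))  # 2014-09-26 2014-09-06
print(date_sub(ts, 10), date_sub(ts, -10))  # 2014-09-06 2014-09-26
print(datediff(ts, "2014-09-15"))           # 1
print(weekofyear(ts))                       # 38
```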

  • Original article: https://www.cnblogs.com/yin-fei/p/10752011.html