• hive 窗口函数(三)


    数据准备

    cookie1,2015-04-10 10:00:02,url2
    cookie1,2015-04-10 10:00:00,url1
    cookie1,2015-04-10 10:03:04,1url3
    cookie1,2015-04-10 10:50:05,url6
    cookie1,2015-04-10 11:00:00,url7
    cookie1,2015-04-10 10:10:00,url4
    cookie1,2015-04-10 10:50:01,url5
    cookie2,2015-04-10 10:00:02,url22
    cookie2,2015-04-10 10:00:00,url11
    cookie2,2015-04-10 10:03:04,1url33
    cookie2,2015-04-10 10:50:05,url66
    cookie2,2015-04-10 11:00:00,url77
    cookie2,2015-04-10 10:10:00,url44
    cookie2,2015-04-10 10:50:01,url55
    create table hive.cookie4_sum
    (
        cookieid   string,
        createtime string,
        url        string
    )
        row format delimited fields terminated by ',';
     select * from hive.cookie4_sum;

      一、lag

    lag(col,n,default) 用于统计窗口内往上第 n 行值

    参数说明:

    第一个参数为列名,
    第二个参数为往上第 n 行(可选,默认为 1),
    第三个参数为默认值(当往上第 n 行为 null 时候,取默认值,如不指定,则为null)

    select cookieid,
           createtime,
           url,
           row_number() over (partition by cookieid order by createtime)                              as rn,
           lag(createtime, 1, '1970-01-01 00:00:00') over (partition by cookieid order by createtime) as last_1_time,
           lag(createtime, 2) over (partition by cookieid order by createtime)                        as last_2_time
    from hive.cookie4_sum;

     说明:

    last_1_time: 指定了往上第1行的值,默认值 为'1970-01-01 00:00:00'  
            cookie1第一行,往上1行为 null,因此取默认值 1970-01-01 00:00:00
            cookie1第三行,往上1行值为第二行值,2015-04-10 10:00:02
            cookie1第六行,往上1行值为第五行值,2015-04-10 10:50:01
    last_2_time: 指定了往上第2行的值, 默认值 null
            cookie1第一行,往上2行为 null
            cookie1第二行,往上2行为 null
            cookie1第四行,往上2行为第二行值,2015-04-10 10:00:02
            cookie1第七行,往上2行为第五行值,2015-04-10 10:50:01

    二、lead

    lead(col,n,default) 用于统计窗口内往下第 n 行值;

    参数说明:
    第一个参数为列名,
    第二个参数为往下第 n 行(可选,默认为1),
    第三个参数为默认值(当往下第 n 行为 null 时候,取默认值,如不指定,则为 null)

    select cookieid,
           createtime,
           url,
           row_number() over (partition by cookieid order by createtime)                               as rn,
           lead(createtime, 1, '1970-01-01 00:00:00') over (partition by cookieid order by createtime) as next_1_time,
           lead(createtime, 2) over (partition by cookieid order by createtime)                        as next_2_time
    from hive.cookie4_sum; 

     三、first_value

    取分组内数据排序后,截止到当前行,第一个值

    select cookieid,
           createtime,
           url,
           row_number() over (partition by cookieid order by createtime)     as rn,
           first_value(url) over (partition by cookieid order by createtime) as first1
    from  hive.cookie4_sum;

     四、last_value

    取分组内数据排序后,截止到当前行,最后一个值

    select cookieid,
           createtime,
           url,
           row_number() over (partition by cookieid order by createtime)    as rn,
           last_value(url) over (partition by cookieid order by createtime) as last1
    from hive.cookie4_sum;

  • 相关阅读:
    c++ 连接mysql数据库
    c语言数据结构分析2之 链表插入删除
    c语言数据结构分析1之 链表创建
    微软 翻译工具
    c语言数据结构分析7之 二分查找
    sencha touch2 demo
    程序员装B指南
    c语言数据结构分析6之 快速排序
    c语言数据结构分析5之 冒泡
    对话jQuery之父John Resig:JavaScript的开发之路
  • 原文地址:https://www.cnblogs.com/wdh01/p/14798587.html
Copyright © 2020-2023  润新知