需求,获取到的访问日志,现根据某用户间隔30分钟后为新的会话。
测试环境:postgresql
author : jackical
datetime : 2019-01-04
select *,sum(f1) over(order by timestamp_) from ( select timestamp_,case when timestamp_ - lag(timestamp_,1) over(order by timestamp_) > 60*30 then 1 else 0 end as f1 from ( select EXTRACT(epoch FROM CAST(data_lastchangetime AS TIMESTAMP)) as timestamp_ from bt_apppackages where data_lastchangetime>'2019-01-01' order by data_lastchangetime ) tab order by timestamp_ ) tab order by timestamp_ limit 100;
解说,由于窗口函数在over子句中有两中的边界定义方式: 1, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
2, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
关于边界的使用:
SELECT cookieid, createtime, pv, SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行 SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 SUM(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行 SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行 SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行 SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234; cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 ----------------------------------------------------------------------------- cookie1 2015-04-10 1 1 1 26 1 6 26 cookie1 2015-04-11 5 6 6 26 6 13 25 cookie1 2015-04-12 7 13 13 26 13 16 20 cookie1 2015-04-13 3 16 16 26 16 18 13 cookie1 2015-04-14 2 18 18 26 17 21 10 cookie1 2015-04-15 4 22 22 26 16 20 8 cookie1 2015-04-16 4 26 26 26 13 13 4
关于边界定义心得:1,默认边界定义,在有order by时为 起点到当前行
2,没有在over中指定排序时,则边界为当前窗口(所有行)
3,貌似有些资料说hive 在默认的边界定认为 range between unbounded preceding and current row,不管是range还是row 都是从起始到当前行
这种内容也不是第一次接触了,刚刚别人问了下,再想想比较困难,索性将结果与心得记下来。下次不必再去想,拿来就用。。