需求:求出最近7天未登陆的流失用户数。
依赖数据:dws_uv_detail_day(每日活跃设备明细表)
建表语句:
drop table if exists ads_wastage_count; create external table ads_wastage_count( `dt` string COMMENT '统计日期', `wastage_count` bigint COMMENT '流失设备数' ) row format delimited fields terminated by ' ' location '/warehouse/gmall/ads/ads_wastage_count';
数据导入脚本:
巧用group和having,先对日活安设备id分组,分组后,登陆日期最近(最大的)都小于7天前,就是我们要的数据。
#!/bin/bash if [ -n "$1" ];then do_date=$1 else do_date=`date -d "-1 day" +%F` fi hive=/opt/module/hive/bin/hive APP=gmall echo "-----------导入日期$do_date-----------" sql=" insert into table "$APP".ads_wastage_count select '$do_date', count(*) from ( select mid_id from "$APP".dws_uv_detail_day group by mid_id having max(dt)<=date_add('$do_date',-7) )t1; " $hive -e "$sql"