• 电商离线数仓项目实战(下)--第五阶段模块二作业


    题目

    1.用拉链表实现核心交易分析中DIM层商家维表,并实现该拉链表的回滚(自己构造数据,编写SQL,并要有相应的文字说明);

    2. 在会员分析中计算 沉默会员数流失会员数

        沉默会员的定义:只在安装当天启动过App,而且安装时间是在7天前
        流失会员的定义:最近30天未登录的会员

    3. 在核心交易分析中完成如下指标的计算
        统计2020年每个季度的销售订单笔数、订单总额
        统计2020年每个月的销售订单笔数、订单总额
        统计2020年每周(周一到周日)的销售订单笔数、订单总额
        统计2020年国家法定节假日、休息日、工作日的订单笔数、订单总额

    详解(1)

    一、创建表加载数据(准备工作)

    1.创建商家信息表--拉链表(dim层)

    特点:

    • 相较于商家维表,多了两个字段startdate,enddate
    • 拉链表不是分区表
    DROP TABLE IF EXISTS `dim.dim_trade_shops`;
    CREATE EXTERNAL TABLE `dim.dim_trade_shops`(
    `shopid` int,
    `userid` int,
    `areaid` int,
    `shopname` string,
    `shoplevel` tinyint,
    `status` tinyint,
    `createtime` string,
    `modifytime` string,
    `startdate` string,
    `enddate` string
    )COMMENT
    '商家店铺表';

    2.准备三天的数据

    /root/data/shops/shop-2020-11-20.dat
    100050,1,100225,WSxxx营超市,1,1,2020-06-28,2020-11-20 13:22:22
    100052,2,100236,新鲜xxx旗舰店,1,1,2020-06-28,2020-11-20 13:22:22
    100053,3,100011,华为xxx旗舰店,1,1,2020-06-28,2020-11-20 13:22:22
    100054,4,100159,小米xxx旗舰店,1,1,2020-06-28,2020-11-20 13:22:22
    100055,5,100211,苹果xxx旗舰店,1,1,2020-06-28,2020-11-20 13:22:22


    /root/data/shops/shop-2020-11-21.dat
    100057,7,100311,三只xxx鼠零食,1,1,2020-06-28,2020-11-21 13:22:22
    100058,8,100329,良子xxx铺美食,1,1,2020-06-28,2020-11-21 13:22:22
    100054,4,100159,小米xxx旗舰店,2,1,2020-06-28,2020-11-21 13:22:22
    100055,5,100211,苹果xxx旗舰店,2,1,2020-06-28,2020-11-21 13:22:22


    /root/data/shops/shop-2020-11-22.dat
    100059,9,100225,乐居xxx日用品,1,1,2020-06-28,2020-11-22 13:22:22
    100060,10,100211,同仁xxx大健康,1,1,2020-06-28,2020-11-22 13:22:22
    100052,2,100236,新鲜xxx旗舰店,1,2,2020-06-28,2020-11-22 13:22:22

    3.创建商家信息维表(ods层)

    DROP TABLE IF EXISTS `ods.ods_trade_shops`;
    CREATE EXTERNAL TABLE `ods.ods_trade_shops`(
    `shopid` int,
    `userid` int,
    `areaid` int,
    `shopname` string,
    `shoplevel` tinyint,
    `status` tinyint,
    `createtime` string,
    `modifytime` string
    )
    COMMENT '商家店铺表'
    PARTITIONED BY (`dt` string)
    row format delimited fields terminated by ','
    location '/user/data/trade.db/shops';

    二、拉链表的实现

    • ods_trade_shops(分区表) => 每日变更的数据(修改的+新增的) / 历史数据(第一天)
    • dim_trade_shops(拉链表)=> 多了两个字段 startdate 、 enddate

    1、ods_trade_shops 初始化(2020-11-20)。获取历史数据

    load data local inpath '/root/data/shops/shop-2020-11-20.dat' into table ods.ods_trade_shops partition(dt = '2020-11-20');

    100050
    ,1,100225,WSxxx营超市,1,1,2020-06-28,2020-11-20 13:22:22 100052,2,100236,新鲜xxx旗舰店,1,1,2020-06-28,2020-11-20 13:22:22 100053,3,100011,华为xxx旗舰店,1,1,2020-06-28,2020-11-20 13:22:22 100054,4,100159,小米xxx旗舰店,1,1,2020-06-28,2020-11-20 13:22:22 100055,5,100211,苹果xxx旗舰店,1,1,2020-06-28,2020-11-20 13:22:22

    2.初始化拉链表(2020-11-20)。ods_trade_shops => dim_trade_shops

    insert overwrite table dim.dim_trade_shops
    select shopid,userid,areaid,shopname,shoplevel,status,createtime,modifytime,
           dt as startdate,
           '9999-12-31' as enddate
       from ods.ods_trade_shops
    where dt = '2020-11-20';

    3.导入2020-11-21数据,构建拉链表

    
    
    load data local inpath '/root/data/shops/shop-2020-11-21.dat' into table ods.ods_trade_shops partition(dt = '2020-11-21');

    100057
    ,7,100311,三只xxx鼠零食,1,1,2020-06-28,2020-11-21 13:22:22
    100058,8,100329,良子xxx铺美食,1,1,2020-06-28,2020-11-21 13:22:22
    100054,4,100159,小米xxx旗舰店,2,1,2020-06-28,2020-11-21 13:22:22
    100055,5,100211,苹果xxx旗舰店,2,1,2020-06-28,2020-11-21 13:22:22

    4.构建拉链表(新增数据 + 历史数据)

    新增数据处理逻辑与家在历史数据相似,startdate=dt enddate = '9999-12-31'
    第一步:新增数据的处理办法

    select shopid, userid, areaid, shopname, shoplevel, status, createtime, modifytime,
           dt as startdate,
    '9999-12-31' AS enddate
    from ods.ods_trade_shops
    where dt = '2020-11-21';

    第二步:历史数据的处理办法

    select b.shopid, b.userid ,b.areaid , b.shopname, b.shoplevel,b.status, b.createtime, b.modifytime, 
    b.startdate,
    CASE WHEN a.shopid is not null and b.enddate ='9999-12-31' THEN date_add('2020-11-21',-1) ELSE b.enddate
    end as enddate from (select *
    from ods.ods_trade_shops
    where dt='2020-11-21') a
    right join dim.dim_trade_shops b
    on a.shopid = b.shopid;

    综上,拉链表的最终处理(新增+历史数据)

    insert overwrite table dim.dim_trade_shops
    select shopid, userid, areaid, shopname, shoplevel, status, createtime, modifytime,
           dt as startdate,
           '9999-12-31' AS enddate 
       from ods.ods_trade_shops
     where dt = '2020-11-21'
    union all
    select b.shopid, b.userid ,b.areaid , b.shopname, b.shoplevel,b.status, b.createtime, b.modifytime, b.startdate, CASE WHEN a.shopid is not null and b.enddate ='9999-12-31' THEN date_add('2020-11-21',-1) ELSE b.enddate end as enddate from (select * from ods.ods_trade_shops where dt='2020-11-21') a right join dim.dim_trade_shops b on a.shopid = b.shopid;

    处理拉链表的脚本:shopszipper.sh

    #!/bin/bash
    
    source /etc/profile
    if [ -n "$1" ] ;then
        do_date=$1
    else
        do_date=`date -d "-1 day" +%F`
    fi
    
    sql="
    insert overwrite table dim.dim_trade_shops
    select shopid, userid, areaid, shopname, shoplevel, status, createtime, modifytime,
           dt as startdate,
           '9999-12-31' AS enddate 
       from ods.ods_trade_shops
     where dt = '$do_date'
    
    union all
    
    select b.shopid, b.userid ,b.areaid , b.shopname, b.shoplevel,b.status, b.createtime, b.modifytime, 
           b.startdate,
              CASE WHEN a.shopid is not null and b.enddate ='9999-12-31' 
                   THEN date_add('$do_date',-1) 
                   ELSE b.enddate 
              end as enddate
     from (select * 
              from ods.ods_trade_shops 
             where dt='$do_date') a 
    right join dim.dim_trade_shops b 
    on a.shopid = b.shopid;
    "
    
    hive -e "$sql"

    5.结果验证

    ods维表初始化(导入2020-11-20的数据)

    ods维表导入2020-11-21日的数据

    拉链表更新后(维表导入2020-11-21数据,并执行拉链表的操作后的结果)

     三、拉链表的回滚

    回滚逻辑:将拉链表恢复到rollback_date那一天的数据

    • end_date < rollback_date,即结束日期 < 回滚日期。表示该行数据在rollback_date 之前产生,这些数据需要原样保留
    • start_date <= rollback_date <= end_date,即开始日期 <= 回滚日期 <= 结束日期。这些数据是回滚日期之后产生的,但是需要修改。将end_date 改为 9999-12-31
    • 其他数据不用管

    例如,回滚到2020-11-21的数据

    1.处理end_date < rollback_date 的数据:保留

    select shopid, userid, areaid, shopname, shoplevel, status, createtime, modifytime,
           startdate,
           enddate 
    1 as tag
    from dim.dim_trade_shops where enddate < '2020-11-21';

    2.处理start_date < rollback_date < end_date 的数据:设置enddate=9999-12-31

    select shopid, userid, areaid, shopname, shoplevel, status, createtime, modifytime,
           startdate,
           enddate 
           2 as tag
       from dim.dim_trade_shops
     where startdate <= '2020-11-21' and '2020-11-21' <= enddate;

    3.将数据写入拉链表

    drop table tmp.test;
    create table tmp.test as
    select shopid, userid, areaid, shopname, shoplevel, status, createtime, modifytime,
           startdate,
           enddate, 
           1 as tag
       from dim.dim_trade_shops
     where enddate < '2020-11-21'union all
    
    select shopid, userid, areaid, shopname, shoplevel, status, createtime, modifytime,
           startdate,
           enddate,
           2 as tag
       from dim.dim_trade_shops
     where startdate <= '2020-11-21' and '2020-11-21' <= enddate;

    4.结果验证

    回滚的表:

     原有的表:

     

    详解(2)

    2. 在会员分析中计算沉默会员数和流失会员数

    沉默会员的定义:只在安装当天启动过App,而且安装时间是在7天前

    select count(*) clientNum 
        from dws.dws_member_start_day 
        where dt <= date_add(current_date,-7) 
        and device_id in(
            select device_id 
                from (
                    select device_id, count(*) cnt 
                        from dws.dws_member_start_day 
                        group by device_id 
                        having cnt = 1
                    )tmp
            );

    流失会员的定义:最近30天未登录的会员

    select count(distinct device_id) lossNum 
        from dws.dws_member_start_day 
        where device_id not in (
            select distinct device_id 
            from dws.dws_member_start_day 
            where dt >=date_add(current_date,30) tem
        );

    详解(3)

    3. 在核心交易分析中完成如下指标的计算

    分析:以createTime来确定当前订单属于哪个季度,哪个月,哪个周,维护一张节假日,休息日,工作日的配置表,判断属于哪个特殊日期,计算orderId的数量,计算totalMoney的总和

    create table dwd.dwd_trade_orders(
    `orderId` int,
    `orderNo` string,
    `userId` bigint,
    `status` tinyint,
    `productMoney` decimal,
    `totalMoney` decimal,
    `payMethod` tinyint,
    `isPay` tinyint,
    `areaId` int,
    `tradeSrc` tinyint,
    `tradeType` int,
    `isRefund` tinyint,
    `dataFlag` tinyint,
    `createTime` string,
    `payTime` string,
    `modifiedTime` string,
    `start_date` string,
    `end_date` string
    ) COMMENT '订单事实拉链表'
    partitioned by (dt string)
    STORED AS PARQUET;
    create table special_day(
    `special_day_name` string,
    `start_date` string,
    `end_date` string
    ) COMMENT '订单事实拉链表'

        统计2020年每个季度的销售订单笔数、订单总额

    select (weekofyear(createTime)/3.1)+1,count(orderId),sum(totalMoney)
        from dwd_trade_orders
      group by (weekofyear(createTime)/3.1)+1
     where year(createTime)='2020'

        统计2020年每个月的销售订单笔数、订单总额

    select month(createTime),count(orderId),sum(totalMoney)
        from dwd_trade_orders
       group by month(createTime)
     where year(createTime)='2020'


        统计2020年每周(周一到周日)的销售订单笔数、订单总额

    select weekofyear(createTime),count(orderId),sum(totalMoney)
        from dwd_trade_orders
       group by weekofyear(createTime)
     where year(createTime)='2020'

        统计2020年国家法定节假日、休息日、工作日的订单笔数、订单总额

    select special_day_name,count(orderId),sum(totalMoney)
      from(
         select a.orderId,a.totalMoney,b.special_day_name
              from dwd_trade_orders a 
             right join special_day b 
            on b.start_date<=a.createTime<=end_date
            ) t1
      group by special_day_name
     where year(createTime)='2020';
  • 相关阅读:
    ubuntu在图形界面下打开一个终端
    [置顶] 屠夫与大夫
    service bound(一)
    Android Interface Definition Language (AIDL)
    service bound(二)
    移动应用开发原则
    Service bound(三)
    Linux 安装SSH服务
    JDK中设计模式
    Bad Hair Day【单调栈】
  • 原文地址:https://www.cnblogs.com/aloneme/p/15098777.html
Copyright © 2020-2023  润新知