• MYSQL/HIVESQL笔试题(三):HIVESQL(三)


    4 手写HQL 4

    已知一个表STG.ORDER,有如下字段:DateOrder_idUser_idamount。请给出sql进行统计:数据样例:2017-01-01,10029028,1000003251,33.57

    1)给出 2017年每个月的订单数、用户数、总成交金额。

    2)给出201711月的新客数(指在11月才有第一笔订单)

    建表

    create table order_tab(dt string,order_id string,user_id string,amount decimal(10,2)) row format delimited fields terminated by '	';

    1)给出 2017年每个月的订单数、用户数、总成交金额。

    select
       date_format(dt,'yyyy-MM'),
       count(order_id),
       count(distinct user_id),
       sum(amount)
    from
       order_tab
    where
       date_format(dt,'yyyy')='2017'
    group by
       date_format(dt,'yyyy-MM');

    2)给出201711月的新客数(指在11月才有第一笔订单)

    select
       count(user_id)
    from
       order_tab
    group by
       user_id
    having
       date_format(min(dt),'yyyy-MM')='2017-11';

    5 手写HQL 5

    有日志如下,请写出代码求得所有用户和活跃用户的总数及平均年龄。(活跃用户指连续两天都有访问记录的用户)日期 用户 年龄

    数据集

    2019-02-11,test_1,23
    2019-02-11,test_2,19
    2019-02-11,test_3,39
    2019-02-11,test_1,23
    2019-02-11,test_3,39
    2019-02-11,test_1,23
    2019-02-12,test_2,19
    2019-02-13,test_1,23
    2019-02-15,test_2,19
    2019-02-16,test_2,19

    1)建表

    create table user_age(dt string,user_id string,age int)row format delimited fields terminated by ',';

    2)按照日期以及用户分组,按照日期排序并给出排名

    select
       dt,
       user_id,
       min(age) age,
       rank() over(partition by user_id order by dt) rk
    from
       user_age
    group by
       dt,user_id;t1

    3)计算日期及排名的差值

    select
       user_id,
       age,
       date_sub(dt,rk) flag
    from
       t1;t2

    4)过滤出差值大于等于2的,即为连续两天活跃的用户

    select
       user_id,
       min(age) age
    from
       t2
    group by
       user_id,flag
    having
       count(*)>=2;t3

    5)对数据进行去重处理(一个用户可以在两个不同的时间点连续登录),例如:a用户在110111号以及120号和1214天登录。

    select
       user_id,
       min(age) age
    from
       t3
    group by
       user_id;t4

    6)计算活跃用户(两天连续有访问)的人数以及平均年龄

    select
       count(*) ct,
       cast(sum(age)/count(*) as decimal(10,2))
    from t4;

    7)对全量数据集进行按照用户去重

    select
       user_id,
       min(age) age 
    from
       user_age 
    group by 
       user_id;t5

    8)计算所有用户的数量以及平均年龄

    select
       count(*) user_count,
       cast((sum(age)/count(*)) as decimal(10,1)) 
    from 
       t5;

    9)将第5步以及第7步两个数据集进行union all操作

    select
       0 user_total_count,
       0 user_total_avg_age,
       count(*) twice_count,
       cast(sum(age)/count(*) as decimal(10,2)) twice_count_avg_age
    from 
    (
       select
       user_id,
       min(age) age
    from
       (select
       user_id,
       min(age) age
    from
       (
        select
       user_id,
       age,
       date_sub(dt,rk) flag
    from
       (
        select
           dt,
           user_id,
           min(age) age,
           rank() over(partition by user_id order by dt) rk
        from
           user_age
        group by
           dt,user_id
        )t1
        )t2
    group by
       user_id,flag
    having
       count(*)>=2)t3
    group by
       user_id 
    )t4
    
    union all
    
    select
       count(*) user_total_count,
       cast((sum(age)/count(*)) as decimal(10,1)),
       0 twice_count,
       0 twice_count_avg_age
    from 
       (
          select
             user_id,
             min(age) age 
          from 
             user_age 
          group by 
             user_id
       )t5;t6

    10)求和并拼接为最终SQL

    select 
        sum(user_total_count),
        sum(user_total_avg_age),
        sum(twice_count),
        sum(twice_count_avg_age)
    from 
    (select
       0 user_total_count,
       0 user_total_avg_age,
       count(*) twice_count,
       cast(sum(age)/count(*) as decimal(10,2)) twice_count_avg_age
    from 
    (
       select
       user_id,
       min(age) age
    from
       (select
       user_id,
       min(age) age
    from
       (
        select
       user_id,
       age,
       date_sub(dt,rk) flag
    from
       (
        select
           dt,
           user_id,
           min(age) age,
           rank() over(partition by user_id order by dt) rk
        from
           user_age
        group by
           dt,user_id
        )t1
        )t2
    group by
       user_id,flag
    having
       count(*)>=2)t3
    group by
       user_id 
    )t4
    
    union all
    
    select
       count(*) user_total_count,
       cast((sum(age)/count(*)) as decimal(10,1)),
       0 twice_count,
       0 twice_count_avg_age
    from 
       (
          select
             user_id,
             min(age) age 
          from 
             user_age 
          group by 
             user_id
       )t5)t6;

    6 手写HQL 6

    请用sql写出所有用户中在今年10月份第一次购买商品的金额,表ordertable字段(购买用户:userid,金额:money,购买时间:paymenttime(格式:2017-10-01),订单idorderid

    1)建表

    create table ordertable(
        userid string,
        money int,
        paymenttime string,
        orderid string)
    row format delimited fields terminated by '	';

    2)查询出

    select
       userid,
       min(paymenttime) paymenttime
    from
       ordertable
    where
       date_format(paymenttime,'yyyy-MM')='2017-10'
    group by
       userid;t1
    select
       t1.userid,
       t1.paymenttime,
       od.money
    from
       t1
    join
       ordertable od
    on
       t1.userid=od.userid
       and
       t1.paymenttime=od.paymenttime;
    select
       t1.userid,
       t1.paymenttime,
       od.money
    from
       (select
       userid,
       min(paymenttime) paymenttime
    from
       ordertable
    where
       date_format(paymenttime,'yyyy-MM')='2017-10'
    group by
       userid)t1
    join
       ordertable od
    on
       t1.userid=od.userid
       and
       t1.paymenttime=od.paymenttime;
  • 相关阅读:
    课程总结1
    网站概要设计说明书
    数据库设计说明书
    团队项目之7天工作计划
    NABC
    敏捷开发综述
    二维数组最大子数组
    电梯调度
    输出整数数组中 最大的子数组的结果
    【自习任我行】第二阶段个人总结10
  • 原文地址:https://www.cnblogs.com/qiu-hua/p/14879214.html
Copyright © 2020-2023  润新知