• 面试用神sql--套路--累计报表


    create table t_access_times(username string,month string,salary int)
    row format delimited fields terminated by ',';

    load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;

    A,2015-01,5
    A,2015-01,15
    B,2015-01,5
    A,2015-01,8
    B,2015-01,25
    A,2015-01,5
    A,2015-02,4
    A,2015-02,6
    B,2015-02,10
    B,2015-02,5


    1、第一步,先求个用户的月总金额
    select username,month,sum(salary) as salary from t_access_times group by username,month

    +-----------+----------+---------+--+
    | username | month | salary |
    +-----------+----------+---------+--+
    | A | 2015-01 | 33 |
    | A | 2015-02 | 10 |
    | B | 2015-01 | 30 |
    | B | 2015-02 | 15 |
    +-----------+----------+---------+--+

    2、第二步,将月总金额表 自己连接 自己连接
    +-------------+----------+-----------+-------------+----------+-----------+--+
    | a.username | a.month | a.salary | b.username | b.month | b.salary |
    +-------------+----------+-----------+-------------+----------+-----------+--+
    | A | 2015-01 | 33 | A | 2015-01 | 33 |
    | A | 2015-01 | 33 | A | 2015-02 | 10 |
    | A | 2015-02 | 10 | A | 2015-01 | 33 |
    | A | 2015-02 | 10 | A | 2015-02 | 10 |
    | B | 2015-01 | 30 | B | 2015-01 | 30 |
    | B | 2015-01 | 30 | B | 2015-02 | 15 |
    | B | 2015-02 | 15 | B | 2015-01 | 30 |
    | B | 2015-02 | 15 | B | 2015-02 | 15 |
    +-------------+----------+-----------+-------------+----------+-----------+--+

    3、第三步,从上一步的结果中
    进行分组查询,分组的字段是a.username a.month
    求月累计值: 将b.month <= a.month的所有b.salary求和即可
    select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
    from
    (select username,month,sum(salary) as salary from t_access_times group by username,month) A
    inner join
    (select username,month,sum(salary) as salary from t_access_times group by username,month) B
    on
    A.username=B.username
    where B.month <= A.month
    group by A.username,A.month
    order by A.username,A.month;

    练习-练习题答案

    查询全体学生的学号与姓名
      hive> select Sno,Sname from student;

    查询选修了课程的学生姓名
      hive> select distinct Sname from student inner join sc on student.Sno=Sc.Sno;

    ----hive的group by 和集合函数

    查询学生的总人数
      hive> select count(distinct Sno)count from student;

    计算1号课程的学生平均成绩
      hive> select avg(distinct Grade) from sc where Cno=1;
    查询各科成绩平均分
    hive> select Cno,avg(Grade) from sc group by Cno;
    查询选修1号课程的学生最高分数
      select Grade from sc where Cno=1 sort by Grade desc limit 1;
    (注意比较:select * from sc where Cno=1 sort by Grade
    select Grade from sc where Cno=1 order by Grade)
      
      
    求各个课程号及相应的选课人数
      hive> select Cno,count(1) from sc group by Cno;


    查询选修了3门以上的课程的学生学号
      hive> select Sno from (select Sno,count(Cno) CountCno from sc group by Sno)a where a.CountCno>3;
    或 hive> select Sno from sc group by Sno having count(Cno)>3;

    ----hive的Order By/Sort By/Distribute By
      Order By ,在strict 模式下(hive.mapred.mode=strict),order by 语句必须跟着limit语句,但是在nonstrict下就不是必须的,这样做的理由是必须有一个reduce对最终的结果进行排序,如果最后输出的行数过多,一个reduce需要花费很长的时间。

    查询学生信息,结果按学号全局有序
      hive> set hive.mapred.mode=strict; <默认nonstrict>
    hive> select Sno from student order by Sno;
    FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'Sno'
      Sort By,它通常发生在每一个redcue里,“order by” 和“sort by"的区别在于,前者能给保证输出都是有顺序的,而后者如果有多个reduce的时候只是保证了输出的部分有序。set mapred.reduce.tasks=<number>在sort by可以指定,在用sort by的时候,如果没有指定列,它会随机的分配到不同的reduce里去。distribute by 按照指定的字段对数据进行划分到不同的输出reduce中
      此方法会根据性别划分到不同的reduce中 ,然后按年龄排序并输出到不同的文件中。

    查询学生信息,按性别分区,在分区内按年龄有序
      hive> set mapred.reduce.tasks=2;
      hive> insert overwrite local directory '/home/hadoop/out'
    select * from student distribute by Sex sort by Sage;

    ----Join查询,join只支持等值连接
    查询每个学生及其选修课程的情况
      hive> select student.*,sc.* from student join sc on (student.Sno =sc.Sno);
    查询学生的得分情况。
      hive>select student.Sname,course.Cname,sc.Grade from student join sc on student.Sno=sc.Sno join course on sc.cno=course.cno;

    查询选修2号课程且成绩在90分以上的所有学生。
      hive> select student.Sname,sc.Grade from student join sc on student.Sno=sc.Sno
    where sc.Cno=2 and sc.Grade>90;
      
    ----LEFT,RIGHT 和 FULL OUTER JOIN ,inner join, left semi join
    查询所有学生的信息,如果在成绩表中有成绩,则输出成绩表中的课程号
      hive> select student.Sname,sc.Cno from student left outer join sc on student.Sno=sc.Sno;
      如果student的sno值对应的sc在中没有值,则会输出student.Sname null.如果用right out join会保留右边的值,左边的为null。
      Join 发生在WHERE 子句之前。如果你想限制 join 的输出,应该在 WHERE 子句中写过滤条件——或是在join 子句中写。
      
    ----LEFT SEMI JOIN Hive 当前没有实现 IN/EXISTS 子查询,可以用 LEFT SEMI JOIN 重写子查询语句

    重写以下子查询为LEFT SEMI JOIN
    SELECT a.key, a.value
    FROM a
    WHERE a.key exist in
    (SELECT b.key
    FROM B);
    可以被重写为:
    SELECT a.key, a.val
    FROM a LEFT SEMI JOIN b on (a.key = b.key)

    查询与“刘晨”在同一个系学习的学生
      hive> select s1.Sname from student s1 left semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';

    注意比较:
    select * from student s1 left join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
    select * from student s1 right join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
    select * from student s1 inner join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
    select * from student s1 left semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
    select s1.Sname from student s1 right semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';

     

     

  • 相关阅读:
    js打印的两种方法
    C# VS2010中,用微软自带的System.Data.OracleClient来连接Oracle数据库
    js获取本月第几周和本年第几周
    [转]优化数据库大幅度提高Oracle的性能
    sass的循环for,while,each
    sass的mixin,extend,placeholder,function
    float,absolute脱离文档流的总结
    React ref的用法
    cloneNode与事件拷贝
    mobx动态添加observable
  • 原文地址:https://www.cnblogs.com/shan13936/p/13765011.html
Copyright © 2020-2023  润新知