HQL Exercises

Hive Study Notes Summary

05. HQL Exercises

1. Basic HQL exercises

Exercises and data source: http://www.w2b-c.com/article/150326 (remove the "-" from the URL)

Create and load
```
create table students(Sno int, Sname string, Sex string, Sage int, Sdept string) row format delimited fields terminated by ',' stored as textfile;
create table course(Cno int, Cname string) row format delimited fields terminated by ',' stored as textfile;
create table sc(Sno int, Cno int, Grade int) row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/home/hadoop/hivedata/students.txt' overwrite into table students;
load data local inpath '/home/hadoop/hivedata/sc.txt' overwrite into table sc;
load data local inpath '/home/hadoop/hivedata/course.txt' overwrite into table course;
```
1. List the student number and name of every student
```
select Sno,Sname from students;
```
2. List the names of students who are enrolled in at least one course
```
select distinct Sname from students, sc where students.Sno = sc.Sno;
```
Or:
```
select distinct Sname from students inner join sc on students.Sno = sc.Sno;
```
3. Count the total number of students
```
select count(*) from students;
```
4. Compute the average grade of students in course 1
```
select avg(Grade) from sc where Cno = 1;
```
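5. Compute the average grade for each course
```
select course.Cname,avg(sc.Grade) from sc, course where sc.Cno = course.Cno group by sc.Cno,course.Cname;
```
// Every column in the select list must either appear after the group by keyword or be wrapped in an aggregate function. Here Cname is grouped and Grade is aggregated.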

6. Find the highest grade among students enrolled in course 1
```
select max(Grade) from sc where Cno = 1;
```
7. For each course number, count the enrolled students
```
select Cno,count(*) from sc group by Cno;
```
8. Find the student numbers of students enrolled in more than 3 courses
```
select Sno from sc group by Sno having count(Cno) >3 ;
```
9. List student information, globally ordered by student number
```
select * from students order by Sno;
```
10. List student information, distributed by sex and sorted by age
```
set mapred.reduce.tasks=2;
select * from students distribute by sex sort by sage;
```
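One caveat with the query above: `sort by` only orders rows inside each reducer, and `distribute by` assigns rows by hash, so rows of both sexes can land in the same reducer and come out interleaved by age. A hedged variant, assuming the goal is age order within each sex, adds Sex to the sort key:

```sql
set mapred.reduce.tasks=2;
-- distribute by: rows with the same Sex go to the same reducer
-- sort by: inside each reducer, order by Sex first and then by Sage
select * from students distribute by Sex sort by Sex, Sage;
```

Unlike `order by`, this still gives no global order across reducers; it only guarantees the order within each reducer's output.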
11. List each student together with their course enrollments
```
select students.*,sc.* from students join sc on (students.Sno =sc.Sno);
```
12. List the students' grades
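The notes list this exercise without a query. One possible answer, assuming it asks for each student's name, course name and grade, joins all three tables:

```sql
-- Assumed reading of the exercise: one row per (student, course) with the grade
select students.Sname, course.Cname, sc.Grade
from students
join sc on students.Sno = sc.Sno
join course on sc.Cno = course.Cno;
```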
13. Find all students who are enrolled in course 2 with a grade above 90.
```
select students.Sname from sc,students where sc.Cno = 2 and sc.Grade > 90 and sc.Sno = students.Sno;
```
Or:
```
select students.Sname,sc.Grade from students join sc on students.Sno=sc.Sno where  sc.Cno=2 and sc.Grade>90;
```
14. List all students; for students who have grades in the grade table, output the course number from that table
```
select students.Sname,sc.Cno from students left outer join sc on students.Sno=sc.Sno;
```
15. Rewrite the following subquery as a LEFT SEMI JOIN
```
SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM b);
```
Purpose: find the rows in a whose key also exists in b.
It can be rewritten as:
```
select a.key,a.value from a left semi join b on a.key = b.key;
```
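Two points are worth keeping in mind here. In a `left semi join`, columns of the right-hand table may only be referenced in the `on` clause, not in the select list or in `where`. Also, Hive 0.13 and later accept `IN` and `EXISTS` subqueries in the `where` clause, so on those versions the rewrite is optional. A sketch of the `EXISTS` form, using the same placeholder tables a and b:

```sql
-- Equivalent to the left semi join: keep rows of a that have a matching key in b
select a.key, a.value
from a
where exists (select 1 from b where b.key = a.key);
```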
16. Find the students in the same department as "刘晨"
```
select s1.Sname from students s1 where sdept in (select sdept from students where sname = '刘晨');
```
Or:
```
select s1.Sname from students s1 left semi join students s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
```
Compare the results of the following:
```
select * from students s1 left join students s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
select * from students s1 right join students s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
select * from students s1 inner join students s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
select * from students s1 left semi join students s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
```
2. Execution order

Standard order in which the clauses are written:
select--from--where--group by--having--order by
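The logical evaluation order differs from the written order, which is why a `where` clause cannot use aggregates while `having` can. A minimal sketch on the sc table from section 1, with the usual logical order marked in comments (the `Grade >= 60` filter is only an illustrative assumption):

```sql
select Sno, count(Cno) as course_cnt   -- 5. compute the output columns
from sc                                -- 1. read the source table
where Grade >= 60                      -- 2. filter individual rows
group by Sno                           -- 3. form the groups
having count(Cno) > 3                  -- 4. filter the groups
order by Sno;                          -- 6. sort the final result
```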

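Difference between the on condition and the where condition in a join

When a database joins two or more tables to return rows, it first builds an intermediate temporary table and then returns that temporary table to the user.
The join happens before the where clause. With a left join, on and where differ as follows:

1. The on condition is applied while the temporary table is being built. Whether or not the condition is true for a given left-table row, that row is still returned (the right-table columns are set to NULL).

2. The where condition filters the temporary table after it has been built. At that point the left join guarantee (every left-table row must be returned) no longer applies, and every row for which the condition is not true is filtered out.

Suppose there are two tables:

Table 1: tab1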
```
id size 
1  10 
2  20 
3  30 
```
Table 2: tab2
```
size name 
10   AAA 
20   BBB 
20   CCC 
```
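Two SQL statements:
```
-- Query 1: the name filter is in where
select * from tab1 left join tab2 on tab1.size = tab2.size where tab2.name='AAA';
-- Query 2: the name filter is in on
select * from tab1 left join tab2 on tab1.size = tab2.size and tab2.name='AAA';
```
How the first statement is processed:
1. Intermediate table
on condition: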
```
tab1.size = tab2.size 
tab1.id tab1.size tab2.size tab2.name 
1 10 10 AAA 
2 20 20 BBB 
2 20 20 CCC 
3 30 (null) (null) 
```
2. Then filter the intermediate table
where condition:
```
tab2.name='AAA'
tab1.id tab1.size tab2.size tab2.name 
1 10 10 AAA 
```
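How the second statement is processed:
1. Intermediate table
on condition:
```
tab1.size = tab2.size and tab2.name='AAA'
(left-table rows are returned even when the condition is not true)
tab1.id tab1.size tab2.size tab2.name 
1 10 10 AAA 
2 20 (null) (null) 
3 30 (null) (null) 
```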
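The key reason for these results is the special behavior of left join, right join and full join:
rows from the left or right table are returned whether or not the on condition is true, and full join combines the behavior of both left and right.

**inner join has no such behavior, so placing a condition in on or in where returns the same result set.**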
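To see this on the example tables tab1 and tab2 above, the two earlier statements can be repeated with inner join in place of left join. This is only a sketch of the comparison, not part of the original exercise:

```sql
-- Condition in where: join on size, then keep only the rows with name 'AAA'
select * from tab1 inner join tab2 on tab1.size = tab2.size where tab2.name = 'AAA';

-- Condition in on: only pairs that satisfy both conditions are joined at all
select * from tab1 inner join tab2 on tab1.size = tab2.size and tab2.name = 'AAA';

-- Both statements return the single row: 1  10  10  AAA
```

With inner join no unmatched row is ever padded with NULLs, so it makes no difference whether a row is rejected while joining or afterwards.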

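3. Hive in practice: cascading sum (cumulative report)

Requirement:
Given the following visitor access count table t_access_times (created below as t_access_time)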

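+----------+----------+---------+
| visitor  | month    | visits  |
+----------+----------+---------+
| A        | 2015-01  | 5       |
| A        | 2015-01  | 15      |
| B        | 2015-01  | 5       |
| A        | 2015-01  | 8       |
| B        | 2015-01  | 25      |
| A        | 2015-01  | 5       |
| A        | 2015-02  | 4       |
| A        | 2015-02  | 6       |
| B        | 2015-02  | 10      |
| B        | 2015-02  | 5       |
+----------+----------+---------+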

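Required output report: t_access_times_accumulate
Monthly total: the total number of visits in that month. Cumulative total: the sum of the monthly totals up to and including that month.

+----------+----------+----------------+-------------------+
| visitor  | month    | monthly total  | cumulative total  |
+----------+----------+----------------+-------------------+
| A        | 2015-01  | 33             | 33                |
| A        | 2015-02  | 10             | 43                |
| B        | 2015-01  | 30             | 30                |
| B        | 2015-02  | 15             | 45                |
+----------+----------+----------------+-------------------+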

Prepare the data:
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
```
create table t_access_time(username string,month string,salary int)
row format delimited fields terminated by ',';

load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_time;
```
1. Step one: compute each user's monthly total (the count column is named salary in this table)
```
select username,month,sum(salary) from t_access_time group by username,month;
```
+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
+-----------+----------+---------+--+

2. Step two: join the monthly totals with themselves (self join)
```
select * from 
(select username,month,sum(salary) as salary from t_access_time group by username,month) TabA 
inner join 
(select username,month,sum(salary) as salary from t_access_time group by username,month) TabB 
on TabA.username = TabB.username;
```
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+

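3. Step three: run a grouped query over the result of the previous step
The grouping columns are a.username and a.month (TabA.username and TabA.month in the query below).
To get the cumulative value for each month, sum every b.salary where b.month <= a.month.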
```
select TabA.username,TabA.month,max(TabA.salary) as month_salary,sum(TabB.salary) as sum_salary 
from 
(select username,month,sum(salary) as salary from t_access_time group by username,month) TabA 
inner join 
(select username,month,sum(salary) as salary from t_access_time group by username,month) TabB 
on TabA.username = TabB.username 
where TabB.month<= TabA.month 
group by TabA.username,TabA.month;
```
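max(TabA.salary) cannot be written as plain TabA.salary, because that column neither appears in group by nor is wrapped in an aggregate function. Every joined row within a group carries the same TabA.salary, so max simply returns that value.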

Result:
A 2015-01 33 33
A 2015-02 10 43
B 2015-01 30 30
B 2015-02 15 45
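As an alternative to the self join, Hive 0.11 and later support window functions, which can express the running total directly. This is a sketch of that swapped-in technique, not the method used in these notes; it reuses the t_access_time table and column names from above:

```sql
-- Inner query: monthly total per user (same as step one)
-- Outer query: running sum of the monthly totals per user, in month order
select username,
       month,
       salary as month_salary,
       sum(salary) over (partition by username
                         order by month
                         rows between unbounded preceding and current row) as sum_salary
from (
  select username, month, sum(salary) as salary
  from t_access_time
  group by username, month
) t;
```

On the sample data this should yield the same four rows as the result above, though the row order is not guaranteed without an explicit `order by`.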

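Reference: http://www.w2b-c.com/article/150326 (remove the "-" from the URL)

I am new to this and am writing these down as study notes. There are surely still many problems, so corrections are welcome. Thank you.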
Original post: https://www.cnblogs.com/wangrd/p/6275604.html
