Part 1 数据库建立
1、创建数据库 flight,并创建新表:on_time_performance
CREATE DATABASE flight;
CREATE TABLE on_time_performance;
2、航班数据检查csv ,导入到sql数据库中
使用Navicat进行可视化导入,导入时,要注意文本分隔符的选择
3、查看总行数
SELECT COUNT(*) FROM on_time_performance
结果如下:
Part 2 延误分析
-
在航班及时率表中,所有与延迟delay相关的字段如下,找出导致延迟次数最多的是哪一个(WeatherDelay CarrierDelay NASDelay SecurityDelay LateAircraftDelay)
- 首先分析表格中的数据
SELECT WeatherDelay,CarrierDelay,NASDelay,SecurityDelay,LateAircraftDelay FROM on_time_performance LIMIT 20
从数据可以看出,当航班没有延误时,所有列数据均为NULL;当有延误时,每一列的数字代表该原因导致的延误时长;0则表示该趟航班延误,但不是该原因导致的。
- 提取出每一列的延误原因,次数,延误时长总和,并按照次数排序(此处需用到组合查询)
将每一个原因的延误次数都计算出来,再组合到一起形成一个临时表,将临时表按照次数降序排序,并选择第一个,就是延误最多次数的
SELECT 'WeatherDelay', sum(case when WeatherDelay>0 then 1 END)as delaytime FROM on_time_performance UNION ALL SELECT 'CarrierDelay', sum(CASE WHEN CarrierDelay>0 THEN 1 END) FROM on_time_performance UNION ALL SELECT 'NASDelay', sum(CASE WHEN NASDelay>0 THEN 1 END) FROM on_time_performance UNION ALL SELECT 'SecurityDelay', sum(CASE WHEN SecurityDelay>0 THEN 1 END) FROM on_time_performance UNION ALL SELECT 'LateAircraftDelay', sum(CASE WHEN LateAircraftDelay>0 THEN 1 END) FROM on_time_performance ORDER BY delaytime desc limit 1;
-
统计各个原因累计的延迟时间
-
延迟时间倒数第二的原因及累计的延迟时间:
将上述的表进行降序排列,并使用limit选择第排名第二的
SELECT 'WeatherDelay', sum(case when WeatherDelay>0 then 1 END)as delaytime FROM on_time_performance
UNION ALL
SELECT 'CarrierDelay', sum(CASE WHEN CarrierDelay>0 THEN 1 END) FROM on_time_performance
UNION ALL
SELECT 'NASDelay', sum(CASE WHEN NASDelay>0 THEN 1 END) FROM on_time_performance
UNION ALL
SELECT 'SecurityDelay', sum(CASE WHEN SecurityDelay>0 THEN 1 END) FROM on_time_performance
UNION ALL
SELECT 'LateAircraftDelay', sum(CASE WHEN LateAircraftDelay>0 THEN 1 END) FROM on_time_performance
ORDER BY delaytime asc limit 1,1;
Part 3 建立索引
- 由于时间部分查询非常多,所以在时间字段上增加一个索引,以加快查询速度;
ALTER TABLE on_time_performance ADD INDEX idx_flight_date(`FlightDate`);
# 使用下列语句验证查询的速度是否有优化
SELECT AirlineID, FlightNum, Origin, Dest, DepTime, TailNum, AirTime, Distance
FROM on_time_performance WHERE `FlightDate` = '2015-01-03';
查询时间由0.326秒降低至0.004秒,提升明显
Part 4 航班的季节性探寻
1、探究一年内飞行航班的季节性
SELECT `Quarter`, count(*) AS QuarterFlightAmount FROM on_time_performance
GROUP BY `Quarter` WITH ROLLUP;
2、航班最多的月份,有多少个航班
SELECT `Month`,count(*)
FROM on_time_performance
GROUP BY `Month` ORDER BY count(*) desc limit 1;