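Official introduction
Hive execution flow diagram:
Study notes on [Practical Hive.pdf], organized around the book's chapters and supplemented with material from the official website.
Component architecture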
- Client components
Hive CLI
JDBC/ODBC
Toad or SQuirreL
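As a rough illustration of how JDBC-based clients such as those listed above address Hive, the sketch below builds a HiveServer2 JDBC connection URL. The helper name `hive_jdbc_url`, the host name, and the database name are assumptions made for illustration; port 10000 is HiveServer2's usual default, and an actual connection additionally needs the Hive JDBC driver on the client side.

```python
def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a jdbc:hive2:// URL of the kind HiveServer2 JDBC clients use.

    Hypothetical helper: it only formats the string and does not connect.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:hive2://{host}:{port}/{database}"


# hive.example.com is a placeholder host name.
print(hive_jdbc_url("hive.example.com"))
```

A GUI client such as SQuirreL would take a URL of this shape in its connection settings.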
- HCatalog
Metadata management component; its main roles are as follows
Official introduction:
• Provides a common schema environment for multiple tools
• Allows for connectors to tools to read data from and write data to Hive’s warehouse
• Lets users share data across tools
• Creates a relational structure to Hadoop data
• Abstracts away the how and where of data storage
• Hides schema and storage changes from users
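The roles listed above can be pictured with a toy, in-memory catalog. This is only a sketch of the idea, not the HCatalog API: `TableInfo`, `ToyCatalog`, the table name, and the paths are all hypothetical. It shows two tools reading one shared schema by table name, and a storage change that callers do not have to know about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    columns: tuple      # (name, type) pairs shared by every tool
    location: str       # where the data lives (hypothetical path)
    file_format: str    # how the data is stored


class ToyCatalog:
    """In-memory stand-in for a shared metadata service (not the HCatalog API)."""

    def __init__(self):
        self._tables = {}

    def register(self, name, info):
        self._tables[name] = info

    def describe(self, name):
        return self._tables[name]


catalog = ToyCatalog()
catalog.register(
    "sales",
    TableInfo((("id", "int"), ("amount", "double")), "/warehouse/sales", "TEXTFILE"),
)

# Two different "tools" look the table up by name only.
schema_seen_by_tool_a = catalog.describe("sales").columns
schema_seen_by_tool_b = catalog.describe("sales").columns

# The storage change is made in one place; callers keep using the same name.
catalog.register(
    "sales",
    TableInfo(schema_seen_by_tool_a, "/warehouse/sales_orc", "ORC"),
)
```

Because tools ask for `sales` by name, the move to a different location and file format does not change what they request.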
- HiveServer2
Interface service component (the server that JDBC/ODBC clients connect to)
- Execution-Engine
  - MR
  Execution engine component (MapReduce)
  - Tez
  Execution engine component; avoids expensive shuffle steps
  Tez avoids disk IO by avoiding expensive shuffles and sorts while leveraging more efficient map side joins. Tez also utilizes a cost-based optimizer, which helps produce faster execution plans. Combine this with the ORC file format geared toward SQL performance and you have a query engine performing up to 100x faster than native MapReduce.
  - Hive-on-Spark
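The three engines above are selected per session through the `hive.execution.engine` property, whose values are `mr`, `tez`, and `spark`. The sketch below only generates the corresponding HiveQL `SET` statement; the helper name `set_engine_statement` is an assumption for illustration, and whether a given engine is actually usable depends on the Hive version and how the cluster is installed.

```python
# Values accepted by the hive.execution.engine property.
SUPPORTED_ENGINES = ("mr", "tez", "spark")


def set_engine_statement(engine: str) -> str:
    """Return the HiveQL statement that switches the session's execution engine.

    Hypothetical helper: it builds the statement text and does not run it.
    """
    name = engine.strip().lower()
    if name not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {SUPPORTED_ENGINES}")
    return f"SET hive.execution.engine={name};"


for engine in SUPPORTED_ENGINES:
    print(set_engine_statement(engine))
```

The generated statement would be issued in a Hive session before the query whose engine it should affect.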
- Storage: Hadoop
File storage based on HDFS