Hive Data Storage Formats

1. The default storage format is plain text

　　stored as textfile;

2. Binary storage formats

　　SequenceFile, Avro, Parquet, RCFile, and ORC (orcfile).

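3. Converting to Parquet

　　hive> create table hive.stocks_parquet stored as parquet as select * from stocks;

　　　　Note: the source stocks table (about 400,000 rows) is 21 MB. After conversion to Parquet, the data files on HDFS total 6 MB, a compression ratio of roughly 3.5x.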

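4. Converting to RCFile

　　hive> create table hive.stocks_rcfile stored as rcfile as select * from stocks;

　　　　Note: the source stocks table (about 400,000 rows) is 21 MB. After conversion to RCFile, the data files on HDFS total 16 MB, a compression ratio of roughly 1.3x (the stored data is about 0.76 of the original size).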

5. Converting to ORC

　　hive> create table hive.stocks_orcfile stored as orcfile as select * from stocks;

　　　　Note: the source stocks table (about 400,000 rows) is 21 MB. After conversion to ORC, the data files on HDFS total 5 MB, a compression ratio of roughly 4x.
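
The ratios quoted in sections 3 to 5 can be checked with a few lines of arithmetic. The sketch below assumes the ratio is defined as original size divided by stored size, so a value above 1 means the data shrank. The class and method names (`CompressionRatio`, `ratio`) are illustrative only, and the sizes are the rounded figures reported above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CompressionRatio {
    // Ratio = original size / stored size; values above 1 mean the file shrank.
    static double ratio(double originalMb, double storedMb) {
        return originalMb / storedMb;
    }

    public static void main(String[] args) {
        double originalMb = 21.0; // text-format stocks table, as reported above

        // Stored sizes on HDFS after each conversion, as reported above.
        Map<String, Double> storedMb = new LinkedHashMap<>();
        storedMb.put("parquet", 6.0);
        storedMb.put("rcfile", 16.0);
        storedMb.put("orcfile", 5.0);

        for (Map.Entry<String, Double> e : storedMb.entrySet()) {
            String line = String.format(Locale.ROOT, "%-8s %.1fx",
                    e.getKey(), ratio(originalMb, e.getValue()));
            System.out.println(line);
        }
    }
}
```

With these inputs the ratios come out at about 3.5x for Parquet, 1.3x for RCFile, and 4.2x for ORC. Dividing the other way round (16 / 21) gives about 0.76, which is the fraction of the original size that the RCFile copy occupies.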

6. Query execution times
　　hive> select count(*) from stocks;
　　　　exec/fetch time: 0.227/1.580 sec
　　hive> select count(*) from hive.stocks_parquet;
　　　　exec/fetch time: 0.144/2.846 sec
　　hive> select count(*) from hive.stocks_rcfile;
　　　　exec/fetch time: 0.114/1.238 sec
　　hive> select count(*) from hive.stocks_orcfile;
　　　　exec/fetch time: 0.129/2.027 sec

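User-Defined Functions (UDF)
　　1. Create a Java class that extends the UDF class (org.apache.hadoop.hive.ql.exec.UDF).
　　2. Implement the evaluate() method.
　　3. Package the class into a jar.
　　4. Load the jar and register the function:
　　　　hive> add jar /home/hyxy/XXX.jar;
　　　　hive> create temporary function {function_name} as 'com.hyxy.hive.udf.xxx';

　　5. Types of user-defined functions
　　　　a. UDF: one row in --> one row out
　　　　b. UDAF: multiple rows in --> one row out
　　　　c. UDTF: one row in --> multiple rows out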
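
As a rough illustration of steps 1 and 2, the sketch below shows the shape of a simple one-row-in, one-row-out function. The class name `ToUpperUdf` and its upper-casing behaviour are hypothetical and not taken from the original jar. The `extends` clause is left as a comment so that the sketch compiles without the hive-exec library on the classpath; a class deployed to Hive would need it.

```java
import java.util.Locale;

// In a real Hive build this class would be declared as
//   public class ToUpperUdf extends org.apache.hadoop.hive.ql.exec.UDF
// The extends clause is omitted here (assumption: no hive-exec jar available)
// so that the sketch compiles and runs on its own.
class ToUpperUdf {
    // Hive locates evaluate() by reflection and calls it once per input row.
    // Returning null for null input mirrors SQL NULL handling.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return input.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ToUpperUdf udf = new ToUpperUdf();
        System.out.println(udf.evaluate("ibm")); // prints IBM
    }
}
```

Once packaged and registered as in step 4, for example under a hypothetical name such as to_upper, the function could be called in a query like any built-in function.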
Original article: https://www.cnblogs.com/lyr999736/p/9474005.html
