hive数据导入导出
一、导入数据4种方式
建表语句
create table test( name string, friends array, children map<string, int>, address struct<street:string, city:string> ) row format delimited fields terminated by ',' collection items terminated by '_' map keys terminated by ':' lines terminated by ' '; row format delimited fields terminated by ',' – 列分隔符 collection items terminated by '_' --MAP STRUCT 和 ARRAY 的分隔符(数据分割符号) map keys terminated by ':' – MAP中的key与value的分隔符 lines terminated by ' '; – 行分隔符
1、本地文件导入到Hive表,需提前创建表
row format delimited fields terminated by '|' lines terminated by ' ' stored as textfile; load data local inpath "/tmp/user/data/demo_local.parquet" into table db_tmp.demo_local;
2、HDFS文件导入到Hive表,需提前创建表
load data inpath "/tmp/user/data/demo_hdfs.parquet" into table db_tmp.demo_hdfs;
3、Hive表导入到Hive表
insert into table demo_hive select * from demo_hive_b;
4、创建表时从其他Hive表导入
create table demo_a as select * from demo_hive_b;
5、通过sqoop将mysql库导入到Hive表
-- 默认导入到default库 sqoop import --connect jdbc:mysql://127.0.0.1:3306/casedb --username root --password password --table demo --hive-import --create-hive-table -m 1 -- 指定导入的数据库 sqoop import --connect jdbc:mysql://127.0.0.1:3306/casedb --username root --password root --table demo --hive-import --create-hive-table --hive-table database.demo -m 1
二、导出数据
1、导出到本地
insert overwrite local directory "/home/hadoop/data/" row format dilimited fields terminated by "," select * from demo_hive_b;
2、导出到HDFS
insert overwrite directory "/home/hadoop/data/" row format dilimited fields terminated by "," select * from demo_hive_b;
3、Hive命令行导出
# Linux bash终端 # 重定向方式 hive -e "select * from demo_hive_b" >> /home/hadoop/data/demo_output.txt # sql文件方式 echo "select * from demo_hive_b" > /home/hadoop/data/demo_output.sql hive -f /home/hadoop/data/demo_output.sql >> /home/hadoop/data/demo_output.txt
4、导出文件时,会遇本来表中本来字段值为null的字段导出时为
insert overwrite directory '/data/files/map_table_4' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'field.delim'=',', 'serialization.format'= '', 'serialization.null.format'='' ) STORED AS TEXTFILE select foo, null, bar from map_table;
三、备注
1)设置 alter table name SET SERDEPROPERTIES('serialization.null.format' = 'N');
则:底层数据保存的是'N',通过查询显示的是'NULL'
这时如果查询为空值的字段可通过 语句:a is null 或者 a='\N'
2)设置 alter tablename SET SERDEPROPERTIES('serialization.null.format' = 'NULL');
则:底层数据保存的是'NULL',通过查询显示的是'NULL'
这时如果查询为空值的字段可通过 语句:a is null 或者 a='NULL'
3)设置 alter tablename SET SERDEPROPERTIES('serialization.null.format' = '');
则:底层数据保存的是'',通过查询显示的是'NULL'
'' 与 length(xx)=0
'' 表示的是字段不为null且为空字符串,此时用 a is null 是无法查询这种值的,必须通过 a='' 或者 length(a)=0 查询。