hive 2.1.1 DML操作
将文件加载到hive表
//官方指导 LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
//示例 LOAD DATA LOCAL INPATH '/root/a.txt' OVERWRITE INTO TABLE testdb.test partition(dd='a',ed='2018');
说明:local 如果没有指定local关键字,则filepath引用表所在的文件系统中的文件。
查询数据到hive表
//官方指导 Standard syntax: INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement; INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement; Hive extension (multiple inserts): FROM from_statement INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...; FROM from_statement INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...; Hive extension (dynamic partition inserts): INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement; INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
//示例 insert into table test2 partition(area,year) select id,ad,bd,cd,dd,ed from test where ed='2019';
说明:后面的字段对应的是分区字段
into是拼接新的值
overwrite是新值覆盖掉全部旧的值,又分区会覆盖指定分区内的值
表中写入到文件系统
//官方指导 Standard syntax: INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0) SELECT ... FROM ... Hive extension (multiple inserts): FROM from_statement INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1 [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ... row_format : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
//示例 insert overwrite local directory '/root' row format delimited fields terminated by ' ' collection items terminated by ',' map keys terminated by ':' lines terminated by ' ' select * from test;
说明:如果不指定字段分割方式 将以 ^ A ^ B ^C 分割
修改表数据
//官方指导 更新只支持在ACID表上执行。 客户端 hive.support.concurrency – true hive.enforce.bucketing – true (Not required as of Hive 2.0) hive.exec.dynamic.partition.mode – nonstrict hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 服务端 hive.compactor.initiator.on – true (See table below for more details) hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
Standard Syntax: UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
//示例 //服务端 set hive.compactor.initiator.on = true ; set hive.compactor.worker.threads = 1; //客户端 set hive.support.concurrency = true; set hive.enforce.bucketing = true; set hive.exec.dynamic.partition.mode = nonstrict; set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; update testdb.test set ad = 'xiaoming' where id =1 ;
说明:创建的表要为分桶表,还要设置参数为 'transactional'='true'
不支持子查询
当where条件匹配上了才能更新
分区,分桶的列不能更新
设置hive.optimize.sort.dynamic.partition=false可以提高效率
属性都可以在hive-site.xml文件中配置
删除表数据
//官方指导 只支持在ACID表上执行。 客户端 hive.support.concurrency – true hive.enforce.bucketing – true (Not required as of Hive 2.0) hive.exec.dynamic.partition.mode – nonstrict hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 服务端 hive.compactor.initiator.on – true (See table below for more details) hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service Standard Syntax: DELETE FROM tablename [WHERE expression]
//示例 //服务端 set hive.compactor.initiator.on = true ; set hive.compactor.worker.threads = 1; set hive.optimize.sort.dynamic.partition = false; //客户端 set hive.support.concurrency = true; set hive.enforce.bucketing = true; set hive.exec.dynamic.partition.mode = nonstrict; set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
delete from testdb.test where id = 1;
说明:见修改表数据说明
从hive2.2开始支持表合并