Hive Basic Syntax Practice (2): View and Index Operations


    1. View Operations
    -------
    
    1) Create a test table.
    
    ```
    hive> create table test(id int,name string);
    OK
    Time taken: 0.385 seconds
    hive> desc test;
    OK
    id                      int                                         
    name                    string                                      
    Time taken: 0.261 seconds, Fetched: 2 row(s)
    ```
    
    2) Create a view test_view based on the test table.
    
    ```
    hive> create view test_view(id,name_length) as select id,length(name) from test;
    ```
    
    3) Describe the test_view view.
    
    ```
    hive> desc test_view;
    OK
    id                  	int                 	                    
    name_length         	int                 	                    
    Time taken: 0.071 seconds, Fetched: 2 row(s)
    
    ```
    
    4) Query the view.
    
    ```
    hive> select * from test_view;
    ```
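    
    A view is removed with DROP VIEW; dropping it does not touch the underlying test table. A minimal sketch matching the view created above:
    
    ```
    hive> drop view if exists test_view;
    ```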
    
    2. Index Operations
    -------
    1) Create an index in Hive.
    
    ```
    hive> create table user like group_test;
    OK
    Time taken: 0.232 seconds
    hive> create index user_index on table user(uid) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild IN TABLE user_index_table;
    OK
    Time taken: 0.183 seconds
    
    ```
    
    2) Rebuild the index to populate its data.
    
    ```
    hive> alter index user_index on user rebuild;
    Query ID = hadoop_20180518043232_ebdf97bd-5984-4310-a3c8-6befef328133
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1526553207632_0018, Tracking URL = http://master:8088/proxy/application_1526553207632_0018/
    Kill Command = /opt/modules/hadoop-2.6.0/bin/hadoop job  -kill job_1526553207632_0018
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-05-18 04:32:55,632 Stage-1 map = 0%,  reduce = 0%
    2018-05-18 04:33:04,400 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.65 sec
    2018-05-18 04:33:12,406 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.93 sec
    MapReduce Total cumulative CPU time: 2 seconds 930 msec
    Ended Job = job_1526553207632_0018
    Loading data to table default.user_index_table
    Table default.user_index_table stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.93 sec   HDFS Read: 290 HDFS Write: 50 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 930 msec
    OK
    Time taken: 25.609 seconds
    
    ```
    3) Show the index.
    
    ```
    hive> show index on user;
    OK
    user_index          	user                	uid                 	user_index_table    	compact             	
    Time taken: 0.046 seconds, Fetched: 1 row(s)
    
    ```
    
    4) Drop the index.
    
    ```
    hive> drop index user_index on user;
    OK
    Time taken: 0.094 seconds
    hive> show index on user;           
    OK
    Time taken: 0.036 seconds
    
    ```
    
    5) A table-and-index example.
    
    ```
    hive> create table index_test(id INT,name STRING) PARTITIONED BY (dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    ```
    
    This creates an index test table index_test, with dt as the partition column; "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','" means fields are separated by commas.
    
    6) Create a temporary staging table index_tmp.
    
    ```
    hive> create table index_tmp(id INT,name STRING,dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    ```
    
    7) Load local data into the index_tmp table.
    
    ```
    [hadoop@master test]$ sudo vim test.txt
    02,female,192.168.1.3
    01,male,192.168.1.26
    03,male,192.168.1.5
    08,female,192.168.1.62
    04,male,192.168.1.9
    hive> load data local inpath '/home/hadoop/test/test.txt' into table index_tmp;
    Loading data to table default.index_tmp
    Table default.index_tmp stats: [numFiles=1, totalSize=106]
    OK
    Time taken: 0.224 seconds
    hive> select * from index_tmp;
    OK
    2	female	192.168.1.3
    1	male	192.168.1.26
    3	male	192.168.1.5
    8	female	192.168.1.62
    4	male	192.168.1.9
    
    ```
    
    Enable Hive's dynamic partitioning so that the insert in the next step can write all partitions at once:
    
    ```
    hive> set hive.exec.dynamic.partition.mode=nonstrict;  -- allow all partition columns to be dynamic
    hive> set hive.exec.dynamic.partition=true;            -- enable dynamic partitioning
    ```
    
    8) Query the data in index_tmp and insert it into the index_test table.
    
    ```
    hive> insert overwrite table index_test partition(dt) select id,name,dt from index_tmp;
    Query ID = hadoop_20180518044343_97e7fe67-a5a1-408b-be8e-e9dadb2f9e48
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1526553207632_0019, Tracking URL = http://master:8088/proxy/application_1526553207632_0019/
    Kill Command = /opt/modules/hadoop-2.6.0/bin/hadoop job  -kill job_1526553207632_0019
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2018-05-18 04:43:42,621 Stage-1 map = 0%,  reduce = 0%
    2018-05-18 04:43:48,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
    MapReduce Total cumulative CPU time: 870 msec
    Ended Job = job_1526553207632_0019
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to: hdfs://ns/tmp/hive/hadoop/9f7dd0d3-a14c-4535-9291-557b9cb6259b/hive_2018-05-18_04-43-36_337_559705388802402645-1/-ext-10000
    Loading data to table default.index_test partition (dt=null)
    	 Time taken for load dynamic partitions : 278
    	Loading partition {dt=192.168.1.62}
    	Loading partition {dt=192.168.1.3}
    	Loading partition {dt=192.168.1.5}
    	Loading partition {dt=192.168.1.26}
    	Loading partition {dt=192.168.1.9}
    	 Time taken for adding to write entity : 0
    Partition default.index_test{dt=192.168.1.26} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
    Partition default.index_test{dt=192.168.1.3} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
    Partition default.index_test{dt=192.168.1.5} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
    Partition default.index_test{dt=192.168.1.62} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
    Partition default.index_test{dt=192.168.1.9} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 308 HDFS Write: 338 SUCCESS
    Total MapReduce CPU Time Spent: 870 msec
    OK
    Time taken: 15.225 seconds
    
    ```
    
    9) Create an index index1_index_test on column id of table index_test.
    
    ```
    hive> create index index1_index_test on table index_test(id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD; 
    OK
    Time taken: 0.109 seconds
    
    ```
    
    10) Populate the index data.
    
    ```
    hive> alter index index1_index_test on index_test rebuild;
    Query ID = hadoop_20180518044545_edc98c3b-03eb-42c0-83d1-ba266d8497e0
    Total jobs = 5
    (MapReduce progress output omitted)
    Ended Job = job_1526553207632_0024
    Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.26)
    Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.3)
    Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.5)
    Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.62)
    Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.9)
    Partition default.default__index_test_index1_index_test__{dt=192.168.1.26} stats: [numFiles=1, numRows=0, totalSize=70, rawDataSize=0]
    Partition default.default__index_test_index1_index_test__{dt=192.168.1.3} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]
    Partition default.default__index_test_index1_index_test__{dt=192.168.1.5} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]
    Partition default.default__index_test_index1_index_test__{dt=192.168.1.62} stats: [numFiles=1, numRows=0, totalSize=70, rawDataSize=0]
    Partition default.default__index_test_index1_index_test__{dt=192.168.1.9} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.66 sec   HDFS Read: 226 HDFS Write: 191 SUCCESS
    Stage-Stage-5: Map: 1  Reduce: 1   Cumulative CPU: 1.94 sec   HDFS Read: 227 HDFS Write: 187 SUCCESS
    Stage-Stage-9: Map: 1  Reduce: 1   Cumulative CPU: 2.15 sec   HDFS Read: 225 HDFS Write: 187 SUCCESS
    Stage-Stage-13: Map: 1  Reduce: 1   Cumulative CPU: 1.9 sec   HDFS Read: 228 HDFS Write: 191 SUCCESS
    Stage-Stage-17: Map: 1  Reduce: 1   Cumulative CPU: 2.73 sec   HDFS Read: 225 HDFS Write: 187 SUCCESS
    Total MapReduce CPU Time Spent: 11 seconds 380 msec
    OK
    Time taken: 196.607 seconds
    ```
    
    11) Show the created index.
    
    ```
    hive> show index on index_test;
    ```
    
    12) Show the partition information.
    
    ```
    hive> show partitions index_test;
    ```
    
    
    13) Inspect the index data on HDFS.
    
    ```
    [hadoop@master hadoop-2.6.0]$ hadoop fs -ls /user/hive/warehouse/default__index_test_index1_index_test__
    18/05/18 04:52:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 5 items
    drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.26
    drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.3
    drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.5
    drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:48 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.62
    drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:48 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.9
    ```
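    
    The compact index table itself can also be queried like an ordinary table. As a sketch (the exact column layout may vary by Hive version, but a compact index table typically stores the indexed column alongside a `_bucketname` file path and an `_offsets` array):
    
    ```
    hive> select * from default__index_test_index1_index_test__ limit 5;
    ```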
    
    14) Drop the index.
    
    ```
    hive> drop index index1_index_test on index_test;
    OK
    Time taken: 4.842 seconds
    hive> show index on index_test;
    OK
    Time taken: 0.031 seconds
    
    ```
    
    15) The index data on HDFS is deleted as well.
    
    ```
    [hadoop@master hadoop-2.6.0]$ hadoop fs -ls /user/hive/warehouse/default__index_test_index1_index_test__
    18/05/18 04:53:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    ls: `/user/hive/warehouse/default__index_test_index1_index_test__': No such file or directory
    
    ```
    
    16) Modify the configuration file.
    
    ```
    [hadoop@master hadoop-2.6.0]$ cd /opt/modules/hive1.0.0/conf/
    [hadoop@master conf]$ sudo vim hive-site.xml    (in vim, type / followed by a string to search quickly)
    <property>
    	<name>hive.optimize.index.filter</name>
    	<value>true</value>
    </property>
    <property>
    	<name>hive.optimize.index.groupby</name>
    	<value>true</value>
    </property>
    <property>
    	<name>hive.optimize.index.filter.compact.minsize</name>
    	<value>5120</value>
    </property>
    
    ```
    hive.optimize.index.filter and hive.optimize.index.groupby both default to false; they must be set to true for Hive to actually use indexes at query time.
    
    hive.optimize.index.filter.compact.minsize is the minimum input size, in bytes, at which a compact index is automatically applied; the default is 5368709120 (5 GB). Lowering it to 5120 here lets the index take effect even on small test inputs.
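    
    With these properties in place, a selective filter on the indexed column is the kind of query the compact index is meant to speed up. A hypothetical example against the tables built above:
    
    ```
    hive> select * from index_test where id = 3;
    ```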
    
    That covers the main content of this section. These notes come from my own learning process; I hope they give you some useful guidance. If they helped, please leave a like; if you spot any mistakes, please point them out. Feel free to follow for future updates. Thanks!
    
     Copyright notice: this is the blogger's original article and may not be reproduced without permission.
  Original article: https://www.cnblogs.com/zimo-jing/p/9059550.html