• Using Hive with HBase


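    Hive needs ZooKeeper in this setup (its HBase client locates the cluster through it), so you either run your own ZooKeeper or share one that another service already provides. Here I share HBase's: HBase bundles ZooKeeper and, by default, starts it whenever HBase itself starts. Hive connects to port 2181 on the local machine by default, so I run Hive on slaver3.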
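
    Before starting Hive it can help to confirm that something is actually listening on the ZooKeeper client port. The following is only a minimal sketch: is_port_open is a hypothetical helper name, and a successful TCP connection shows that the port is open, not that the listener is a healthy ZooKeeper.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

    On the machine that will run Hive (slaver3 here), is_port_open("localhost", 2181) should return True once HBase and its ZooKeeper are up.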


    For the cluster setup and the machine allocation, see the Hadoop setup guide: http://phey.cc/multinode_hadoop20.html
    and the HBase cluster setup guide: http://phey.cc/Install_hbase_cluster.html


    After unpacking the Hive package, copy the environment template to hive-env.sh:
    [cc@slaver3 ~]$ cp hive-0.12.0-cdh5.0.1/conf/hive-env.sh.template hive-0.12.0-cdh5.0.1/conf/hive-env.sh
    [cc@slaver3 ~]$ ▊




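    Edit the environment variables: HADOOP_HOME is the Hadoop installation directory and HIVE_AUX_JARS_PATH is the location of the HBase jars. Hive reports errors if the HBase and Hive jar versions do not match.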
    [cc@slaver3 ~]$ vim hive-0.12.0-cdh5.0.1/conf/hive-env.sh
    export HADOOP_HOME=/home/cc/hadoop-2.3.0-cdh5.0.0
    export HIVE_AUX_JARS_PATH=/home/cc/hbase-0.96.1.1-cdh5.0.1/lib
    [cc@slaver3 ~]$ ▊




    Start Hive, passing the HBase master's RPC address and port:
    [cc@slaver3 hive-0.12.0-cdh5.0.1]$ bin/hive -hiveconf hbase.master=master1:60000
    14/08/21 21:34:07 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
    14/08/21 21:34:07 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
    14/08/21 21:34:07 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
    14/08/21 21:34:07 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
    14/08/21 21:34:07 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
    14/08/21 21:34:07 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    14/08/21 21:34:07 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative


    Logging initialized using configuration in jar:file:/home/cc/hive-0.12.0-cdh5.0.1/lib/hive-common-0.12.0-cdh5.0.1.jar!/hive-log4j.properties
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/cc/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/cc/hive-0.12.0-cdh5.0.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/cc/hbase-0.96.1.1-cdh5.0.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    hive> ▊




    Create a table in Hive. The new Hive table cctable maps to the HBase table cc: the int column key in cctable corresponds to the row key of cc, and the string column value corresponds to the column cf:val of cc.
    hive> CREATE TABLE cctable (key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val") TBLPROPERTIES ("hbase.table.name" = "cc");
    OK
    Time taken: 9.302 seconds
    hive> ▊
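
    To make the hbase.columns.mapping string easier to read, here is a small sketch that pairs each Hive column with its mapping entry. It is only an illustration, not Hive's own parser, and pair_columns is a hypothetical helper. It assumes one comma-separated entry per Hive column, with :key standing for the row key and every other entry written as family:qualifier.

```python
from typing import List, Tuple


def pair_columns(hive_columns: List[str], mapping: str) -> List[Tuple[str, str]]:
    """Pair each Hive column with the HBase target it is mapped to.

    The mapping is split on commas only. Whitespace is not stripped,
    because it would be treated as part of the HBase column name.
    """
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    pairs = []
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            target = "row key"
        else:
            # Split "cf:val" into the column family and the qualifier.
            family, _, qualifier = entry.partition(":")
            target = f"family '{family}', qualifier '{qualifier}'"
        pairs.append((column, target))
    return pairs
```

    For the table above, pair_columns(["key", "value"], ":key,cf:val") pairs key with the row key and value with family 'cf', qualifier 'val'.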




    The result is visible in HBase: the table cc has been created.
    hbase(main):011:0> list
    TABLE                                                                                                                                                                 
    cc                                                                                                                                                                    
    1 row(s) in 0.0250 seconds


    => ["cc"]
    hbase(main):012:0> describe 'cc'
    DESCRIPTION                                                                                                 ENABLED                                                   
     'cc', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIO true                                                      
     NS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',                                                           
      BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                                   
    1 row(s) in 0.0390 seconds


    hbase(main):013:0> ▊




    If you want Hive to use a table that already exists in HBase instead of creating one, add the EXTERNAL keyword when creating the table, for example:
    hive> CREATE EXTERNAL TABLE cctable (key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val") TBLPROPERTIES ("hbase.table.name" = "cc");
    OK
    Time taken: 9.302 seconds
    hive> ▊




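    Because of this mapping, data inserted into HBase can be queried from Hive. This Hive version (0.12), however, does not support a MySQL-style INSERT ... VALUES statement, so the convenient way to add data is to insert it into HBase.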


    Insert data into HBase:
    hbase(main):013:0> put 'cc', '1', 'cf:val', 'hello cc!'
    0 row(s) in 0.0120 seconds


    hbase(main):014:0> ▊




    Query it from Hive:
    hive> select * from cctable;
    OK
    1       hello cc!
    Time taken: 30.838 seconds, Fetched: 1 row(s)
    hive> ▊
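
    The row returned by the query can be traced back to the single cell written by the put command. The sketch below models this with a plain dictionary; it is a toy stand-in, not the HBase client API, and cc_table and as_hive_rows are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for the HBase table 'cc': row key -> {"family:qualifier": value}.
# It holds the one cell written by: put 'cc', '1', 'cf:val', 'hello cc!'
cc_table: Dict[str, Dict[str, str]] = {"1": {"cf:val": "hello cc!"}}


def as_hive_rows(table: Dict[str, Dict[str, str]]) -> List[Tuple[int, Optional[str]]]:
    """Project HBase-style rows into (key int, value string) tuples."""
    rows = []
    for row_key, cells in table.items():
        # ':key'   -> Hive column 'key', read as an int
        # 'cf:val' -> Hive column 'value', read as a string
        rows.append((int(row_key), cells.get("cf:val")))
    return rows
```

    Here as_hive_rows(cc_table) gives one tuple, (1, "hello cc!"), which matches the row Hive printed.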




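    HBase has no SQL count(*); its shell only provides a count command, which scans the table. Hive does support count(*), and it runs the query as a MapReduce job: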
    hive> select count(*) from cctable;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Starting Job = job_1408532552242_0005, Tracking URL = http://master1:8088/proxy/application_1408532552242_0005/
    Kill Command = /home/cc/hadoop-2.3.0-cdh5.0.0/bin/hadoop job  -kill job_1408532552242_0005
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2014-08-22 10:15:00,861 Stage-1 map = 0%,  reduce = 0%
    2014-08-22 10:15:14,479 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:15,525 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:16,572 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:17,617 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:18,662 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:19,707 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:20,753 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:21,798 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:22,842 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:23,889 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:24,937 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:25,991 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:27,040 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.73 sec
    2014-08-22 10:15:28,094 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.89 sec
    2014-08-22 10:15:29,143 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.89 sec
    MapReduce Total cumulative CPU time: 4 seconds 890 msec
    Ended Job = job_1408532552242_0005
    MapReduce Jobs Launched: 
    Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.89 sec   HDFS Read: 236 HDFS Write: 2 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 890 msec
    OK
    1
    Time taken: 77.571 seconds, Fetched: 1 row(s)
    hive> ▊
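
    The log shows one mapper and one reducer. As a rough conceptual model only (Hive's real execution plan is more involved, and all function names here are hypothetical), a row count can be pictured as each mapper counting its own input split and a single reducer adding up the partial counts:

```python
from typing import Iterable, List


def map_phase(split: Iterable[str]) -> int:
    """Mapper: count the rows in one input split (a partial count)."""
    return sum(1 for _ in split)


def reduce_phase(partial_counts: Iterable[int]) -> int:
    """Reducer: add up the partial counts produced by all mappers."""
    return sum(partial_counts)


def count_rows(splits: List[List[str]]) -> int:
    """Run the map phase over every split, then reduce to one total."""
    return reduce_phase(map_phase(split) for split in splits)
```

    With a single split holding the one row of cctable, count_rows([["1"]]) returns 1, the same number Hive reported.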

  • Original article: https://www.cnblogs.com/jamesf/p/4751467.html