• hive和Hbase整合


    本文部分来源:http://www.it165.net/admin/html/201406/3239.html

      https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

    创建 表:hbase_hive_1

    REATE TABLE hbase_hive_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
     WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");


    创建有分区的表:hbase_hive_2

    CREATE TABLE hbase_hive_2(key int, value string) partitioned by (day string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
     WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    TBLPROPERTIES ("hbase.table.name" = "xyz2");
    创建表pokes:
    create table pokes(foo int,bar string) row format delimited fields terminated by ',';
    hivedata:
    101, zhanggsan
    1001, lisi    
    102, wangwu
    load data local inpath '/out/hivedata' overwrite into table pokes;



    使用sql导入hbase_table_1

    SET hive.hbase.bulk=true;
    insert overwrite table hbase_hive_1 select * from pokes;

    导入有分区的表

    insert overwrite table hbase_hive_2 partition (day='2014-07-29') select * from pokes;

      执行:

    insert overwrite table hbase_hive_1 select * from pokes;

    出现错误异常如下:

    Task with the most failures(4): 
    -----
    Task ID:
      task_1406541025007_0008_m_000000
    
    URL:
      http://master:8088/taskdetails.jsp?jobid=job_1406541025007_0008&tipid=task_1406541025007_0008_m_000000
    -----
    Diagnostic Messages for this Task:
    Container [pid=9545,containerID=container_1406541025007_0008_01_000005] is running beyond virtual memory limits. Current usage: 263.0 MB of 1 GB physical 
    memory used; 3.6 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1406541025007_0008_01_000005 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 9545 2012 9545 9545 (java) 598 19 3909980160 67320 /home/hadoop/jdk1.7.0_51/bin/java -Djava.net.preferIPv4Stack=true
    -Dhadoop.metrics.log.level=WARN
    -Xmx3072m -Djava.io.tmpdir=/data/tmp/nm-local-dir/usercache/hadoop/appcache/application_1406541025007_0008/
    container_1406541025007_0008_01_000005/tmp -Dlog4j
    .configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/home/hadoop/hadoop-2.0.0-cdh4.5.0/
    logs/userlogs/application_1406541025007_0008
    /container_1406541025007_0008_01_000005 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild
    192.168.1.153 52225 attempt_1406541025007_0008_m_000000_3 5

     异常解决办法:

    复制hbase/conf下的hbase-site.xml文件到所有hadoop节点的hadoop/conf下

    hive>  INSERT OVERWRITE TABLE hbase_hive_1 SELECT * FROM pokes;               
    
    2014-07-29 13:40:46,601 Stage-0 map = 0%,  reduce = 0%
    2014-07-29 13:40:52,862 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.92 sec
    Job 0: Map: 1   Cumulative CPU: 1.92 sec   HDFS Read: 244 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 920 msec
    OK
    Time taken: 12.742 seconds
    hive> select * from hbase_hive_1;                               
    OK
    1001     lisi    
    101     zhanggsan
    102     wangwu
    Time taken: 0.135 seconds 

    查询hbase:
    hbase(main):023:0> scan 'xyz'
    ROW                                              COLUMN+CELL                                                                                                                                   
     1001                                            column=cf1:val, timestamp=1406612452333, value= lisix09                                                                                      
     101                                             column=cf1:val, timestamp=1406612452333, value= zhanggsan                                                                                     
     102                                             column=cf1:val, timestamp=1406612452333, value= wangwu       
  • 相关阅读:
    YbSoftwareFactory 代码生成插件【二十】:DynamicObject的序列化
    YbSoftwareFactory 代码生成插件【十九】:实体类配合数据库表字段进行属性扩展的小技巧
    YbSoftwareFactory 代码生成插件【十八】:树形结构下的查询排序的数据库设计
    YbSoftwareFactory 代码生成插件【十七】:先进的权限模型体系设计
    YbSoftwareFactory 代码生成插件【十六】:Web 下灵活、强大的审批流程实现(含流程控制组件、流程设计器和表单设计器)
    YbSoftwareFactory 代码生成插件【十五】:Show 一下最新的动态属性扩展功能与键值生成器功能
    YbSoftwareFactory 代码生成插件【十四】:通过 DynamicLinq 简单实现 N-Tier 部署下的服务端数据库通用分页
    YbSoftwareFactory 代码生成插件【十三】:Web API 的安全性
    Navicat 密码加密算法
    GitHub开源的超棒后台管理面板
  • 原文地址:https://www.cnblogs.com/zhanggl/p/3875379.html
Copyright © 2020-2023  润新知