• 2.11-2.12 Common ways to migrate data into HBase


    I. importtsv

    importtsv loads data from HDFS into an HBase table.

    1. Prepare the data

    ## student.tsv (fields are tab-separated)
    [root@hadoop-senior datas]# cat student.tsv 
    10001    zhangsan    35    male    beijing    0109876543
    10002    lisi    32    male    shanghai    0109876563
    10003    zhaoliu    35    female    hangzhou    01098346543
    10004    qianqi    35    male    shenzhen    01098732543
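
    importtsv splits each input line on the tab character by default; lines that do not yield the expected number of columns are counted as bad lines and skipped. A quick local sanity check before uploading (the /tmp path and the two sample rows here are illustrative, not part of the original walkthrough):

    ```shell
    # Write two sample rows with real tab characters and verify that every
    # line splits into exactly 6 fields, matching the 6 columns the import
    # will map (rowkey + name, age, sex, address, phone).
    printf '10001\tzhangsan\t35\tmale\tbeijing\t0109876543\n10002\tlisi\t32\tmale\tshanghai\t0109876563\n' > /tmp/student_sample.tsv
    awk -F'\t' 'NF != 6 { bad = 1 } END { print (bad ? "BAD" : "OK") }' /tmp/student_sample.tsv
    ```

    If the source file uses another delimiter, importtsv cannot consume it as-is with the default settings, so checking the field count up front saves a failed MapReduce run.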
    
    
    ## upload the file to HDFS
    [root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -mkdir -p /user/root/hbase/importtsv
    [root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -put /opt/datas/student.tsv /user/root/hbase/importtsv
    
    
    ## create the HBase table
    hbase(main):005:0> create 'student', 'info'
    0 row(s) in 0.1530 seconds
    
    => Hbase::Table - student


    2. Run the import

    ## run the following as one command (the backslashes continue the line)
    export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
    export HADOOP_HOME=/opt/modules/hadoop-2.5.0
    HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
        ${HADOOP_HOME}/bin/yarn jar \
        ${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar importtsv \
        -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex,info:address,info:phone \
        student \
        hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/importtsv
    
    
    
    ## check the result
    hbase(main):006:0> scan 'student'
    ROW                                          COLUMN+CELL                                                                                                                    
     10001                                       column=info:address, timestamp=1558594471571, value=beijing                                                                    
     10001                                       column=info:age, timestamp=1558594471571, value=35                                                                             
     10001                                       column=info:name, timestamp=1558594471571, value=zhangsan                                                                      
     10001                                       column=info:phone, timestamp=1558594471571, value=0109876543                                                                   
     10001                                       column=info:sex, timestamp=1558594471571, value=male                                                                           
     10002                                       column=info:address, timestamp=1558594471571, value=shanghai                                                                   
     10002                                       column=info:age, timestamp=1558594471571, value=32                                                                             
     10002                                       column=info:name, timestamp=1558594471571, value=lisi                                                                          
     10002                                       column=info:phone, timestamp=1558594471571, value=0109876563                                                                   
     10002                                       column=info:sex, timestamp=1558594471571, value=male                                                                           
     10003                                       column=info:address, timestamp=1558594471571, value=hangzhou                                                                   
     10003                                       column=info:age, timestamp=1558594471571, value=35                                                                             
     10003                                       column=info:name, timestamp=1558594471571, value=zhaoliu                                                                       
     10003                                       column=info:phone, timestamp=1558594471571, value=01098346543                                                                  
     10003                                       column=info:sex, timestamp=1558594471571, value=female                                                                         
     10004                                       column=info:address, timestamp=1558594471571, value=shenzhen                                                                   
     10004                                       column=info:age, timestamp=1558594471571, value=35                                                                             
     10004                                       column=info:name, timestamp=1558594471571, value=qianqi                                                                        
     10004                                       column=info:phone, timestamp=1558594471571, value=01098732543                                                                  
     10004                                       column=info:sex, timestamp=1558594471571, value=male


    II. bulk load

    1. How bulk load works

    HBase supports a bulk load ingestion path. It exploits the fact that HBase stores its data on HDFS in a specific
    format (HFile): the job writes persisted HFile-format files directly on HDFS and then moves them into place, which
    is how very large data sets can be loaded quickly. It is driven by MapReduce, is efficient and convenient, and
    consumes no region resources and adds no extra serving load, so for large write volumes it greatly improves write
    throughput while reducing the write pressure on the HBase nodes.

    Compared with writing directly through HTableOutputFormat, generating HFiles first and then bulk-loading them into
    HBase has two benefits:
    (1) it eliminates the insert pressure on the HBase cluster;
    (2) the job runs faster, shortening its execution time.
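
    One reason the HFile-generating job needs a MapReduce shuffle at all is that an HFile must contain its cells in sorted row-key order (lexicographic byte order). A toy illustration of that ordering requirement using plain `sort` on row keys (the /tmp paths and rows are made up for the demo):

    ```shell
    # Cells in an HFile must be ordered by row key, so a bulk-load job
    # sorts by row key before writing; simulate that ordering with sort(1).
    printf '10003\tzhaoliu\n10001\tzhangsan\n10002\tlisi\n' > /tmp/unsorted_rows.tsv
    sort -k1,1 /tmp/unsorted_rows.tsv
    ```

    The first line of the sorted output is row key 10001, then 10002, then 10003; the reduce phase of the bulk-load job produces exactly this kind of total ordering before the HFiles are written.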


    2. Generate the HFiles

    ## create the table
    hbase(main):007:0> create 'student2', 'info'
    0 row(s) in 0.1320 seconds
    => Hbase::Table - student2
    
    
    
    ## generate the HFiles (-Dimporttsv.bulk.output writes HFiles to the given
    ## directory instead of putting the rows into the table directly)
    export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
    export HADOOP_HOME=/opt/modules/hadoop-2.5.0
    HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
        ${HADOOP_HOME}/bin/yarn jar \
        ${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar importtsv \
        -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex,info:address,info:phone \
        -Dimporttsv.bulk.output=hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/hfileoutput \
        student2 \
        hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/importtsv
    
    
    
    ## inspect the generated HFile
    [root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -ls /user/root/hbase/hfileoutput/info
    Found 1 items
    -rw-r--r--   1 root supergroup       1888 2019-05-24 13:31 /user/root/hbase/hfileoutput/info/8c28c6c654bc4fe2aa2c32ef54480771


    3. Load the HFiles into table student2

    ## load the HFiles into the table
    export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
    export HADOOP_HOME=/opt/modules/hadoop-2.5.0
    HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
        ${HADOOP_HOME}/bin/yarn jar \
        ${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar \
        completebulkload \
        hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/hfileoutput \
        student2
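
    When the staging directory is on the same HDFS as the table, completebulkload typically moves (renames) the HFiles into the table's region directories rather than copying them, so after it succeeds the output directory is left empty. A local sketch of that move semantics, with made-up directories standing in for the staging and region paths:

    ```shell
    # Simulate completebulkload's move semantics: the HFile leaves the
    # staging directory and ends up in the (here: fake) region directory.
    mkdir -p /tmp/hfileoutput_demo/info /tmp/region_demo/info
    touch /tmp/hfileoutput_demo/info/8c28c6c654bc4fe2aa2c32ef54480771
    mv /tmp/hfileoutput_demo/info/8c28c6c654bc4fe2aa2c32ef54480771 /tmp/region_demo/info/
    ls /tmp/hfileoutput_demo/info | wc -l
    ```

    This is also why re-running completebulkload on the same output directory does nothing: the files are gone from the staging path once they have been adopted by the table.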
    
    
    ## scan student2
    hbase(main):008:0> scan 'student2'
    ROW                                          COLUMN+CELL                                                                                                                    
     10001                                       column=info:address, timestamp=1558675878109, value=beijing                                                                    
     10001                                       column=info:age, timestamp=1558675878109, value=35                                                                             
     10001                                       column=info:name, timestamp=1558675878109, value=zhangsan                                                                      
     10001                                       column=info:phone, timestamp=1558675878109, value=0109876543                                                                   
     10001                                       column=info:sex, timestamp=1558675878109, value=male                                                                           
     10002                                       column=info:address, timestamp=1558675878109, value=shanghai                                                                   
     10002                                       column=info:age, timestamp=1558675878109, value=32                                                                             
     10002                                       column=info:name, timestamp=1558675878109, value=lisi                                                                          
     10002                                       column=info:phone, timestamp=1558675878109, value=0109876563                                                                   
     10002                                       column=info:sex, timestamp=1558675878109, value=male                                                                           
     10003                                       column=info:address, timestamp=1558675878109, value=hangzhou                                                                   
     10003                                       column=info:age, timestamp=1558675878109, value=35                                                                             
     10003                                       column=info:name, timestamp=1558675878109, value=zhaoliu                                                                       
     10003                                       column=info:phone, timestamp=1558675878109, value=01098346543                                                                  
     10003                                       column=info:sex, timestamp=1558675878109, value=female                                                                         
     10004                                       column=info:address, timestamp=1558675878109, value=shenzhen                                                                   
     10004                                       column=info:age, timestamp=1558675878109, value=35                                                                             
     10004                                       column=info:name, timestamp=1558675878109, value=qianqi                                                                        
     10004                                       column=info:phone, timestamp=1558675878109, value=01098732543                                                                  
     10004                                       column=info:sex, timestamp=1558675878109, value=male                                                                           
    4 row(s) in 0.0420 seconds


    4. Generating HFiles from a MapReduce job

  • Original post: https://www.cnblogs.com/weiyiming007/p/10917969.html