• Fast data import into HBase: ImportTsv & Bulkload


    1. Shell method: import directly with ImportTsv
    # Create the table:
    create 'testImport1','cf'


    # Prepare the data file
    [hadoop@master test]$ more sample1.csv
    1,"tom"
    2,"sam"
    3,"jerry"
    4,"marry"
    5,"john
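    ImportTsv splits each line on the separator and does not interpret CSV quoting, so it can be worth checking the file before uploading it. Below is a minimal sketch. It assumes a single-character separator and that every line should have exactly two fields (row key plus one value, as in sample1.csv). The function name check_importtsv_input is hypothetical.

```shell
#!/usr/bin/env bash
# Pre-flight check for an ImportTsv input file (a sketch, not part of HBase).
# check_importtsv_input is a hypothetical helper; the default of 2 fields
# assumes the layout of sample1.csv (HBASE_ROW_KEY plus one column).
# ImportTsv only splits on the separator, so stray quotes or extra
# separators end up inside the stored values.
check_importtsv_input() {
  local file="$1" expected="${2:-2}" sep="${3:-,}"
  awk -F"$sep" -v n="$expected" '
    {
      quotes = gsub(/"/, "&")   # count double quotes; the line itself is unchanged
      if (NF != n)         { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
      if (quotes % 2 != 0) { printf "line %d: unbalanced double quote\n", NR; bad = 1 }
    }
    END { exit bad }            # non-zero exit status if any line looked wrong
  ' "$file"
}
```

    Run against the sample above, check_importtsv_input sample1.csv prints line 5: unbalanced double quote (the unclosed quote on 5,"john) and returns a non-zero status.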


    # Upload the file to HDFS
    hadoop fs -put sample1.csv /home/hadoop/test/



    # Import into the table (this takes a while, because ImportTsv runs as a MapReduce job)
    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf testImport1 /home/hadoop/test/sample1.csv
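    In -Dimporttsv.columns, each entry maps one input field, in order. HBASE_ROW_KEY marks the field used as the row key. A bare family name such as cf stores the value under that family with an empty qualifier, which is why the scan below shows column=cf:. An entry of the form family:qualifier gives the value a named column instead. The wrapper below is a sketch: the function name run_importtsv, the DRY_RUN switch and the qualifier name are hypothetical, and only the ImportTsv class and its -D options come from the command above.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper around ImportTsv. run_importtsv and DRY_RUN are
# hypothetical; the class name and -D options are those used in this post.
# With DRY_RUN=1 the command is printed instead of executed.
run_importtsv() {
  local table="$1" input="$2" columns="$3" sep="${4:-,}"
  local cmd=(hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
             "-Dimporttsv.separator=$sep"
             "-Dimporttsv.columns=$columns"
             "$table" "$input")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"     # show the command line only
  else
    "${cmd[@]}"                   # run the MapReduce import
  fi
}

# HBASE_ROW_KEY takes field 1; "cf:name" stores field 2 under the
# hypothetical qualifier "name" instead of the empty qualifier a bare "cf" gives.
DRY_RUN=1 run_importtsv testImport1 /home/hadoop/test/sample1.csv HBASE_ROW_KEY,cf:name
```

    Printing the command first makes it easy to check the column mapping before starting a MapReduce job.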


    # Check the data imported into the table
    hbase(main):066:0> scan 'testImport1'
    ROW        COLUMN+CELL
     1         column=cf:, timestamp=1548370307097, value="tom"
     2         column=cf:, timestamp=1548370307097, value="sam"
     3         column=cf:, timestamp=1548370307097, value="jerry"
     4         column=cf:, timestamp=1548370307097, value="marry"
     5         column=cf:, timestamp=1548370307097, value="john
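    The scan shows that the double quotes were stored as part of each value, and that the unclosed quote on row 5 was kept as is. If the quotes are not wanted, one option is to strip them locally before the hadoop fs -put step. Below is a minimal sketch with sed. It assumes no value legitimately contains a double quote or a comma; strip_quotes and the output name sample1_clean.csv are hypothetical.

```shell
#!/usr/bin/env bash
# Strip every double quote from an ImportTsv input file.
# Assumes no value needs to keep a quote or contain the separator, since
# ImportTsv does not understand CSV quoting.
# strip_quotes is a hypothetical helper name.
strip_quotes() {
  local in="$1" out="$2"
  sed 's/"//g' "$in" > "$out"   # delete all " characters, line by line
}

# Usage, on the local copy before uploading (hypothetical output name):
#   strip_quotes sample1.csv sample1_clean.csv
#   hadoop fs -put sample1_clean.csv /home/hadoop/test/
```

    With the cleaned file, the rows would be stored as tom, sam and so on, without quotes.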

    2. Generate HFiles with ImportTsv first, then load them into HBase with completebulkload


    create 'testImport2','cf'

    # Generate the HFiles (importtsv.bulk.output makes ImportTsv write HFiles instead of writing to the table)
    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=hfile_tmp -Dimporttsv.columns=HBASE_ROW_KEY,cf testImport2 /home/hadoop/test/sample1.csv

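    # The generated intermediate output (the relative path hfile_tmp resolves to /user/hadoop/hfile_tmp, with one subdirectory per column family)
    [hadoop@master test]$ hadoop fs -ls /user/hadoop/hfile_tmp/cf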
    Found 1 items
    -rw-r--r--   3 hadoop supergroup       5179 2019-01-24 15:09 /user/hadoop/hfile_tmp/cf/7ce1f217dd9e401bab3773938bd410c4


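    Before running the load command, copy the jars from HBase's lib directory into Hadoop's common lib directory so that the hadoop jar command can find the HBase classes:
    cp /home/hadoop/hbase/lib/* /home/hadoop/hadoop-2.7.3/share/hadoop/common/lib/

    Then load the HFiles into HBase with this command:
    hadoop jar /home/hadoop/hbase/lib/hbase-server-1.4.9.jar completebulkload hfile_tmp testImport2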
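    Copying every HBase jar into Hadoop's lib directory changes the classpath of all Hadoop jobs and may pull conflicting library versions into it. The HBase reference guide instead sets HADOOP_CLASSPATH from the output of hbase classpath for the single completebulkload command. Below is a minimal sketch of both steps of method 2 done that way. The function names, the DRY_RUN switch and the HBASE_HOME default are assumptions, and the hbase and hadoop commands are assumed to be on PATH.

```shell
#!/usr/bin/env bash
# Sketch of method 2 (ImportTsv -> HFiles -> completebulkload) without copying
# jars into Hadoop's lib directory. bulk_import, run and DRY_RUN are
# hypothetical; HBASE_HOME defaults to the install path used in this post,
# and hbase/hadoop are assumed to be on PATH.
set -euo pipefail

HBASE_HOME="${HBASE_HOME:-/home/hadoop/hbase}"
HBASE_SERVER_JAR="$HBASE_HOME/lib/hbase-server-1.4.9.jar"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

bulk_import() {
  local table="$1" input="$2" hfile_dir="$3" columns="${4:-HBASE_ROW_KEY,cf}"

  # Step 1: importtsv.bulk.output makes ImportTsv write HFiles to $hfile_dir
  # instead of writing to the table.
  run hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    "-Dimporttsv.separator=," \
    "-Dimporttsv.bulk.output=$hfile_dir" \
    "-Dimporttsv.columns=$columns" \
    "$table" "$input"

  # Step 2: completebulkload moves the HFiles into the table's regions.
  # `hbase classpath` prints HBase's classpath; passing it via HADOOP_CLASSPATH
  # exposes the HBase jars to this one command only.
  local cp='$(hbase classpath)'          # placeholder shown in dry-run output
  if [ "${DRY_RUN:-0}" != "1" ]; then cp="$(hbase classpath)"; fi
  run env "HADOOP_CLASSPATH=$cp" hadoop jar "$HBASE_SERVER_JAR" \
    completebulkload "$hfile_dir" "$table"
}
```

    For example, DRY_RUN=1 bulk_import testImport2 /home/hadoop/test/sample1.csv hfile_tmp prints the two commands; without DRY_RUN they are executed in order.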

    Check the result:
    hbase(main):081:0*  scan 'testImport2'
    ROW        COLUMN+CELL
     1         column=cf:, timestamp=1548370987951, value="tom"
     2         column=cf:, timestamp=1548370987951, value="sam"
     3         column=cf:, timestamp=1548370987951, value="jerry"
     4         column=cf:, timestamp=1548370987951, value="marry"
     5         column=cf:, timestamp=1548370987951, value="john
    5 row(s) in 0.0740 seconds

    Done.

  • Original article: https://www.cnblogs.com/hello-wei/p/10315223.html