• HBase Backup and Restore Exercises


    I. Cold Backup

    1. Create a test table and insert test data

    [root@weekend05 ~]# hbase shell

    hbase(main):005:0> create 'scores','grade','course'

    0 row(s) in 0.4940 seconds

    => Hbase::Table - scores

    put 'scores','Tom','grade:','5'

    hbase(main):006:0> put 'scores','Tom','course:math','97'

    0 row(s) in 0.0710 seconds

    hbase(main):007:0> put 'scores','Tom','course:art','87'

    0 row(s) in 0.0100 seconds

    hbase(main):008:0> put 'scores','Tom','course:english','80'

    0 row(s) in 0.0100 seconds

    hbase(main):009:0> put 'scores','Jim','grade:','4'

    0 row(s) in 0.0210 seconds

    hbase(main):010:0> put 'scores','Jim','course:chinese','89'

    0 row(s) in 0.0410 seconds

    hbase(main):011:0> put 'scores','Jim','course:english','80'

    0 row(s) in 0.0090 seconds

    hbase(main):012:0> scan 'scores'

    ROW COLUMN+CELL

    Jim column=course:chinese, timestamp=1465031482172, value=89

    Jim column=course:english, timestamp=1465031493584, value=80

    Jim column=grade:, timestamp=1465031470174, value=4

    Tom column=course:art, timestamp=1465031443686, value=87

    Tom column=course:english, timestamp=1465031459071, value=80

    Tom column=course:math, timestamp=1465031419695, value=97

    2 row(s) in 0.0920 seconds

    2. Stop HBase

    [root@master ~]# stop-hbase.sh

    stopping hbase................

    3. Take the backup

    [root@weekend05 ~]# hadoop distcp /hbase /hbasebackup

    ... (output truncated; too long to show in full)

    File System Counters

    FILE: Number of bytes read=315885

    FILE: Number of bytes written=456706

    FILE: Number of read operations=0

    FILE: Number of large read operations=0

    FILE: Number of write operations=0

    HDFS: Number of bytes read=103581

    HDFS: Number of bytes written=103581

    HDFS: Number of read operations=943

    HDFS: Number of large read operations=0

    HDFS: Number of write operations=187

    Map-Reduce Framework

    Map input records=161

    Map output records=22

    Input split bytes=151

    Spilled Records=0

    Failed Shuffles=0

    Merged Map outputs=0

    GC time elapsed (ms)=7

    CPU time spent (ms)=0

    Physical memory (bytes) snapshot=0

    Virtual memory (bytes) snapshot=0

    Total committed heap usage (bytes)=130547712

    File Input Format Counters

    Bytes Read=37917

    File Output Format Counters

    Bytes Written=2424

    org.apache.hadoop.tools.mapred.CopyMapper$Counter

    BYTESCOPIED=103581

    BYTESEXPECTED=103581

    BYTESSKIPPED=10257

    COPY=139

    SKIP=22

    4. Start HBase

    [root@weekend05 ~]# start-hbase.sh

    [root@weekend05 ~]# hbase shell

    5. Delete the test data

    hbase(main):001:0> list

    TABLE

    hbase_student

    my_data

    myns1

    new_scores

    nyist

    scores

    user

    7 row(s) in 0.2270 seconds

    => ["hbase_student", "my_data", "myns1", "new_scores", "nyist", "scores", "user"]

    hbase(main):002:0> disable 'scores'

    0 row(s) in 1.3210 seconds

    hbase(main):003:0> drop 'scores'

    0 row(s) in 0.4170 seconds

    6. Stop HBase

    [root@weekend05 ~]# stop-hbase.sh

    stopping hbase...............

    7. Restore the data

    [root@weekend05 ~]# hdfs dfs -mv /hbase /hbase_tmp

    [root@weekend05 ~]# hadoop distcp -overwrite /hbasebackup /

    ... (output above omitted)

    File System Counters

    FILE: Number of bytes read=406438

    FILE: Number of bytes written=500029

    FILE: Number of read operations=0

    FILE: Number of large read operations=0

    FILE: Number of write operations=0

    HDFS: Number of bytes read=167648

    HDFS: Number of bytes written=167648

    HDFS: Number of read operations=1503

    HDFS: Number of large read operations=0

    HDFS: Number of write operations=313

    Map-Reduce Framework

    Map input records=211

    Map output records=0

    Input split bytes=152

    Spilled Records=0

    Failed Shuffles=0

    Merged Map outputs=0

    GC time elapsed (ms)=44

    CPU time spent (ms)=0

    Physical memory (bytes) snapshot=0

    Virtual memory (bytes) snapshot=0

    Total committed heap usage (bytes)=159383552

    File Input Format Counters

    Bytes Read=53305

    File Output Format Counters

    Bytes Written=8

    org.apache.hadoop.tools.mapred.CopyMapper$Counter

    BYTESCOPIED=167648

    BYTESEXPECTED=167648

    COPY=211

    8. Start HBase

    [root@weekend05 ~]# start-hbase.sh

    [root@weekend05 ~]# hbase shell

    9. Check the test data

    hbase(main):001:0> scan 'scores'

    ROW COLUMN+CELL

    Jim column=course:chinese, timestamp=1465031482172, value=89

    Jim column=course:english, timestamp=1465031493584, value=80

    Jim column=grade:, timestamp=1465031470174, value=4

    Tom column=course:art, timestamp=1465031443686, value=87

    Tom column=course:english, timestamp=1465031459071, value=80

    Tom column=course:math, timestamp=1465031419695, value=97

    2 row(s) in 0.2320 seconds
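
    The cold backup and restore steps above can be collected into one script. This is only a minimal sketch under stated assumptions, not the exact procedure used above: the `run` helper and the `DRY_RUN`, `HBASE_ROOT`, and `BACKUP_DIR` variables are hypothetical names, and it uses `distcp -update` for the backup and `distcp -overwrite` into `/hbase` for the restore, so that in both directions the contents of the source directory land directly in the target directory. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of a cold backup / restore cycle for HBase on HDFS.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
HBASE_ROOT="${HBASE_ROOT:-/hbase}"        # HBase root directory on HDFS (assumed)
BACKUP_DIR="${BACKUP_DIR:-/hbasebackup}"  # backup location on HDFS (assumed)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop HBase, copy its root directory to the backup location, restart.
cold_backup() {
  run stop-hbase.sh || return 1
  # -update copies the contents of HBASE_ROOT into BACKUP_DIR.
  run hadoop distcp -update "$HBASE_ROOT" "$BACKUP_DIR" || return 1
  run start-hbase.sh
}

# Stop HBase, move the current data aside, copy the backup back, restart.
cold_restore() {
  run stop-hbase.sh || return 1
  # Keep the current data under a different name before restoring.
  run hdfs dfs -mv "$HBASE_ROOT" "${HBASE_ROOT}_tmp" || return 1
  # -overwrite copies the contents of BACKUP_DIR into HBASE_ROOT.
  run hadoop distcp -overwrite "$BACKUP_DIR" "$HBASE_ROOT" || return 1
  run start-hbase.sh
}
```

    After sourcing the script, calling `cold_backup` or `cold_restore` prints the command sequence for review; setting `DRY_RUN=0` runs it for real.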

    II. Hot Backup

    (1) Using Export and Import

    1. Create a test table and insert test data

    hbase(main):006:0> create 'scores','grade','course'

    0 row(s) in 0.4480 seconds

    => Hbase::Table - scores

    hbase(main):007:0> put 'scores','Tom','grade:','5'

    0 row(s) in 0.0910 seconds

    hbase(main):008:0> put 'scores','Tom','course:math','97'

    0 row(s) in 0.0240 seconds

    hbase(main):009:0> put 'scores','Tom','course:art','87'

    0 row(s) in 0.0130 seconds

    hbase(main):010:0> put 'scores','Tom','course:english','80'

    0 row(s) in 0.0100 seconds

    hbase(main):011:0> put 'scores','Jim','grade:','4'

    0 row(s) in 0.0100 seconds

    hbase(main):012:0> put 'scores','Jim','course:chinese','89'

    0 row(s) in 0.0130 seconds

    hbase(main):013:0> put 'scores','Jim','course:english','80'

    0 row(s) in 0.0210 seconds

    hbase(main):014:0> scan 'scores'

    ROW COLUMN+CELL

    Jim column=course:chinese, timestamp=1465032277623, value=89

    Jim column=course:english, timestamp=1465032284037, value=80

    Jim column=grade:, timestamp=1465032263803, value=4

    Tom column=course:art, timestamp=1465032250427, value=87

    Tom column=course:english, timestamp=1465032257150, value=80

    Tom column=course:math, timestamp=1465032242712, value=97

    Tom column=grade:, timestamp=1465032224863, value=5

    2 row(s) in 0.0320 seconds

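    2. Export the data

    a. Export to the /tmp/scores directory (a path without a scheme is resolved against the default filesystem; use a file:// URI to target the local Linux filesystem explicitly)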

    [root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Export scores /tmp/scores

    ... (output truncated)

    File System Counters

    FILE: Number of bytes read=3658123

    FILE: Number of bytes written=24070263

    FILE: Number of read operations=0

    FILE: Number of large read operations=0

    FILE: Number of write operations=0

    HDFS: Number of bytes read=19940317

    HDFS: Number of bytes written=385

    HDFS: Number of read operations=138

    HDFS: Number of large read operations=0

    HDFS: Number of write operations=3

    Map-Reduce Framework

    Map input records=2

    Map output records=2

    Input split bytes=64

    Spilled Records=0

    Failed Shuffles=0

    Merged Map outputs=0

    GC time elapsed (ms)=8

    CPU time spent (ms)=0

    Physical memory (bytes) snapshot=0

    Virtual memory (bytes) snapshot=0

    Total committed heap usage (bytes)=29687808

    File Input Format Counters

    Bytes Read=0

    File Output Format Counters

    Bytes Written=385

    b. Export to an HDFS directory

    [root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Export scores hdfs://master:9000/myback_scores

    ... (output truncated)

    File System Counters

    FILE: Number of bytes read=3658123

    FILE: Number of bytes written=24067137

    FILE: Number of read operations=0

    FILE: Number of large read operations=0

    FILE: Number of write operations=0

    HDFS: Number of bytes read=19940317

    HDFS: Number of bytes written=385

    HDFS: Number of read operations=138

    HDFS: Number of large read operations=0

    HDFS: Number of write operations=3

    Map-Reduce Framework

    Map input records=2

    Map output records=2

    Input split bytes=64

    Spilled Records=0

    Failed Shuffles=0

    Merged Map outputs=0

    GC time elapsed (ms)=8

    CPU time spent (ms)=0

    Physical memory (bytes) snapshot=0

    Virtual memory (bytes) snapshot=0

    Total committed heap usage (bytes)=29687808

    File Input Format Counters

    Bytes Read=0

    File Output Format Counters

    Bytes Written=385

    3. Drop the table

    hbase(main):015:0> disable 'scores'

    0 row(s) in 1.2200 seconds

    hbase(main):016:0> drop 'scores'

    0 row(s) in 0.1700 seconds

    4. Restore the data

    Create the table schema in HBase

    hbase(main):017:0> create 'scores','grade','course'

    0 row(s) in 0.4450 seconds

    => Hbase::Table - scores

    Import the data (two options)

    1) [root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Import scores /tmp/scores

    2) [root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Import scores hdfs://master:9000/myback_scores

    5. Verify the data

    hbase(main):016:0> scan 'scores'

    ROW COLUMN+CELL

    Jim column=course:chinese, timestamp=1465032277623, value=89

    Jim column=course:english, timestamp=1465032284037, value=80

    Jim column=grade:, timestamp=1465032263803, value=4

    Tom column=course:art, timestamp=1465032250427, value=87

    Tom column=course:english, timestamp=1465032257150, value=80

    Tom column=course:math, timestamp=1465032242712, value=97

    Tom column=grade:, timestamp=1465032224863, value=5

    2 row(s) in 0.0320 seconds

    (2) Using CopyTable to back up into another table in the same cluster

    Create the table schema in HBase

    hbase(main):020:0> create 'new_scores','grade','course'

    0 row(s) in 1.2730 seconds

    Back up with CopyTable

    [root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=new_scores scores

    Check the data

    hbase(main):025:0> scan 'new_scores'

    ROW COLUMN+CELL

    Jim column=course:chinese, timestamp=1464608884979, value=89

    Jim column=course:english, timestamp=1464608885088, value=80

    Jim column=grade:, timestamp=1464608884637, value=4

    Tom column=course:art, timestamp=1464608875482, value=87

    Tom column=course:english, timestamp=1464608875733, value=80

    Tom column=course:math, timestamp=1464608875256, value=97

    Tom column=grade:, timestamp=1464608875010, value=5

    2 row(s) in 0.1120 seconds

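    Summary of tricky points:

    HBase is a distributed data store built on an LSM tree (log-structured merge-tree). It relies on complex internal mechanisms to guarantee data accuracy, consistency, multi-versioning, and so on. So how do you obtain a consistent backup of the many HFiles and WALs (write-ahead logs) that dozens of region servers keep in HDFS and in memory?

    Snapshots

    HBase snapshots are feature-rich, and creating one does not require shutting down the cluster. The article "Introduction to Apache HBase Snapshots" covers them in more detail.

    A snapshot captures the state of an HBase table at a point in time by creating the equivalent of UNIX hard links to the table's store files in HDFS (Figure 1). A snapshot completes within seconds, has almost no performance impact on the cluster, and takes up a negligible amount of space. Apart from a small amount of directory data kept in metadata files, none of your data is duplicated. A snapshot lets you roll the table back to the moment the snapshot was taken, although you do have to restore it explicitly. To create a snapshot of a table, run the following command in the HBase shell: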

    hbase(main):001:0> snapshot 'myTable', 'MySnapShot'

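    After running this command you will find some small data files in HDFS that hold the snapshot information, under /hbase/.snapshot/myTable (CDH4) or /hbase/.hbase-snapshots (Apache 0.94.6.1). To restore the data, run the following commands in the shell: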

    hbase(main):002:0> disable 'myTable'

    hbase(main):003:0> restore_snapshot 'MySnapShot'

    hbase(main):004:0> enable 'myTable'

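    As you can see, restoring a snapshot requires taking the table offline. Once the snapshot is restored, any inserts or updates made after the snapshot was taken are lost. If your business requires an off-site copy of the data, you can use the ExportSnapshot command to copy a table's data to your local HDFS or to a remote HDFS of your choice.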
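
    For the off-site copy mentioned above, here is a hedged sketch of an ExportSnapshot invocation. `ExportSnapshot` and its `-snapshot`, `-copy-to`, and `-mappers` options ship with HBase; the `export_snapshot` function, the `DRY_RUN` switch, the default of 4 mappers, and the destination URI `hdfs://backup-nn:8020/hbase` in the usage note are hypothetical choices for illustration.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# export_snapshot SNAPSHOT DEST_ROOT [MAPPERS]
# Copies an existing snapshot to another HBase root directory,
# for example one on a remote HDFS cluster.
export_snapshot() {
  if [ "$#" -lt 2 ]; then
    echo "usage: export_snapshot SNAPSHOT DEST_ROOT [MAPPERS]" >&2
    return 2
  fi
  run hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot "$1" -copy-to "$2" -mappers "${3:-4}"
}
```

    For example, `export_snapshot MySnapShot hdfs://backup-nn:8020/hbase 8` prints the command that would copy the snapshot created earlier with 8 map tasks.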

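    HBase Replication

    HBase replication is another low-overhead backup tool. It supports three modes: master->slave, master<->master, and cyclic. This gives you the flexibility to ingest data from any data center and be sure it is replicated to the copies of the table in all the other data centers. If one data center suffers a catastrophic failure, client applications can use DNS tools to redirect to a standby location. Replication is a robust, fault-tolerant process. It provides "eventual consistency": at any given moment the most recent edits to a table may not yet have been applied to every replica of that table, but they are guaranteed to get there eventually.

    Note: for a table that already exists, you first have to copy the source table to the destination table by hand, using one of the other methods described in this article. Replication only applies to writes and edits made after it is enabled.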
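
    As a rough illustration of what enabling replication involves, the sketch below prints the HBase shell statements for registering a peer cluster and setting the replication scope on column families. It is an assumption-laden example: the `replication_commands` function, the peer id, and the ZooKeeper quorum in the usage note are hypothetical, and the `add_peer` syntax has changed between HBase releases (the `CLUSTER_KEY => ...` form shown here is the one used by newer shells; older ones take the cluster key as a plain second argument), so check `help 'add_peer'` on your version first.

```shell
#!/bin/sh
# replication_commands PEER_ID CLUSTER_KEY TABLE FAMILY...
# Prints HBase shell statements that register a peer cluster and
# turn on replication for the given column families of TABLE.
replication_commands() {
  if [ "$#" -lt 4 ]; then
    echo "usage: replication_commands PEER_ID CLUSTER_KEY TABLE FAMILY..." >&2
    return 2
  fi
  peer_id="$1"
  cluster_key="$2"   # format: zk_quorum:zk_client_port:znode_parent
  table="$3"
  shift 3
  echo "add_peer '$peer_id', CLUSTER_KEY => \"$cluster_key\""
  # Remaining arguments are the column families to replicate.
  for family in "$@"; do
    echo "alter '$table', {NAME => '$family', REPLICATION_SCOPE => '1'}"
  done
  echo "list_peers"
}
```

    Running `replication_commands 1 "zk1,zk2,zk3:2181:/hbase" scores grade course` prints four statements that can be reviewed and then pasted into, or piped to, `hbase shell` on the source cluster. The destination cluster must already have a `scores` table with the same column families.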

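    Export

    HBase's Export tool is a built-in utility that makes it easy to export data from an HBase table into SequenceFiles in an HDFS directory. It creates a MapReduce job that makes a series of HBase API calls against the cluster, reads every row of the specified table, and writes the data to the specified HDFS directory. The tool is performance-intensive for the cluster because it uses both MapReduce and the HBase client API. It is feature-rich, though: it supports restricting the export by version or date range and filtering the data, which makes incremental backups possible.

    Here is a simple example of the export command: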

    hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir>

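    Once the table is exported, you can copy the resulting data files anywhere you want to store them (for example, off-site or offline cluster storage). You can also specify a remote HDFS cluster/directory as the output directory argument, in which case the data is exported directly to the remote cluster. This approach goes over the network, so make sure the connection to the remote cluster is reliable and fast.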
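
    The version and date-range support mentioned for Export is what makes incremental backups possible. The tool accepts optional positional arguments after the output directory: the number of versions, then a start time and an end time in epoch milliseconds. The sketch below wraps that form; the `incremental_export` function, the `DRY_RUN` switch, and the output path in the usage note are hypothetical.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# incremental_export TABLE OUTPUT_DIR START_MS END_MS [VERSIONS]
# Exports only cells whose timestamp falls in [START_MS, END_MS).
# VERSIONS defaults to 1 (latest version of each cell in the window).
incremental_export() {
  if [ "$#" -lt 4 ]; then
    echo "usage: incremental_export TABLE OUTPUT_DIR START_MS END_MS [VERSIONS]" >&2
    return 2
  fi
  run hbase org.apache.hadoop.hbase.mapreduce.Export \
      "$1" "$2" "${5:-1}" "$3" "$4"
}
```

    For example, `incremental_export scores /backup/scores_inc 1465032000000 1465033000000` covers the window that contains the timestamps of the test puts shown earlier. The output directory must not exist yet, because the MapReduce job creates it.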

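    CopyTable

    The CopyTable utility is described in detail in the article "Online HBase Backups with CopyTable"; what follows is a basic summary. Like Export, CopyTable uses the HBase API to create a MapReduce job that reads from the source table. The difference is that CopyTable writes its output to another HBase table, which can be on the local cluster or on a remote cluster.

    A simple example: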

    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=testCopy test

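    This command copies the table named test to another table in the cluster, testCopy.

    Note that there is a significant performance overhead: CopyTable writes the data to the destination table row by row using individual puts. If your table is very large, CopyTable can fill up the memstores on the destination region servers, which triggers flushes, which eventually lead to compactions, garbage collection, and so on.

    In addition, you have to take into account the performance impact of running MapReduce jobs on HBase. For large data sets this approach may not be ideal.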
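
    One way to limit the cost described above is to copy only a time window, or to send the copy to a different cluster. CopyTable accepts `--starttime`, `--endtime`, `--new.name`, and `--peer.adr` options for this. The sketch below only assembles such a command; the `copy_table_window` function, the `DRY_RUN` switch, and the ZooKeeper quorum in the usage note are hypothetical.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# copy_table_window SRC_TABLE START_MS END_MS [NEW_NAME] [PEER_ADR]
# Copies cells with timestamps in [START_MS, END_MS) from SRC_TABLE.
# NEW_NAME: destination table name (defaults to the source name).
# PEER_ADR: destination cluster as zk_quorum:zk_client_port:znode_parent.
copy_table_window() {
  if [ "$#" -lt 3 ]; then
    echo "usage: copy_table_window SRC_TABLE START_MS END_MS [NEW_NAME] [PEER_ADR]" >&2
    return 2
  fi
  src="$1"
  start_ms="$2"
  end_ms="$3"
  new_name="${4:-}"
  peer_adr="${5:-}"
  # Rebuild the positional parameters as the command line to run.
  set -- hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
      "--starttime=$start_ms" "--endtime=$end_ms"
  if [ -n "$new_name" ]; then
    set -- "$@" "--new.name=$new_name"
  fi
  if [ -n "$peer_adr" ]; then
    set -- "$@" "--peer.adr=$peer_adr"
  fi
  run "$@" "$src"
}
```

    For example, `copy_table_window scores 1465032000000 1465033000000 new_scores` prints a command that copies only that window into new_scores, and passing `""` for the new name together with `"zk1,zk2,zk3:2181:/hbase"` as the fifth argument targets a same-named table on a remote cluster. As with the CopyTable example above, the destination table has to exist before the job runs.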

  • Original article: https://www.cnblogs.com/zd520pyx1314/p/7246602.html