• HBase Data Backup && Disaster Recovery Solutions


    Tags (space-separated): Hbase


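     When backing up by copying HDFS files with the distcp command, the table being backed up must be disabled so that no data is written to it during the copy. This makes the approach unusable for an HBase cluster that serves online traffic. Once the static directory has been copied with distcp to another HDFS file system, all of the data can be restored by starting a new HBase cluster directly on top of it in the other cluster.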
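The cold-backup procedure described above can be sketched as a small script. This is only a minimal sketch under stated assumptions: the table names, the two HDFS URIs and the `run` helper are hypothetical, and the script prints the commands by default (set `DRY_RUN=0` to actually execute them). It relies on `hbase shell -n` (non-interactive mode) and `hadoop distcp`.

```shell
#!/usr/bin/env bash
# Cold-backup sketch: disable the tables, distcp the static directory, re-enable.
# All names below are hypothetical placeholders, not values from a real cluster.
set -euo pipefail

TABLES=("table_a" "table_b")            # assumed: every table under SRC_DIR
SRC_DIR="hdfs://src-cluster/hbase"      # assumed HBase root dir on the source
DST_DIR="hdfs://backup-cluster/hbase"   # assumed target on the other HDFS

# Print every command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

cold_backup() {
  local t
  # Disable each table so nothing is written while the copy is running.
  for t in "${TABLES[@]}"; do
    run bash -c "echo \"disable '${t}'\" | hbase shell -n"
  done
  # Copy the now-static directory to the other HDFS file system.
  run hadoop distcp "$SRC_DIR" "$DST_DIR"
  # Re-enable the tables once the copy has finished.
  for t in "${TABLES[@]}"; do
    run bash -c "echo \"enable '${t}'\" | hbase shell -n"
  done
}

cold_backup
```

Run as-is, the script only lists the five commands it would execute, which makes it easy to review the plan before taking a table offline.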



     hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=dstClusterZK:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=tableCopy  srcTable
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=dstClusterZK:2181:/hbase srcTable

    Note that cross-cluster CopyTable works in push mode: the command must be run from the source cluster.

    CopyTable usage and examples:

    $ ./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help
    Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
     rs.class     hbase.regionserver.class of the peer cluster,
                  specify if different from current cluster
     rs.impl      hbase.regionserver.impl of the peer cluster,
     startrow     the start row
     stoprow      the stop row
     starttime    beginning of the time range (unixtime in millis)
                  without endtime means from starttime to forever
     endtime      end of the time range.  Ignored if no starttime specified.
     versions     number of cell versions to copy
     new.name     new table's name
     peer.adr     Address of the peer cluster given in the format
     families     comma-separated list of families to copy
                  To copy from cf1 to cf2, give sourceCfName:destCfName.
                  To keep the same name, just give "cfName"
     all.cells    also copy delete markers and deleted cells
     tablename    Name of the table to copy
     To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
     $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable
    For performance consider the following general options:
      It is recommended that you set the following to >=100. A higher value uses more memory but
      decreases the round trip time to the server and may increase performance.
      The following should always be set to false, to prevent writing data twice, which may produce
      inaccurate results.
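The two recommendations at the end of the help text can be passed as generic `-D` options placed before the CopyTable-specific arguments. A minimal sketch, assuming the settings meant are `hbase.client.scanner.caching` and `mapreduce.map.speculative` (the property names used by HBase 1.x on Hadoop 2; other releases may use different names). The helper `copytable_cmd` is hypothetical and only prints the command line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a CopyTable command line with the two tuning options.
# Generic -D options must come before the CopyTable arguments.
copytable_cmd() {
  local new_name="$1" src_table="$2"
  echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable" \
       "-Dhbase.client.scanner.caching=100" \
       "-Dmapreduce.map.speculative=false" \
       "--new.name=${new_name} ${src_table}"
}

# Print the command for review; paste it into a shell to run it.
copytable_cmd tableCopy srcTable
```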


    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1478448000000 --endtime=1478591994506 --peer.adr=VECS00001,VECS00002,VECS00003:2181:/hbase --families=txjl --new.name=hy_membercontacts_bk hy_membercontacts
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1478448000000 --endtime=1478591994506 --new.name=hy_membercontacts_bk hy_membercontacts
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1477929600000 --endtime=1478591994506 --new.name=hy_linkman_tmp hy_linkman
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=hy_mobileblacklist_bk_before_del hy_mobileblacklist
    scan 'hy_linkman', {COLUMNS => 'lxr:sguid', TIMERANGE => [1478966400000, 1479052799000]}
    scan 'hy_mobileblacklist', {COLUMNS => 'mobhmd:sguid', TIMERANGE => [1468719824000, 1468809824000]}
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=hy_mobileblacklist_bk_before_del_20161228 hy_mobileblacklist
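The `--starttime` and `--endtime` values are Unix timestamps in milliseconds, which are awkward to write by hand. The sketch below derives them from readable dates; it assumes GNU `date` (the `-d` option) and a UTC+8 wall clock, and the helper name `to_millis` is hypothetical. Under that assumption, the value 1478448000000 used above corresponds to 2016-11-07 00:00:00.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert a timestamp string to Unix time in milliseconds (requires GNU date).
to_millis() {
  echo "$(( $(date -d "$1" +%s) * 1000 ))"
}

# One-day window starting at midnight, UTC+8 (assumed time zone).
START=$(to_millis "2016-11-07 00:00:00 +0800")
END=$(to_millis "2016-11-08 00:00:00 +0800")

# Print the two options, ready to paste into a CopyTable command.
echo "--starttime=${START} --endtime=${END}"
```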


    Export: run the export command


    hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.mapreduce.scan.column.family=cf -D hbase.mapreduce.scan.row.start=0000001 -D hbase.mapreduce.scan.row.stop=1000000 table_name /tmp/hbase_export


    Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
      Note: -D properties will be applied to the conf used. 
      For example: 
       -D mapred.output.compress=true
       -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
       -D mapred.output.compression.type=BLOCK
      Additionally, the following SCAN properties can be specified
      to control/limit what is exported..
       -D hbase.mapreduce.scan.column.family=<familyName>
       -D hbase.mapreduce.include.deleted.rows=true
    For performance consider the following properties:
    For tables with very wide rows consider setting the batch size as below:
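According to the usage line above, the optional positional arguments after the output directory are the number of versions, the start time and the end time, which allows a compressed export of a single time window. A minimal sketch, with a hypothetical helper `export_cmd` that only prints the command; the table name, output path and timestamps are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build an Export command for one time window with gzip-compressed output.
# Arguments: table, output dir, versions, start (ms), end (ms).
export_cmd() {
  local table="$1" outdir="$2" versions="$3" start="$4" end="$5"
  echo "hbase org.apache.hadoop.hbase.mapreduce.Export" \
       "-D mapred.output.compress=true" \
       "-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
       "${table} ${outdir} ${versions} ${start} ${end}"
}

# Export one version of every cell written during a one-day window.
export_cmd table_name /tmp/hbase_export 1 1478448000000 1478534400000
```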

    Import: run the import command

    create 'table_name','cf'


    hbase org.apache.hadoop.hbase.mapreduce.Import table_name hdfs://flashhadoop/tmp/hbase_export/


    Usage: Import [options] <tablename> <inputdir>
    By default Import will load data directly into HBase. To instead generate
    HFiles of data to prepare for a bulk data load, pass the option:
     To apply a generic org.apache.hadoop.hbase.filter.Filter to the input, use
      -Dimport.filter.class=<name of filter class>
      -Dimport.filter.args=<comma separated list of args for filter>
     NOTE: The filter will be applied BEFORE doing key renames via the HBASE_IMPORTER_RENAME_CFS property. Further, filters will only use the Filter#filterRowKey(byte[] buffer, int offset, int length) method to identify whether the current row needs to be ignored completely for processing and Filter#filterKeyValue(KeyValue) method to determine if the KeyValue should be added; Filter.ReturnCode#INCLUDE and #INCLUDE_AND_NEXT_COL will be considered as including the KeyValue.
    For performance consider the following options:
      -Dimport.wal.durability=<Used while writing data to hbase. Allowed values are the supported durability values like SKIP_WAL/ASYNC_WAL/SYNC_WAL/...>
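The help text mentions generating HFiles for a bulk load instead of writing directly into HBase. A minimal sketch of that two-step flow, assuming HBase 1.x, where the option is `-Dimport.bulk.output` and the loader class is `org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles` (both are assumptions for other versions). The helper `bulk_import_cmds` and the HFile directory are hypothetical, and the function only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the two commands of a bulk-load import.
# Arguments: table, exported input dir, scratch dir for the generated HFiles.
bulk_import_cmds() {
  local table="$1" input="$2" hfiles="$3"
  # Step 1: Import writes HFiles to $hfiles instead of loading into HBase.
  echo "hbase org.apache.hadoop.hbase.mapreduce.Import" \
       "-Dimport.bulk.output=${hfiles} ${table} ${input}"
  # Step 2: hand the generated HFiles over to the table's regions.
  echo "hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles" \
       "${hfiles} ${table}"
}

bulk_import_cmds table_name hdfs://flashhadoop/tmp/hbase_export/ /tmp/hbase_hfiles
```

As with a direct import, the target table has to exist before either step is run.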


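    A snapshot is an image of an HBase table.

    The snapshot feature must be enabled on the HBase cluster beforehand.


    In the hbase shell, the clone_snapshot, delete_snapshot, list_snapshots, restore_snapshot, and snapshot commands let you create snapshots, list snapshots, restore a table from a snapshot, create a new table from a snapshot, and so on. To copy a snapshot to another cluster: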


    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot table_name_snapshot -copy-to hdfs://flashhadoop_2/hbase -mappers 2

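    After this command runs, the table_name_snapshot folder is copied to /hbase/.hbase-snapshot on the HDFS of flashhadoop_2. On the flashhadoop_2 HBase cluster, list_snapshots then shows a snapshot named table_name_snapshot. The clone_snapshot command can turn that snapshot into a new table without creating the table beforehand; the region count and other metadata of the new table match the snapshot exactly. Alternatively, you can first create a table identical to the source table and then restore it with restore_snapshot, but this leaves one extra region, which becomes invalid.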
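The whole snapshot round trip can be summarized as a short plan. This is a sketch only: the function `snapshot_plan` and the new table name `table_name_restored` are hypothetical, and the function prints the steps rather than running them, because the first two run on the source cluster and the last two on the destination cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the snapshot -> export -> clone sequence for one table.
# Arguments: table, snapshot name, destination HBase root dir, new table name.
snapshot_plan() {
  local table="$1" snap="$2" dest="$3" new_table="$4"
  cat <<EOF
# source cluster, hbase shell
snapshot '${table}', '${snap}'
# source cluster, command line
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot ${snap} -copy-to ${dest} -mappers 2
# destination cluster, hbase shell
list_snapshots
clone_snapshot '${snap}', '${new_table}'
EOF
}

snapshot_plan table_name table_name_snapshot hdfs://flashhadoop_2/hbase table_name_restored
```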



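    1. If the master and slave HBase clusters share one ZooKeeper ensemble, zookeeper.znode.parent cannot be the default /hbase on both. You can set it to /hbase-master and /hbase-slave, for example; in short, the znode names in ZooKeeper must not collide.
    2. Add the following settings to hbase-site.xml on both the master and slave HBase clusters. (For a plain master-slave setup it is actually enough to set hbase.replication to true on the slave cluster; the other settings can be ignored.)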

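    <description>How long the master cluster waits before failing over after a RegionServer goes down. The default is 2 seconds; the actual sleep time is: sleepBeforeFailover + (long) (new Random().nextFloat() * sleepBeforeFailover)</description>
    3. In the hbase shell of both the master and slave clusters: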
    add_peer 'ID' 'CLUSTER_KEY'
    The ID must be a short integer. To compose the CLUSTER_KEY, use the following template:
    hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
    This will show you the help to setup the replication stream between both clusters. If both clusters use the same Zookeeper cluster, you have to use a different zookeeper.znode.parent since they can't write in the same folder.


    Add table replication from the primary HBase to the disaster-recovery HBase:
    add_peer '1',  "VECS00840,VECS00841,VECS00842,VECS00843,VECS00844:2181:/hbase"


    Add table replication from the disaster-recovery HBase to the primary HBase:
    add_peer '2',  "VECS00994,VECS00995,VECS00996,VECS00997,VECS00998:2181:/hbase"


    hbase shell>
    create 't_warehouse_track', {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '3', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

    4. In the hbase shell of the primary cluster:

    enable_table_replication 't_warehouse_track'

    5. In the hbase shell of the disaster-recovery cluster:

    disable 'your_table'
    alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
    enable 'your_table'
    The 1 in REPLICATION_SCOPE => '1' is unrelated to the ID set in step 3. The value can only be 0 or 1: 0 disables replication for the column family and 1 enables it.
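The hbase shell commands from the steps above can be generated for any table from a few parameters. A minimal sketch: the helpers `cluster_key` and `replication_cmds` are hypothetical, the add_peer form follows the examples in this document (newer HBase releases may expect a different syntax), and the script only prints the commands so they can be reviewed and pasted into the hbase shell.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Compose a CLUSTER_KEY from ZooKeeper quorum, client port and znode parent.
cluster_key() {
  echo "$1:$2:$3"
}

# Print the hbase shell commands that add a peer and turn on replication
# for one column family. Arguments: peer id, cluster key, table, family.
replication_cmds() {
  local peer_id="$1" key="$2" table="$3" family="$4"
  cat <<EOF
add_peer '${peer_id}', "${key}"
disable '${table}'
alter '${table}', {NAME => '${family}', REPLICATION_SCOPE => '1'}
enable '${table}'
EOF
}

KEY=$(cluster_key "VECS00840,VECS00841,VECS00842,VECS00843,VECS00844" 2181 /hbase)
replication_cmds 1 "$KEY" t_warehouse_track cf
```

Keeping the peer ID and the REPLICATION_SCOPE value as separate parameters makes it visible that the two are independent.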
