• Configuring multiple HMasters in HBase


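    To keep an HBase cluster highly available, HBase supports configuring multiple Backup Masters. When the Active Master goes down, a Backup Master automatically takes over the whole HBase cluster.

    The configuration is very simple:

    Create a file named backup-masters under $HBASE_HOME/conf/ and list in it the hostname of each node that should act as a Backup Master, one per line. For example: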

    [hbase@master conf]$ cat backup-masters 
    node1
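
    As a minimal sketch of this step, the file can also be written from a script. The helper name write_backup_masters is hypothetical (it is not part of HBase), and the sketch assumes the conf directory already exists and is writable:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HBase): write the given hostnames,
# one per line, to <conf_dir>/backup-masters.
write_backup_masters() {
  local conf_dir="$1"
  shift
  # Refuse to write an empty file when no hostname is given.
  [ "$#" -ge 1 ] || return 1
  printf '%s\n' "$@" > "${conf_dir}/backup-masters"
}

# Example (assumes HBASE_HOME is set):
#   write_backup_masters "$HBASE_HOME/conf" node1
```

    Calling it with the single hostname node1 produces the same one-line file as shown above.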

    Then start the whole cluster. An HMaster process is now running on both master and node1:

    [hbase@master conf]$ jps
    25188 NameNode
    3319 QuorumPeerMain
    31725 Jps
    25595 ResourceManager
    31077 HMaster
    25711 NodeManager
    25303 DataNode
    31617 Main
    31220 HRegionServer
    [hbase@node1 root]$ jps
    11560 DataNode
    11762 NodeManager
    20769 Jps
    415 QuorumPeerMain
    11675 SecondaryNameNode
    20394 HRegionServer
    20507 HMaster
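
    The same check can be scripted instead of read by eye. The sketch below assumes only the two-column "PID Name" format of the jps output above; has_hmaster is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps output on stdin and succeed (exit 0)
# only if some line names an HMaster process in its second column.
has_hmaster() {
  awk '$2 == "HMaster" { found = 1 } END { exit !found }'
}

# Example, on the local node:
#   jps | has_hmaster && echo "HMaster is running"
# Example, on a remote node (assumes passwordless ssh to node1):
#   ssh node1 jps | has_hmaster && echo "HMaster is running on node1"
```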

    Now look at the master log on node1. It contains the following:

    [hbase@node1 logs]$ tail -f hbase-hbase-master-node1.log
    2015-10-10 05:35:09,609 INFO  [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:60010
    2015-10-10 05:35:09,613 INFO  [main] master.HMaster: hbase.rootdir=hdfs://master:9000/hbase, hbase.cluster.distributed=true
    2015-10-10 05:35:09,631 INFO  [main] master.HMaster: Adding backup master ZNode /hbase/backup-masters/node1,60000,1444455307700
    2015-10-10 05:35:09,806 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, master,60000,1444455305852; waiting to become the next active master
    2015-10-10 05:35:09,858 INFO  [master/node1/10.0.52.145:60000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x10135dbc connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
    2015-10-10 05:35:09,858 INFO  [master/node1/10.0.52.145:60000] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x10135dbc0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
    2015-10-10 05:35:09,859 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Opening socket connection to server node2/10.0.52.146:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-10-10 05:35:09,860 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Socket connection established to node2/10.0.52.146:2181, initiating session
    2015-10-10 05:35:09,885 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Session establishment complete on server node2/10.0.52.146:2181, sessionid = 0x350463058c10017, negotiated timeout = 40000
    2015-10-10 05:35:09,920 INFO  [master/node1/10.0.52.145:60000] regionserver.HRegionServer: ClusterId : c309a039-eb35-400c-bb13-0b6ed939cc5e

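    This output shows that the HBase cluster already has an active master, the node named master, so node1 waits until the HMaster on master goes down. At that point node1 becomes the new Active Master.

    Now kill the HMaster process on the master node directly, then check the master log on node1: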
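
    To read the role out of a log without scanning it manually, one option is to look for the two ActiveMasterManager messages that appear in the log excerpts of this post. This is only a sketch: master_role is a hypothetical helper, and the message text may differ in other HBase versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an HMaster log on stdin and print the role
# implied by the most recent matching ActiveMasterManager message.
# The two patterns are copied from the log excerpts in this post.
master_role() {
  awk '
    /Another master is the active master/ { role = "backup" }
    /Registered Active Master=/           { role = "active" }
    END { print (role == "" ? "unknown" : role) }
  '
}

# Example:
#   master_role < hbase-hbase-master-node1.log
```

    Because the last matching line wins, a log that first records waiting as a backup and later records registering as the active master is reported as active.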

    2015-10-10 05:42:17,173 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/node1,60000,1444455307700 from backup master directory
    2015-10-10 05:42:17,194 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=node1,60000,1444455307700
    2015-10-10 05:42:17,758 INFO  [node1:60000.activeMasterManager] fs.HFileSystem: Added intercepting call to namenode#getBlockLocations so can do block reordering using class class org.apache.hadoop.hbase.fs.HFileSystem$ReorderWALBlocks
    2015-10-10 05:42:17,776 INFO  [node1:60000.activeMasterManager] coordination.SplitLogManagerCoordination: Found 0 orphan tasks and 0 rescan nodes
    2015-10-10 05:42:17,880 INFO  [node1:60000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x29d405f7 connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
    2015-10-10 05:42:17,880 INFO  [node1:60000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x29d405f70x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
    2015-10-10 05:42:17,883 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Opening socket connection to server node2/10.0.52.146:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-10-10 05:42:17,884 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Socket connection established to node2/10.0.52.146:2181, initiating session
    2015-10-10 05:42:17,904 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Session establishment complete on server node2/10.0.52.146:2181, sessionid = 0x350463058c1001b, negotiated timeout = 40000
    2015-10-10 05:42:17,942 INFO  [node1:60000.activeMasterManager] balancer.StochasticLoadBalancer: loading config
    2015-10-10 05:42:18,061 INFO  [node1:60000.activeMasterManager] master.HMaster: Server active/primary master=node1,60000,1444455307700, sessionid=0x150463058ac001a, setting cluster-up flag (Was=true)
    2015-10-10 05:42:18,154 INFO  [node1:60000.activeMasterManager] procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/online-snapshot/acquired /hbase/online-snapshot/reached /hbase/online-snapshot/abort
    2015-10-10 05:42:18,184 INFO  [node1:60000.activeMasterManager] procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/flush-table-proc/acquired /hbase/flush-table-proc/reached /hbase/flush-table-proc/abort
    2015-10-10 05:42:18,256 INFO  [node1:60000.activeMasterManager] master.MasterCoprocessorHost: System coprocessor loading is enabled
    2015-10-10 05:42:18,286 INFO  [node1:60000.activeMasterManager] procedure2.ProcedureExecutor: Starting procedure executor threads=5
    2015-10-10 05:42:18,288 INFO  [node1:60000.activeMasterManager] wal.WALProcedureStore: Starting WAL Procedure Store lease recovery
    2015-10-10 05:42:18,296 INFO  [node1:60000.activeMasterManager] util.FSHDFSUtils: Recovering lease on dfs file hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log
    2015-10-10 05:42:18,307 INFO  [node1:60000.activeMasterManager] util.FSHDFSUtils: recoverLease=true, attempt=0 on file=hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log after 9ms
    2015-10-10 05:42:18,324 WARN  [node1:60000.activeMasterManager] wal.WALProcedureStore: Unable to read tracker for hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log - Missing trailer: size=9 startPos=9
    2015-10-10 05:42:18,373 INFO  [node1:60000.activeMasterManager] wal.WALProcedureStore: Lease acquired for flushLogId: 28
    2015-10-10 05:42:18,383 WARN  [node1:60000.activeMasterManager] wal.ProcedureWALFormatReader: nothing left to decode. exiting with missing EOF
    2015-10-10 05:42:18,383 INFO  [node1:60000.activeMasterManager] wal.ProcedureWALFormatReader: No active entry found in state log hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log. removing it
    2015-10-10 05:42:18,405 INFO  [node1:60000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
    2015-10-10 05:42:18,405 INFO  [node1:60000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=replicationLogCleaner0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
    2015-10-10 05:42:18,407 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Opening socket connection to server node1/10.0.52.145:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-10-10 05:42:18,408 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Socket connection established to node1/10.0.52.145:2181, initiating session
    2015-10-10 05:42:18,426 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Session establishment complete on server node1/10.0.52.145:2181, sessionid = 0x250463058780018, negotiated timeout = 40000
    2015-10-10 05:42:18,464 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:19,970 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1506 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:21,475 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3011 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:22,980 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 4516 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:23,058 INFO  [PriorityRpcServer.handler=3,queue=1,port=60000] master.ServerManager: Registering server=node1,16020,1444455306545
    2015-10-10 05:42:23,059 INFO  [PriorityRpcServer.handler=5,queue=1,port=60000] master.ServerManager: Registering server=master,16020,1444455306763
    2015-10-10 05:42:23,060 INFO  [PriorityRpcServer.handler=1,queue=1,port=60000] master.ServerManager: Registering server=node2,16020,1444455305886
    2015-10-10 05:42:23,081 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 3, slept for 4617 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:24,586 INFO  [node1:60000.activeMasterManager] master.ServerManager: Finished waiting for region servers count to settle; checked in 3, slept for 6122 ms, expecting minimum of 1, maximum of 2147483647, master is running
    2015-10-10 05:42:24,610 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/master,16020,1444455306763 belongs to an existing region server
    2015-10-10 05:42:24,619 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/node1,16020,1444455306545 belongs to an existing region server
    2015-10-10 05:42:24,625 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/node2,16020,1444455305886 belongs to an existing region server
    2015-10-10 05:42:24,757 INFO  [node1:60000.activeMasterManager] master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1444455744651, server=null} to {1588230740 state=OPEN, ts=1444455744756, server=node2,16020,1444455305886}
    2015-10-10 05:42:24,757 INFO  [node1:60000.activeMasterManager] master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
    2015-10-10 05:42:24,760 INFO  [node1:60000.activeMasterManager] master.HMaster: hbase:meta with replicaId 0 assigned=0, rit=false, location=node2,16020,1444455305886
    2015-10-10 05:42:24,895 INFO  [node1:60000.activeMasterManager] hbase.MetaMigrationConvertingToPB: META already up-to date with PB serialization
    2015-10-10 05:42:24,985 INFO  [node1:60000.activeMasterManager] master.AssignmentManager: Found regions out on cluster or in RIT; presuming failover
    2015-10-10 05:42:25,000 INFO  [node1:60000.activeMasterManager] master.AssignmentManager: Joined the cluster in 104ms, failover=true
    2015-10-10 05:42:25,216 INFO  [node1:60000.activeMasterManager] master.HMaster: Master has completed initialization
    2015-10-10 05:42:25,234 INFO  [node1:60000.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled

    As the log shows, the Backup Master on node1 has taken over and become the Active HMaster.

    Restart the HMaster on the master node:
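
    The kill step used for this failover test can be sketched as follows. It assumes the jps output format shown earlier; hmaster_pid is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps output on stdin and print the PID of
# the HMaster process, if there is one.
hmaster_pid() {
  awk '$2 == "HMaster" { print $1 }'
}

# Example, run on the host of the current Active Master:
#   kill "$(jps | hmaster_pid)"
```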

    [hbase@master bin]$ ./hbase-daemon.sh start master 
    starting master, logging to /usr/local/hbase//logs/hbase-hbase-master-master.out
    [hbase@master bin]$ jps
    25188 NameNode
    32351 Jps
    3319 QuorumPeerMain
    32265 HMaster
    25595 ResourceManager
    25711 NodeManager
    25303 DataNode
    31220 HRegionServer

    The log on the master node shows that it has now become a backup master:

    [hbase@master logs]$ tail -f  hbase-hbase-master-master.log
    2015-10-10 05:53:15,329 INFO  [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:60010
    2015-10-10 05:53:15,333 INFO  [main] master.HMaster: hbase.rootdir=hdfs://master:9000/hbase, hbase.cluster.distributed=true
    2015-10-10 05:53:15,348 INFO  [main] master.HMaster: Adding backup master ZNode /hbase/backup-masters/master,60000,1444456393819
    2015-10-10 05:53:15,488 INFO  [master:60000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, node1,60000,1444455307700; waiting to become the next active master
    2015-10-10 05:53:15,522 INFO  [master/master/10.0.52.144:60000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x323b7deb connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
    2015-10-10 05:53:15,522 INFO  [master/master/10.0.52.144:60000] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x323b7deb0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
    2015-10-10 05:53:15,524 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Opening socket connection to server master/10.0.52.144:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-10-10 05:53:15,525 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Socket connection established to master/10.0.52.144:2181, initiating session
    2015-10-10 05:53:15,536 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Session establishment complete on server master/10.0.52.144:2181, sessionid = 0x150463058ac001c, negotiated timeout = 40000
    2015-10-10 05:53:15,567 INFO  [master/master/10.0.52.144:60000] regionserver.HRegionServer: ClusterId : c309a039-eb35-400c-bb13-0b6ed939cc5e
  • Original article: https://www.cnblogs.com/prayer21/p/4866673.html