Simulate a NameNode crash by deleting everything in the name directory, then recover the NameNode from the SecondaryNameNode checkpoint.
Environment: OS: CentOS 6.5 x64; software: Hadoop 1.2.1
1. Go to the name directory and delete its contents.
[huser@master name]$ pwd
/home/huser/hadoop/tmp/dfs/name
[huser@master name]$ ll
drwxrwxr-x 2 huser huser 4096 4月 16 20:16 current
drwxrwxr-x 2 huser huser 4096 4月 16 17:24 image
-rw-rw-r-- 1 huser huser 0 4月 16 20:10 in_use.lock
drwxrwxr-x 2 huser huser 4096 4月 16 18:55 previous.checkpoint
[huser@master name]$ rm -R *
[huser@master name]$ ls
2. Stop the cluster and start it again. The NameNode fails to start: jps no longer lists it.
[huser@master hadoop-1.2.1]$ bin/stop-all.sh
[huser@master hadoop-1.2.1]$ bin/start-all.sh
[huser@master hadoop-1.2.1]$ jps
7160 SecondaryNameNode
7229 JobTracker
7369 Jps
3. Stop the cluster and format the NameNode.
[huser@master hadoop-1.2.1]$ bin/stop-all.sh
[huser@master hadoop-1.2.1]$ bin/hadoop namenode -format
14/04/16 21:17:39 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.1.115
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_51
************************************************************/
Re-format filesystem in /home/huser/hadoop/tmp/dfs/name ? (Y or N) Y
14/04/16 21:17:42 INFO util.GSet: Computing capacity for map BlocksMap
14/04/16 21:17:42 INFO util.GSet: VM type = 64-bit
14/04/16 21:17:42 INFO util.GSet: 2.0% max memory = 1013645312
14/04/16 21:17:42 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/04/16 21:17:42 INFO util.GSet: recommended=2097152, actual=2097152
14/04/16 21:17:43 INFO namenode.FSNamesystem: fsOwner=huser
14/04/16 21:17:43 INFO namenode.FSNamesystem: supergroup=supergroup
14/04/16 21:17:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/04/16 21:17:43 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/04/16 21:17:43 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/04/16 21:17:43 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/04/16 21:17:43 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/04/16 21:17:43 INFO common.Storage: Image file /home/huser/hadoop/tmp/dfs/name/current/fsimage of size 111 bytes saved in 0 seconds.
14/04/16 21:17:43 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/huser/hadoop/tmp/dfs/name/current/edits
14/04/16 21:17:43 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/huser/hadoop/tmp/dfs/name/current/edits
14/04/16 21:17:44 INFO common.Storage: Storage directory /home/huser/hadoop/tmp/dfs/name has been successfully formatted.
14/04/16 21:17:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.115
************************************************************/
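4. Get the namespaceID from a DataNode. Formatting gave the NameNode a new namespaceID, so the original value has to be read from a DataNode's VERSION file.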
[huser@master hadoop-1.2.1]$ ssh slave1
[huser@slave1 current]$ pwd
/home/huser/hadoop/tmp/dfs/data/current
[huser@slave1 current]$ ll
-rw-rw-r-- 1 huser huser 49184 4月 16 18:43 blk_-1800088935645150399
-rw-rw-r-- 1 huser huser 395 4月 16 18:43 blk_-1800088935645150399_1013.meta
-rw-rw-r-- 1 huser huser 25 4月 16 18:43 blk_269963827714855400
-rw-rw-r-- 1 huser huser 11 4月 16 18:43 blk_269963827714855400_1014.meta
-rw-rw-r-- 1 huser huser 16353 4月 16 18:43 blk_4611281727215307463
-rw-rw-r-- 1 huser huser 135 4月 16 18:43 blk_4611281727215307463_1015.meta
-rw-rw-r-- 1 huser huser 769 4月 16 19:32 dncp_block_verification.log.curr
-rw-rw-r-- 1 huser huser 158 4月 16 19:51 VERSION
[huser@slave1 current]$ cat VERSION
#Wed Apr 16 19:51:23 CST 2014
namespaceID=589801292
storageID=DS-1065963269-192.168.1.111-50010-1397640950581
cTime=0
storageType=DATA_NODE
layoutVersion=-41
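Reading the value by eye works, but it can also be pulled out of the VERSION file with a small helper. The sketch below is only an illustration: get_namespace_id is a hypothetical function name, not something Hadoop ships, and it assumes the VERSION file has the key=value layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the namespaceID stored
# in a Hadoop 1.x storage VERSION file.
get_namespace_id() {
    version_file=$1
    # VERSION is a key=value properties file; print only the value that
    # follows "namespaceID=" and ignore every other line.
    sed -n 's/^namespaceID=//p' "$version_file"
}
```

On slave1 it could be called as: get_namespace_id /home/huser/hadoop/tmp/dfs/data/current/VERSION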
5. Edit the NameNode's VERSION file and set namespaceID to the value read from the DataNode.
[huser@slave1 current]$ exit
logout
[huser@master current]$ pwd
/home/huser/hadoop/tmp/dfs/name/current
[huser@master current]$ vi VERSION
#Wed Apr 16 21:17:43 CST 2014
namespaceID=589801292
cTime=0
storageType=NAME_NODE
layoutVersion=-41
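The same edit can be made without opening an editor. The following is a minimal sketch, assuming the VERSION layout shown above; set_namespace_id and the .bak backup name are hypothetical choices for illustration, not Hadoop conventions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): replace the namespaceID line
# in a VERSION file and leave all other properties untouched.
set_namespace_id() {
    version_file=$1
    new_id=$2
    # Accept only a non-empty string of digits to avoid writing garbage.
    case $new_id in
        ''|*[!0-9]*)
            echo "namespaceID must be a number: $new_id" >&2
            return 1
            ;;
    esac
    # Keep a copy of the file before changing it.
    cp -p "$version_file" "$version_file.bak" || return 1
    # Rewrite only the namespaceID line, reading from the backup copy.
    sed "s/^namespaceID=.*/namespaceID=$new_id/" "$version_file.bak" > "$version_file"
}
```

For the values in this walkthrough the call would be: set_namespace_id /home/huser/hadoop/tmp/dfs/name/current/VERSION 589801292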
6. Delete the fsimage file on the NameNode.
[huser@master current]$ rm fsimage
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 21:17 edits
-rw-rw-r-- 1 huser huser 8 4月 16 21:17 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 21:30 VERSION
7. Copy the fsimage file from the SecondaryNameNode's checkpoint directory to the NameNode.
[huser@master current]$ pwd
/home/huser/hadoop/tmp/dfs/namesecondary/current
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 20:16 edits
-rw-rw-r-- 1 huser huser 2259 4月 16 20:16 fsimage
-rw-rw-r-- 1 huser huser 8 4月 16 20:16 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 20:16 VERSION
[huser@master current]$ cp fsimage /home/huser/hadoop/tmp/dfs/name/current/
[huser@master current]$ cd /home/huser/hadoop/tmp/dfs/name/current/
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 21:17 edits
-rw-rw-r-- 1 huser huser 2259 4月 16 21:37 fsimage
-rw-rw-r-- 1 huser huser 8 4月 16 21:17 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 21:30 VERSION
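The listing above confirms the copy by file size (2259 bytes). A slightly stricter check compares the two files byte for byte. This is a sketch only: restore_fsimage is a hypothetical helper name, and the two arguments are assumed to be the namesecondary/current and name/current directories used in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): copy the checkpointed fsimage
# into the NameNode directory and verify that the copy is identical.
restore_fsimage() {
    checkpoint_dir=$1   # e.g. .../dfs/namesecondary/current
    name_dir=$2         # e.g. .../dfs/name/current
    # Refuse to continue if the checkpoint image is missing or empty.
    if [ ! -s "$checkpoint_dir/fsimage" ]; then
        echo "no usable fsimage in $checkpoint_dir" >&2
        return 1
    fi
    cp -p "$checkpoint_dir/fsimage" "$name_dir/fsimage" || return 1
    # cmp -s is silent and returns 0 only when both files are identical.
    cmp -s "$checkpoint_dir/fsimage" "$name_dir/fsimage"
}
```

Example call: restore_fsimage /home/huser/hadoop/tmp/dfs/namesecondary/current /home/huser/hadoop/tmp/dfs/name/current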
8. Restart the cluster and check that the daemons are running.
[huser@master hadoop-1.2.1]$ jps
7927 SecondaryNameNode
7773 NameNode
8017 JobTracker
8123 Jps
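The jps checks in steps 2 and 8 can be scripted. The sketch below reads jps output on standard input and prints any expected master daemon that is absent. missing_daemons is a hypothetical name, and the daemon list is an assumption taken from the output shown in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): read `jps` output on stdin and
# print each expected master daemon that is not listed.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode SecondaryNameNode JobTracker; do
        # Compare the second column exactly, so that "SecondaryNameNode"
        # is not mistaken for "NameNode".
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}
```

It would be used as: jps | missing_daemons. Fed the step 2 output it prints NameNode; fed the step 8 output it prints nothing.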