• HiveServer2 fails to start because Hadoop stays in safe mode



    Commands


    hdfs dfsadmin -safemode get      show whether safe mode is on
    hdfs dfsadmin -safemode enter    enter safe mode
    hdfs dfsadmin -safemode leave    leave safe mode
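    For scripts (for example a HiveServer2 start script) the status check can be wrapped in a small function. This is a minimal sketch that assumes the output of hdfs dfsadmin -safemode get contains the text "Safe mode is ON" or "Safe mode is OFF"; the function name safemode_is_on is hypothetical. If you only need to block until the NameNode leaves safe mode by itself, hdfs dfsadmin -safemode wait already does that.

    ```shell
    # safemode_is_on (hypothetical helper): reads the output of
    # "hdfs dfsadmin -safemode get" on stdin and succeeds (exit status 0)
    # if any line says that safe mode is ON.
    safemode_is_on() {
        grep -q 'Safe mode is ON'
    }

    # Example on a real cluster (needs the hdfs client on PATH):
    #   if hdfs dfsadmin -safemode get | safemode_is_on; then
    #       echo "NameNode is still in safe mode"
    #   fi
    ```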

    Use hadoop fsck (hdfs fsck in current releases) to find where corrupt or missing files are

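    hadoop fsck
    
    Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
            <path>             check whether the files under this path are intact
    
            -move              move corrupt files to /lost+found
            -delete            delete corrupt files
    
            -openforwrite      print files that are open for write
    
            -files             print the name of each file being checked
    
            -blocks            print the block report (use together with -files)
    
            -locations         print the location of every block (use together with -files -blocks)
    
            -racks             print the network topology of the block locations (use together with -files -blocks)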
    

    Step 1: check the file system with hadoop fsck /

    [root@node03 export]# hadoop fsck /
    ....................................................................................................
    .............Status: CORRUPT					# Hadoop status: unhealthy
     Total size:	273821489 B
     Total dirs:	403
     Total files:	213
     Total symlinks:		0
     Total blocks (validated):	201 (avg. block size 1362295 B)
      ********************************
      UNDER MIN REPL'D BLOCKS:	2 (0.99502486 %)
      dfs.namenode.replication.min:	1
      CORRUPT FILES:	2							# two corrupt files
      MISSING BLOCKS:	2							# two missing blocks
      MISSING SIZE:		6174 B
      CORRUPT BLOCKS: 	2
      ********************************
     Minimally replicated blocks:	199 (99.004974 %)
     Over-replicated blocks:	0 (0.0 %)
     Under-replicated blocks:	0 (0.0 %)
     Mis-replicated blocks:		0 (0.0 %)
     Default replication factor:	3
     Average block replication:	2.8208954
     Corrupt blocks:		2
     Missing replicas:		0 (0.0 %)
     Number of data-nodes:		3
     Number of racks:		1
    FSCK ended at Fri Aug 23 10:43:11 CST 2019 in 12 milliseconds
    

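    These lines show that the cluster is unhealthy and files have been lost:

    .............Status: CORRUPT # Hadoop status: unhealthy

    CORRUPT FILES: 2 # two corrupt files
    MISSING BLOCKS: 2 # two missing blocks

    They also explain the safe mode: at startup the NameNode stays in safe mode until the fraction of blocks that have at least dfs.namenode.replication.min replicas (1 here) reaches dfs.namenode.safemode.threshold-pct (0.999 by default). Only 199 of 201 blocks qualify (Minimally replicated blocks: 99.004974 %), so the threshold is never reached.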
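    To pull these values out of a saved report in a script, a small awk filter is enough. A minimal sketch, assuming the summary format shown in Step 1: a "Status: ..." line, plus "CORRUPT FILES:" and "MISSING BLOCKS:" lines that only appear when something is wrong. The name fsck_summary is hypothetical.

    ```shell
    # fsck_summary (hypothetical helper): reads "hadoop fsck /" output on stdin and
    # prints the overall status plus the CORRUPT FILES and MISSING BLOCKS counts.
    # Those two lines are absent from a HEALTHY report, so the counts default to 0.
    fsck_summary() {
        awk -F':' '
            /Status: / {
                sub(/.*Status: /, "")       # drop the leading dots
                sub(/[ \t].*/, "")          # keep only the status word
                status = $0
            }
            /CORRUPT FILES:/  { gsub(/[^0-9]/, "", $2); corrupt = $2 }
            /MISSING BLOCKS:/ { gsub(/[^0-9]/, "", $2); missing = $2 }
            END { printf "status=%s corrupt_files=%d missing_blocks=%d\n", status, corrupt, missing }
        '
    }

    # Example: hadoop fsck / | fsck_summary
    ```

    For the Step 1 report this prints status=CORRUPT corrupt_files=2 missing_blocks=2.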

    Step 2: write the detailed fsck report to a file

    Redirect the output of hadoop fsck / -files -blocks -locations -racks to /export/missingFile.txt. The report is long, so only an excerpt is shown below:

    [root@node03 export]# hadoop fsck /  -files -blocks -locations  -racks >/export/missingFile.txt
    
    
    /flink-checkpoint/11748bc079799f330078967fbf018a48/chk-74/_metadata 452 bytes, 1 block(s):  OK
    0. BP-2135962035-192.168.52.100-1562110398602:blk_1073742825_2005 len=452 Live_repl=1 [/default-rack/192.168.52.110:50010]
    
    /flink-checkpoint/11748bc079799f330078967fbf018a48/shared <dir>
    /flink-checkpoint/11748bc079799f330078967fbf018a48/taskowned <dir>
    /flink-checkpoint/42d81db182771fe71932120fa8933612 <dir>
    /flink-checkpoint/42d81db182771fe71932120fa8933612/chk-950 <dir>
    /flink-checkpoint/42d81db182771fe71932120fa8933612/chk-950/_metadata 337 bytes, 1 block(s):  OK
    0. BP-2135962035-192.168.52.100-1562110398602:blk_1073745657_4837 len=337 Live_repl=1 [/default-rack/192.168.52.120:50010]
    
    /flink-checkpoint/42d81db182771fe71932120fa8933612/chk-950/f59c63a0-a35d-4d4b-8e73-72c2aa1dd383 5657 bytes, 1 block(s):  OK
    0. BP-2135962035-192.168.52.100-1562110398602:blk_1073745656_4836 len=5657 Live_repl=1 [/default-rack/192.168.52.100:50010]
    
    /flink-checkpoint/42d81db182771fe71932120fa8933612/shared <dir>
    /flink-checkpoint/42d81db182771fe71932120fa8933612/taskowned <dir>
    /flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01 <dir>
    /flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/chk-9 <dir>
    /flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/chk-9/_metadata 451 bytes, 1 block(s):  OK
    0. BP-2135962035-192.168.52.100-1562110398602:blk_1073742843_2023 len=451 Live_repl=1 [/default-rack/192.168.52.100:50010]
    
    /flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/chk-9/c58c8c49-8782-41b4-a3df-2fa7ff1d1eba 5663 bytes, 1 block(s):  OK
    0. BP-2135962035-192.168.52.100-1562110398602:blk_1073742842_2022 len=5663 Live_repl=1 [/default-rack/192.168.52.120:50010]
    
    /flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/shared <dir>
    /flink-checkpoint/50aebc9e7aac85fd33bff905972a6e01/taskowned <dir>
    /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995 <dir>
    /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175 <dir>
    /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080 5663 bytes, 1 block(s): 
    /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080: CORRUPT blockpool BP-2135962035-192.168.52.100-1562110398602 block blk_1073743749
     MISSING 1 blocks of total size 5663 B
    0. BP-2135962035-192.168.52.100-1562110398602:blk_1073743749_2929 len=5663 MISSING!
    
    /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata 511 bytes, 1 block(s): 
    /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata: CORRUPT blockpool BP-2135962035-192.168.52.100-1562110398602 block blk_1073743750
     MISSING 1 blocks of total size 511 B
    0. BP-2135962035-192.168.52.100-1562110398602:blk_1073743750_2930 len=511 MISSING!
    

    Healthy files end with OK; the ones marked MISSING! are the files whose blocks were lost:

    /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080: CORRUPT blockpool BP-2135962035-192.168.52.100-1562110398602 block blk_1073743749
    MISSING 1 blocks of total size 5663 B

    /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata: CORRUPT blockpool BP-2135962035-192.168.52.100-1562110398602 block blk_1073743750
    MISSING 1 blocks of total size 511 B
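    With many corrupt files, copying the paths by hand gets tedious. The sketch below extracts them from the report saved in Step 2, assuming every corrupt block produces a line of the form "<path>: CORRUPT blockpool <pool> block <block>" as in the excerpt above; corrupt_paths is a hypothetical name. (hdfs fsck / -list-corruptfileblocks is another way to list the affected files.)

    ```shell
    # corrupt_paths (hypothetical helper): reads the report written by
    # "hadoop fsck / -files -blocks -locations -racks" on stdin and prints the path
    # of every file that has a CORRUPT block, once per file, in report order.
    corrupt_paths() {
        awk '
            /: CORRUPT blockpool / {
                sub(/: CORRUPT blockpool .*/, "")   # keep only "<path>"
                if (!seen[$0]++) print
            }
        '
    }

    # Example: corrupt_paths < /export/missingFile.txt
    ```

    Against the excerpt above it would print the two chk-175 paths.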

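    Using these paths, you can find the corresponding files in the file browser of the Hadoop (NameNode) web UI, as shown in the figure below: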

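    Step 3: try to repair the two missing/corrupt files with hdfs debug recoverLease

    Note that recoverLease performs lease recovery for files that were left open for write (for example after the writing client crashed). For a file that is already closed it reports SUCCEEDED right away, and it cannot recreate blocks that have no live replica.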

    [root@node03 conf]# hdfs debug recoverLease -path /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080 -retries 10
    recoverLease SUCCEEDED on /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080
    
    [root@node03 conf]# hdfs debug recoverLease -path /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata -retries 10
    recoverLease SUCCEEDED on /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata
    [root@node03 conf]# 
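    When there are more than a couple of paths, the Step 3 command can be run in a loop. A minimal sketch, assuming one HDFS path per line on standard input; recover_leases and the DRY_RUN switch are hypothetical, and -retries 10 simply mirrors the commands above. Setting DRY_RUN=1 prints the commands instead of running them, so the list can be reviewed first.

    ```shell
    # recover_leases (hypothetical helper): reads HDFS paths, one per line, on stdin
    # and runs "hdfs debug recoverLease" on each. With DRY_RUN=1 it only prints the
    # commands so the list can be checked before anything is executed.
    recover_leases() {
        local path
        while IFS= read -r path; do
            [ -n "$path" ] || continue          # skip blank lines
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "hdfs debug recoverLease -path $path -retries 10"
            else
                # </dev/null so the hdfs client cannot consume the remaining paths
                hdfs debug recoverLease -path "$path" -retries 10 < /dev/null
            fi
        done
    }

    # Example:
    #   printf '%s\n' /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata | DRY_RUN=1 recover_leases
    ```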
    

    Running hadoop fsck / again now shows:

    ...........Status: HEALTHY
     Total size:	273815315 B
     Total dirs:	403
     Total files:	211
     Total symlinks:		0
     Total blocks (validated):	199 (avg. block size 1375956 B)
     Minimally replicated blocks:	199 (100.0 %)
     Over-replicated blocks:	0 (0.0 %)
     Under-replicated blocks:	0 (0.0 %)
     Mis-replicated blocks:		0 (0.0 %)
     Default replication factor:	3
     Average block replication:	2.8492463
     Corrupt blocks:		0
     Missing replicas:		0 (0.0 %)
     Number of data-nodes:		3
     Number of racks:		1
    FSCK ended at Fri Aug 23 11:15:01 CST 2019 in 11 milliseconds
    

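    ...........Status: HEALTHY   # cluster status: healthy

    Compared with Step 1, Total files dropped from 213 to 211, Total blocks (validated) from 201 to 199, and Total size by 6174 B, exactly the MISSING SIZE reported there. The cluster is healthy because the two corrupt files are no longer in the namespace, not because their blocks were restored.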
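    To compare a before/after pair of reports like the two above, the relevant totals can be diffed with awk. A minimal sketch, assuming each report was saved with hadoop fsck / > file and uses the "Total ...:" lines shown here; fsck_delta and the file names in the example are hypothetical.

    ```shell
    # fsck_delta (hypothetical helper): compares two saved "hadoop fsck /" reports
    # (before, after) and prints how Total files, Total blocks and Total size changed.
    fsck_delta() {
        awk -F':' '
            /^[ \t]*Total (files|blocks [(]validated[)]|size):/ {
                key = $1
                sub(/^[ \t]+/, "", key)
                split($2, v, " ")               # first token after ":" is the number
                if (FNR == NR) before[key] = v[1]; else after[key] = v[1]
            }
            END {
                n = split("Total files|Total blocks (validated)|Total size", keys, "|")
                for (i = 1; i <= n; i++) {
                    k = keys[i]
                    printf "%s: %s -> %s (change %.0f)\n", k, before[k], after[k], after[k] - before[k]
                }
            }
        ' "$1" "$2"
    }

    # Example: fsck_delta /export/fsck_before.txt /export/fsck_after.txt
    ```

    For the two reports in this post it prints a change of -2 for files and blocks and -6174 for the size.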

    After restarting Hadoop it no longer stays in safe mode, and HiveServer2 starts normally.

    Step 4: if things do not go as planned

    If the repair fails, or it reports success while hadoop fsck / still shows the same CORRUPT report as in Step 1 (CORRUPT FILES: 2, MISSING BLOCKS: 2), choose one of the following:

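    1. If the corrupt files are not important

    First, back up the corrupt files you found. A file with a missing block cannot be read in full, so at least keep the list of affected paths (for example /export/missingFile.txt from Step 2).

    Then run hadoop fsck / -delete to delete the corrupt files: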

    [root@node03 export]# hadoop fsck / -delete
    

    If the command does not succeed the first time, run it a few more times. Do this only if the lost or corrupt files are not important!
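    The "run it a few more times" advice can be scripted so that it stops as soon as fsck reports HEALTHY. A minimal sketch with hypothetical names (delete_corrupt_until_healthy, FSCK_CMD, MAX_TRIES); it deletes data, so the same caveat applies: only use it when the corrupt files are expendable.

    ```shell
    # delete_corrupt_until_healthy (hypothetical helper): runs "fsck / -delete" at most
    # MAX_TRIES times (default 3) and stops as soon as a plain "fsck /" reports
    # Status: HEALTHY. FSCK_CMD defaults to "hdfs fsck"; it is left unquoted below on
    # purpose so that the command and its subcommand become separate words.
    # WARNING: -delete permanently removes the corrupt files.
    delete_corrupt_until_healthy() {
        local fsck_cmd="${FSCK_CMD:-hdfs fsck}"
        local max_tries="${MAX_TRIES:-3}"
        local i=0
        while [ "$i" -le "$max_tries" ]; do
            if $fsck_cmd / | grep -q 'Status: HEALTHY'; then
                echo "HEALTHY after $i delete run(s)"
                return 0
            fi
            if [ "$i" -lt "$max_tries" ]; then
                $fsck_cmd / -delete > /dev/null
            fi
            i=$((i + 1))
        done
        echo "still not HEALTHY after $max_tries delete run(s)" >&2
        return 1
    }
    ```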

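    2. If the corrupt files are important and must not be lost

    First run hdfs dfsadmin -safemode leave to force the NameNode out of safe mode:

    [root@node03 export]# hdfs dfsadmin -safemode leave


    This does not fix the underlying problem; it only lets the cluster work for the time being!

    Moreover, you will have to run this command every time the Hadoop cluster starts until the problem is solved for good. For example, if the missing blocks live on a DataNode that is currently down, bringing that DataNode back lets it report those blocks again.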

    If your problem is not the one described above, see this post:
    https://www.cnblogs.com/-xiaoyu-/p/12158984.html

  • Original post: https://www.cnblogs.com/-xiaoyu-/p/11399287.html
Copyright © 2020-2023  润新知