• Common Hadoop Exceptions and Their Solutions    Category: A1_HADOOP    2014-07-09 15:02



    1. Shell$ExitCodeException

    Symptom: the following exception appears when running a Hadoop job:

    14/07/09 14:42:50 INFO mapreduce.Job: Task Id : attempt_1404886826875_0007_m_000000_1, Status : FAILED
    Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
    org.apache.hadoop.util.Shell$ExitCodeException: 
            at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
            at org.apache.hadoop.util.Shell.run(Shell.java:418)
            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
            at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
            at java.util.concurrent.FutureTask.run(FutureTask.java:262)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)

    Container exited with a non-zero exit code 1

    Cause and solution: the root cause was not identified; restarting the cluster restores normal operation.
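
    When this generic container-launch failure keeps recurring, the container's own stderr usually holds the real error. A diagnostic sketch (the application ID below is derived from the attempt ID in the log above; yarn logs only works when log aggregation is enabled):

    # Fetch the aggregated logs for the failed application:
    yarn logs -applicationId application_1404886826875_0007

    # Or inspect the local container logs on the NodeManager that ran the attempt
    # (commonly found under $HADOOP_HOME/logs/userlogs):
    ls $HADOOP_HOME/logs/userlogs/application_1404886826875_0007/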


    2. libhadoop.so.1.0.0 which might have disabled stack guard

    Symptom: Hadoop 2.2.0 logs the warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard.

    Cause and solution:
    Add the following to /etc/profile (adjust the install path to match your own deployment):
    #hadoop configuration
    export PATH=$PATH:/home/jediael/hadoop-2.4.1/bin:/home/jediael/hadoop-2.4.1/sbin
    export HADOOP_HOME=/home/jediael/hadoop-2.4.1
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_YARN_HOME=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    The warning appears because the last two entries (HADOOP_COMMON_LIB_NATIVE_DIR and HADOOP_OPTS) were missing.
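
    To apply and verify the change, a quick check (hadoop checknative is available from roughly Hadoop 2.4 onward; on older releases, just confirm the warning no longer appears):

    source /etc/profile
    # Reports whether the native hadoop library (and zlib, snappy, ...) loaded correctly:
    hadoop checknative -a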


    3. Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s)

    Symptom: running HDFS commands on a datanode produces the following error:

    [jediael@slave156 ~]$ hadoop fs -ls /
    14/08/31 15:00:37 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:38 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:39 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:40 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:41 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:42 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:43 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:44 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:45 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/08/31 15:00:46 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    ls: Call to master166/10.252.48.166:9000 failed on connection exception: java.net.ConnectException: Connection refused

    This error usually means the datanode cannot connect to the namenode. One common cause:

    /etc/hosts contains a 127.0.0.1 ***** entry, such as

    127.0.0.1 localhost

    Remove these entries, then reformat the namenode and restart the Hadoop processes.
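
    Before editing any configuration, it can help to confirm whether the namenode RPC port is actually reachable; a quick check from the datanode (master166:9000 is the address from the log above):

    # Is anything listening on port 9000 on the namenode host?
    ssh master166 'netstat -tlnp | grep 9000'

    # Can the datanode reach it? "Connection refused" here confirms the problem:
    telnet master166 9000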

    Another possible cause:

    After Hadoop is installed, the namenode must be formatted with hadoop namenode -format before HDFS can be used. If the machine is rebooted and Hadoop is then started, hadoop fs -ls keeps reporting errors such as 10/09/25 18:35:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s)., and jps shows no namenode process; the namenode has to be formatted again before HDFS works.

    The reason: by default Hadoop places its temporary files under /tmp, which is cleared when the system reboots, hence the error.

    Solution: add the following to conf/core-site.xml:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/var/log/hadoop/tmp</value>
      <description>A base for other temporary directories</description>
    </property>

    Then reformat the namenode and restart Hadoop.
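
    A minimal sketch of the recovery sequence after adding hadoop.tmp.dir (using the 1.x-style start/stop scripts that appear elsewhere in this post):

    mkdir -p /var/log/hadoop/tmp     # the new hadoop.tmp.dir must exist and be writable
    stop-all.sh                      # stop any running daemons
    hadoop namenode -format          # recreate the namenode metadata under the new directory
    start-all.sh                     # bring HDFS and MapReduce back up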


    4. Permission denied: user=liaoliuqing, access=WRITE, inode="":jediael:supergroup:rwxr-xr-x

    Cause: the user lacks the required permissions and cannot write to the file in HDFS.

    Solution:

    Disable HDFS permission checking by adding the following to hdfs-site.xml (in Hadoop 2.x the property is named dfs.permissions.enabled):

    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
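
    Note that dfs.permissions=false disables permission checking for the entire cluster. Where the data allows it, a narrower fix is to grant that one user access with standard HDFS shell commands; a sketch (the path /user/liaoliuqing is a hypothetical example, and chown must be run as the HDFS superuser, jediael in the error above):

    # Option A: make the user the owner of their working directory:
    hadoop fs -chown -R liaoliuqing /user/liaoliuqing

    # Option B: open up write permission on that directory only (world-writable,
    # but still far narrower than disabling permissions cluster-wide):
    hadoop fs -chmod -R 777 /user/liaoliuqing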


    5. Incompatible namespaceIDs

    2015-02-02 15:10:57,526 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
    2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
    2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
    2015-02-02 15:10:57,544 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
    2015-02-02 15:10:57,699 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
    2015-02-02 15:10:58,090 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /mnt/tmphadoop/dfs/data: namenode namespaceID = 2017454015; datanode namespaceID = 1238467850
            at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
            at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
            at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)

    Cause:
    Every namenode format generates a new namespaceID, while ${hadoop.tmp.dir}/dfs/data still holds the ID written by the previous format. Reformatting clears the namenode's data but not the datanodes', so the namespaceID on the namenode no longer matches the one on the datanodes; the exception above is thrown and the datanode fails to start.


    Solution:
    (1) Stop Hadoop:
     stop-all.sh
    (2) On every slave, delete the contents of dfs.data.dir. If this property has not been overridden, its default is:
    <property>
      <name>dfs.data.dir</name>
      <value>${hadoop.tmp.dir}/dfs/data</value>
      <description>Determines where on the local filesystem an DFS data node
      should store its blocks.  If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices.
      Directories that do not exist are ignored.
      </description>
    </property>
    (3) Reformat the namenode:
    hadoop namenode -format
    then start Hadoop with start-all.sh
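
    Put together, the deletion-based fix looks like this (the data directory is the default shown above, resolved against the ${hadoop.tmp.dir} from the log; the slave hostname is a placeholder for your own nodes):

    stop-all.sh
    # On each slave, clear the datanode storage (default ${hadoop.tmp.dir}/dfs/data):
    ssh slave1 'rm -rf /mnt/tmphadoop/dfs/data/*'
    hadoop namenode -format
    start-all.sh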

    The procedure above deletes the existing data. If the data cannot be deleted, use one of the following instead:
    (1) Edit the ${dfs.data.dir}/current/VERSION file on each datanode and change its namespaceID to match the namenode's, as sketched below.
    (2) Change the dfs.data.dir setting so the datanode stores its blocks in a different directory.
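
    A minimal sketch of option (1), using the paths and IDs from the log above (${hadoop.tmp.dir} = /mnt/tmphadoop, namenode namespaceID = 2017454015) and assuming the default dfs.name.dir of ${hadoop.tmp.dir}/dfs/name:

    # On the namenode, confirm the authoritative namespaceID:
    grep namespaceID /mnt/tmphadoop/dfs/name/current/VERSION

    # On each datanode, overwrite the stale ID, then restart the datanode:
    sed -i 's/^namespaceID=.*/namespaceID=2017454015/' /mnt/tmphadoop/dfs/data/current/VERSION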







    Copyright notice: this article is the author's original work; do not reproduce it without the author's permission.

  • Original article: https://www.cnblogs.com/lujinhong2/p/4637289.html