• Hadoop (1): An error when running Hadoop's bundled wordcount example. 北漂


        In Hadoop 2.9.0, with HA enabled for both the NameNode and YARN, running the bundled wordcount program on one of the NameNode hosts failed intermittently (it sometimes succeeded and sometimes failed).
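    For reference, the job was launched with the standard examples jar, roughly as follows (the jar version and HDFS paths are illustrative and depend on the installation):

    # hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /user/hadoop/input /user/hadoop/output

    The failing runs produced the error below: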

    18/08/16 17:02:42 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
    18/08/16 17:02:42 INFO input.FileInputFormat: Total input files to process : 1
    18/08/16 17:02:42 INFO mapreduce.JobSubmitter: number of splits:1
    18/08/16 17:02:42 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
    18/08/16 17:02:42 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
    18/08/16 17:02:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534406793739_0005
    18/08/16 17:02:42 INFO impl.YarnClientImpl: Submitted application application_1534406793739_0005
    18/08/16 17:02:43 INFO mapreduce.Job: The url to track the job: http://HLJRslog2:8088/proxy/application_1534406793739_0005/
    18/08/16 17:02:43 INFO mapreduce.Job: Running job: job_1534406793739_0005
    18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 running in uber mode : false
    18/08/16 17:02:54 INFO mapreduce.Job: map 0% reduce 0%
    18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 failed with state FAILED due to: Application application_1534406793739_0005 failed 2 times due to AM Container for appattempt_1534406793739_0005_000002 exited with exitCode: 1
    Failing this attempt.Diagnostics: [2018-08-16 17:02:48.561]Exception from container-launch.
    Container id: container_e27_1534406793739_0005_02_000001
    Exit code: 1
    [2018-08-16 17:02:48.562]
    [2018-08-16 17:02:48.574]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
    Last 4096 bytes of prelaunch.err :
    Last 4096 bytes of stderr :
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    
    [2018-08-16 17:02:48.575]
    [2018-08-16 17:02:48.575]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
    Last 4096 bytes of prelaunch.err :
    Last 4096 bytes of stderr :
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
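
    To pull the complete container logs for a failed attempt, the application id from the output above can be passed to yarn logs (this assumes log aggregation is enabled on the cluster):

    # yarn logs -applicationId application_1534406793739_0005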

    Analysis and solution:

    Most fixes found online for similar problems boil down to adding the appropriate classpath entries. I tried them all and none worked, which shows the problem above was not caused by the classpath; when the error occurred the classpath was checked as well, and every expected value was present. For completeness, here is how to add the classpath.

    1. # yarn classpath    (note: prints the effective classpath)

    /data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/share/hadoop/common/lib/*:/data1/hadoop/hadoop/share/hadoop/common/*:/data1/hadoop/hadoop/share/hadoop/hdfs:/data1/hadoop/hadoop/share/hadoop/hdfs/lib/*:/data1/hadoop/hadoop/share/hadoop/hdfs/*:/data1/hadoop/hadoop/share/hadoop/yarn:/data1/hadoop/hadoop/share/hadoop/yarn/lib/*:/data1/hadoop/hadoop/share/hadoop/yarn/*:/data1/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/data1/hadoop/hadoop/share/hadoop/mapreduce/*:/data1/hadoop/hadoop/contrib/capacity-scheduler/*.jar:/data1/hadoop/hadoop/share/hadoop/yarn/*:/data1/hadoop/hadoop/share/hadoop/yarn/lib/*

    If the output above is empty, the classpath can be added through the following three steps.

    2. Edit mapred-site.xml

    Add:

    <property> 
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>

    3. Edit yarn-site.xml

    Add:

    <property>
        <name>yarn.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>

    4. Edit the environment variables

    # vim ~/.bashrc

    Append the following environment variables to the end of the file:

    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_YARN_HOME=$HADOOP_HOME
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

    5. # source ~/.bashrc    (note: reloads the environment)
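
    As a quick sanity check that the variables took effect (output varies by installation):

    # echo $HADOOP_MAPRED_HOME
    # yarn classpath | tr ':' '\n' | grep mapreduce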

    Fixing the error:

    The log shows that the container running the AM exited without ever requesting resources from the RM for the job, which points to a communication problem between the AM and the RM. One RM is standby and the other is active; inside YARN, when the MR job asks the active RM for resources everything works, but when the request goes to the standby RM this problem appears, which explains the intermittent failures.
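
    Before touching the configuration, it is worth confirming which RM is active. rm1 and rm2 below are the HA ids from yarn.resourcemanager.ha.rm-ids; the output shown is illustrative:

    # yarn rmadmin -getServiceState rm1
    active
    # yarn rmadmin -getServiceState rm2
    standby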

    Edit yarn-site.xml (# vim yarn-site.xml):

    <property>
    <!-- Clients submit application operations to the RM at this address -->
    <name>yarn.resourcemanager.address.rm1</name>
    <value>master:8032</value>
    </property>
    <property>
    <!-- The address the ResourceManager exposes to ApplicationMasters; the AM requests and releases resources through it -->
    <name>yarn.resourcemanager.scheduler.address.rm1</name>  
    <value>master:8030</value>
    </property>
    <property>
    <!-- RM HTTP address, for viewing cluster information -->
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master:8088</value>
    </property>
    <property>
    <!-- NodeManagers exchange information with the RM at this address -->
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>master:8031</value>
    </property>
    <property>
    <!-- Administrators send management commands to the RM at this address -->
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>master:8033</value>
    </property>
    <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>master:23142</value>
    </property>
    <!-- The rm2 entries below are the lines added in this fix; they must be active, not commented out -->
    <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>slave1:8032</value>
    </property>
    <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>slave1:8030</value>
    </property>
    <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>slave1:8088</value>
    </property>
    <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>slave1:8031</value>
    </property>
    <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>slave1:8033</value>
    </property>
    <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>slave1:23142</value>
    </property>

    Note: the scheduler addresses (yarn.resourcemanager.scheduler.address.*, port 8030 above) are the RPC endpoint through which the AM requests resources from the RM; this is exactly where the failure occurred.
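
    To verify that the scheduler port is reachable on both RM hosts from a cluster node, a quick port probe helps (nc is assumed to be installed; telnet master 8030 works just as well):

    # nc -z master 8030 && echo "master:8030 reachable"
    # nc -z slave1 8030 && echo "slave1:8030 reachable"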

     

    The rm2 entries in the block above are what I added to yarn-site.xml on the rm1 machine (that is, master); conversely, if adding them on slave1, you would add the lines ending in .rm1 instead. Put plainly, yarn-site.xml must list the communication host and port of every ResourceManager. Then copy the file to the other machines and restart YARN, as sketched below. After that, wordcount and other jobs ran without error. The root cause was the communication between MR and the RM, so when writing yarn-site.xml it is best to configure the communication ports of both the active and standby RMs in that file to prevent this error.
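
    A minimal sketch of distributing the file and restarting YARN (hostnames and paths are illustrative; adjust to the actual layout):

    # scp $HADOOP_HOME/etc/hadoop/yarn-site.xml slave1:$HADOOP_HOME/etc/hadoop/
    # $HADOOP_HOME/sbin/stop-yarn.sh
    # $HADOOP_HOME/sbin/start-yarn.sh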

  • Original post: https://www.cnblogs.com/yjt1993/p/9489122.html