• Pitfalls of submitting MapReduce jobs to YARN from a local Windows or Linux machine


    Local mode
    This mode reads local files and runs entirely on the local machine, independent of any cluster (mapreduce.framework.name=local, the default). It is suited to verifying program logic during development.

    YARN mode
    This is true cluster execution: the program is packaged into a jar, uploaded to a cluster node, and run with the hadoop jar command. This is the mode used in production.

    Remote submission to YARN from Windows or Linux
    In this mode the jar is submitted to the cluster directly from a Windows or Linux machine; the submitting host does not need a Hadoop installation. The steps are as follows:

    1. Copy the following configuration files from the cluster into the project's resources directory:

    core-site.xml
    hdfs-site.xml
    mapred-site.xml
    yarn-site.xml

    2. Specify the jar to execute in the code:

    job.setJar("G:\idea_workspace\MapReduce\out\artifacts\MapReduce_jar\MapReduce.jar");
    3. On Windows, cross-platform submission must also be enabled.

    There are two ways to do this. The first is to add the following code to the program:

    Configuration configuration = new Configuration();
    configuration.set("mapreduce.app-submission.cross-platform", "true");
    The other is to add the following property to mapred-site.xml:

    <property>
        <name>mapreduce.app-submission.cross-platform</name>
        <value>true</value>
    </property>
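
    Putting the three steps together, the following is a minimal, self-contained WordCount driver sketch for remote submission. The names WordCountMain and WordCountMapper are taken from the logs later in this article, but the mapper/reducer bodies, the nested-class layout, and the HDFS input/output paths are illustrative assumptions.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountMain {

        public static class WordCountMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Emit (word, 1) for every whitespace-separated token.
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        public static class WordCountReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Sum the counts for each word.
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            // Step 1: the *-site.xml files copied into resources supply the
            // cluster addresses when the Configuration is created.
            Configuration configuration = new Configuration();
            // Step 3: required when submitting from Windows (see problem 1 below).
            configuration.set("mapreduce.app-submission.cross-platform", "true");

            Job job = Job.getInstance(configuration, "wordcount");
            // Step 2: point at the jar built from this project (see problem 2 below).
            job.setJar("G:\\idea_workspace\\MapReduce\\out\\artifacts\\MapReduce_jar\\MapReduce.jar");

            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Hypothetical HDFS paths; adjust to your cluster.
            FileInputFormat.addInputPath(job, new Path("/input/words.txt"));
            FileOutputFormat.setOutputPath(job, new Path("/output/wordcount"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }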
    Along the way you may run into the following problems:

    (1) Problem 1

    2021-02-22 20:24:16,478 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at single/192.168.128.11:8032
    2021-02-22 20:24:17,003 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    2021-02-22 20:24:17,185 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
    2021-02-22 20:24:17,236 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1
    2021-02-22 20:24:17,306 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_1608473235348_0006
    2021-02-22 20:24:17,730 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(273)) - Submitted application application_1608473235348_0006
    2021-02-22 20:24:17,769 INFO  [main] mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://single:8088/proxy/application_1608473235348_0006/
    2021-02-22 20:24:17,769 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_1608473235348_0006
    2021-02-22 20:24:25,870 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_1608473235348_0006 running in uber mode : false
    2021-02-22 20:24:25,872 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 0% reduce 0%
    2021-02-22 20:24:25,885 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Job job_1608473235348_0006 failed with state FAILED due to: Application application_1608473235348_0006 failed 2 times due to AM Container for appattempt_1608473235348_0006_000002 exited with  exitCode: 1
    For more detailed output, check application tracking page:http://single:8088/cluster/app/application_1608473235348_0006Then, click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_1608473235348_0006_02_000001
    Exit code: 1
    Exception message: /bin/bash: line 0: fg: no job control

    Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
        at org.apache.hadoop.util.Shell.run(Shell.java:479)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


    Container exited with a non-zero exit code 1
    Failing this attempt. Failing the application.
    2021-02-22 20:24:25,901 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Counters: 0

    Solution to problem 1:

    The "/bin/bash: line 0: fg: no job control" error means the Windows client generated a Windows-style container launch command that the Linux NodeManager's /bin/bash cannot execute. Enable cross-platform submission by adding the following to mapred-site.xml:

    <property>
        <name>mapreduce.app-submission.cross-platform</name>
        <value>true</value>
    </property>
    or, in the code:

    conf.set("mapreduce.app-submission.cross-platform", "true");

    (2) Problem 2

    2021-02-22 20:30:34,703 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at single/192.168.128.11:8032
    2021-02-22 20:30:35,206 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    2021-02-22 20:30:35,221 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
    2021-02-22 20:30:35,229 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
    2021-02-22 20:30:35,425 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1
    2021-02-22 20:30:35,500 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_1608473235348_0007
    2021-02-22 20:30:35,607 INFO  [main] mapred.YARNRunner (YARNRunner.java:createApplicationSubmissionContext(371)) - Job jar is not present. Not adding any jar to the list of resources.
    2021-02-22 20:30:35,646 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(273)) - Submitted application application_1608473235348_0007
    2021-02-22 20:30:35,673 INFO  [main] mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://single:8088/proxy/application_1608473235348_0007/
    2021-02-22 20:30:35,673 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_1608473235348_0007
    2021-02-22 20:31:11,316 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_1608473235348_0007 running in uber mode : false
    2021-02-22 20:31:11,319 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 0% reduce 0%
    2021-02-22 20:31:25,813 INFO  [main] mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : attempt_1608473235348_0007_m_000000_0, Status : FAILED
    Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.leboop.www.wordcount.WordCountMapper not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: java.lang.ClassNotFoundException: Class com.leboop.www.wordcount.WordCountMapper not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more

    Solution to problem 2: specify the jar path in the code.

    As the "No job jar file set" warning in the log indicates, without an explicit jar the user classes are never shipped to the cluster, so the map tasks fail with ClassNotFoundException. Add:

    job.setJar("G:\\idea_workspace\\MapReduce\\MapReduce.jar");

    Note that MapReduce.jar must be the jar rebuilt after the line above was added.
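
    Incidentally, Job#setJar just writes the mapreduce.job.jar property, so the same effect can be achieved on the Configuration before the Job is created; a sketch (the path is hypothetical):

    // Equivalent to job.setJar(...): mapreduce.job.jar is the property that
    // Job#setJar writes under the hood.
    conf.set("mapreduce.job.jar", "G:\\idea_workspace\\MapReduce\\MapReduce.jar");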
     

    (3) Problem 3

    2021-02-22 21:22:18,957 WARN  [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(117)) - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
    2021-02-22 21:22:21,418 INFO  [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(297)) - Timeline service address: http://hdp22:8188/ws/v1/timeline/
    2021-02-22 21:22:21,542 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hdp22/192.168.128.22:8050
    Exception in thread "main" java.lang.IllegalArgumentException: Unable to parse '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path
        at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:443)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
        at com.leboop.www.wordcount.WordCountMain.main(WordCountMain.java:42)
    Caused by: java.net.URISyntaxException: Illegal character in path at index 11: /hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework
        at java.net.URI$Parser.fail(URI.java:2848)
        at java.net.URI$Parser.checkChars(URI.java:3021)
        at java.net.URI$Parser.parseHierarchical(URI.java:3105)
        at java.net.URI$Parser.parse(URI.java:3063)
        at java.net.URI.<init>(URI.java:588)
        at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:441)
        ... 9 more

    Solution to problem 3: the client side cannot expand the ${hdp.version} placeholder, so the path fails to parse as a URI. In mapred-site.xml, hard-code the HDP version actually installed on the cluster:

    <property>
        <name>mapreduce.application.framework.path</name>
        <value>/hdp/apps/2.6.3.0-235/mapreduce/mapreduce.tar.gz#mr-framework</value>
    </property>
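
    The same fix can also be applied in the driver code rather than in the XML; a sketch assuming the same HDP 2.6.3.0-235 build (substitute the version actually installed on your cluster):

    // Hard-code the framework path so the client-side URI parse succeeds;
    // the version segment must match the HDP build on the cluster.
    configuration.set("mapreduce.application.framework.path",
            "/hdp/apps/2.6.3.0-235/mapreduce/mapreduce.tar.gz#mr-framework");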
    (4) Problem 4

    2021-02-22 21:25:23,677 WARN  [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(117)) - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
    2021-02-22 21:25:24,633 INFO  [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(297)) - Timeline service address: http://hdp22:8188/ws/v1/timeline/
    2021-02-22 21:25:24,643 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hdp22/192.168.128.22:8050
    Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /user/root/.staging. Name node is in safe mode.
    The reported blocks 0 needs additional 47 blocks to reach the threshold 1.0000 of total blocks 46.
    The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.

    Solution to problem 4: the NameNode is in safe mode. The log also reports 0 live DataNodes, so first make sure the DataNodes are actually running (hdfs dfsadmin -safemode get shows the current state); if the NameNode remains stuck in safe mode, leave it manually:

    hdfs dfsadmin -safemode leave

    (5) Problem 5

    2021-02-22 21:30:09,623 WARN  [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(117)) - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
    2021-02-22 21:30:10,492 INFO  [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(297)) - Timeline service address: http://hdp22:8188/ws/v1/timeline/
    2021-02-22 21:30:10,502 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hdp22/192.168.128.22:8050
    Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root/.staging":hdfs:hdfs:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:325)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:246)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1956)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1940)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1923)

    Solution to problem 5: a permission problem. The staging directory /user/root/.staging is owned by hdfs:hdfs with mode drwxr-xr-x, but the job is submitted as user root, which has no write access there.

    On Windows, set the system environment variable HADOOP_USER_NAME=hdfs.

    Which user name to use depends on your environment; see "Java API 操作HDFS权限问题" (Java API HDFS permission issues).
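
    If changing environment variables is inconvenient, Hadoop's UserGroupInformation also honors HADOOP_USER_NAME as a Java system property; a sketch, assuming the target user is hdfs:

    // Must run before the first Configuration/FileSystem/Job call, because
    // the login user is resolved once and then cached.
    System.setProperty("HADOOP_USER_NAME", "hdfs");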

    ————————————————
    Copyright notice: this is an original article by CSDN blogger "leboop-L", released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
    Original link: https://blog.csdn.net/L_15156024189/article/details/113954410
