Running in local mode
In this mode the job reads local files and runs entirely on the local machine, with no involvement of the cluster. It is suitable for verifying program logic during development.
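As a hedged sketch, local execution can also be forced explicitly in the driver; the property values below are Hadoop's stock defaults, so this is only needed when cluster configuration files on the classpath would otherwise take over:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Use the local job runner and the local file system, so nothing touches the cluster.
conf.set("mapreduce.framework.name", "local");
conf.set("fs.defaultFS", "file:///");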
Running in YARN mode
This is true cluster execution: package the program into a jar, upload it to a cluster node, and run it there with the hadoop jar command. This is the mode used in production.
Submitting to YARN remotely from Windows or Linux
In this mode the jar is submitted to the cluster from a Windows or Linux machine, and the submitting host does not need a Hadoop cluster installed locally. The steps are as follows:
1. Copy the following configuration files into the project's resources directory (a sketch of loading them explicitly is shown after this list):
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
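With the files under resources/ they end up on the runtime classpath, where Hadoop's Configuration can pick them up. If they are not loaded automatically in your setup, they can also be added explicitly; a minimal sketch, assuming the files sit at the classpath root:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
conf.addResource("core-site.xml");
conf.addResource("hdfs-site.xml");
conf.addResource("mapred-site.xml");
conf.addResource("yarn-site.xml");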
2. Specify the jar to execute in the code:
job.setJar("G:\idea_workspace\MapReduce\out\artifacts\MapReduce_jar\MapReduce.jar");
3. On Windows, cross-platform submission must be enabled.
There are two ways to do this. The first is to add the following code to the program:
Configuration configuration = new Configuration();
configuration.set("mapreduce.app-submission.cross-platform", "true");
The second is to add the following property to mapred-site.xml:
<property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
</property>
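Putting the three steps together, a remote-submission driver might look like the sketch below. The WordCountMapper class, its package, and the jar path come from this article; the driver class name, reducer name, output types, and HDFS input/output paths are illustrative assumptions.

package com.leboop.www.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteWordCountMain {
    public static void main(String[] args) throws Exception {
        // Step 1: core-site.xml / hdfs-site.xml / mapred-site.xml / yarn-site.xml
        // are read from the classpath (the project's resources directory).
        Configuration conf = new Configuration();
        // Step 3: allow submission from a Windows client to a Linux cluster.
        conf.set("mapreduce.app-submission.cross-platform", "true");

        Job job = Job.getInstance(conf, "wordcount");
        // Step 2: point at the built jar so YARN can ship the user classes.
        job.setJar("G:\\idea_workspace\\MapReduce\\out\\artifacts\\MapReduce_jar\\MapReduce.jar");
        job.setMapperClass(WordCountMapper.class);    // mapper class from this article
        job.setReducerClass(WordCountReducer.class);  // reducer name assumed
        job.setOutputKeyClass(Text.class);            // output types assumed for word count
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path("/input"));     // HDFS paths are illustrative
        FileOutputFormat.setOutputPath(job, new Path("/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}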
The following problems may come up along the way:
(1) Problem 1
2021-02-22 20:24:16,478 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at single/192.168.128.11:8032
2021-02-22 20:24:17,003 WARN [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2021-02-22 20:24:17,185 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2021-02-22 20:24:17,236 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1
2021-02-22 20:24:17,306 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_1608473235348_0006
2021-02-22 20:24:17,730 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(273)) - Submitted application application_1608473235348_0006
2021-02-22 20:24:17,769 INFO [main] mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://single:8088/proxy/application_1608473235348_0006/
2021-02-22 20:24:17,769 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_1608473235348_0006
2021-02-22 20:24:25,870 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_1608473235348_0006 running in uber mode : false
2021-02-22 20:24:25,872 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 0% reduce 0%
2021-02-22 20:24:25,885 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Job job_1608473235348_0006 failed with state FAILED due to: Application application_1608473235348_0006 failed 2 times due to AM Container for appattempt_1608473235348_0006_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://single:8088/cluster/app/application_1608473235348_0006Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1608473235348_0006_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
2021-02-22 20:24:25,901 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Counters: 0
Solution to problem 1:
Add the following property to mapred-site.xml:
<property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
</property>
Or, in code:
conf.set("mapreduce.app-submission.cross-platform","true");
(2) Problem 2
2021-02-22 20:30:34,703 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at single/192.168.128.11:8032
2021-02-22 20:30:35,206 WARN [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2021-02-22 20:30:35,221 WARN [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2021-02-22 20:30:35,229 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2021-02-22 20:30:35,425 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1
2021-02-22 20:30:35,500 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_1608473235348_0007
2021-02-22 20:30:35,607 INFO [main] mapred.YARNRunner (YARNRunner.java:createApplicationSubmissionContext(371)) - Job jar is not present. Not adding any jar to the list of resources.
2021-02-22 20:30:35,646 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(273)) - Submitted application application_1608473235348_0007
2021-02-22 20:30:35,673 INFO [main] mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://single:8088/proxy/application_1608473235348_0007/
2021-02-22 20:30:35,673 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_1608473235348_0007
2021-02-22 20:31:11,316 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_1608473235348_0007 running in uber mode : false
2021-02-22 20:31:11,319 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 0% reduce 0%
2021-02-22 20:31:25,813 INFO [main] mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : attempt_1608473235348_0007_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.leboop.www.wordcount.WordCountMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.leboop.www.wordcount.WordCountMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 8 more
Solution to problem 2: specify the jar path in the code, for example:
job.setJar("G:\idea_workspace\MapReduce\MapReduce.jar");
Note that MapReduce.jar must be the jar that was rebuilt after adding the code above.
(3) Problem 3
2021-02-22 21:22:18,957 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(117)) - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
2021-02-22 21:22:21,418 INFO [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(297)) - Timeline service address: http://hdp22:8188/ws/v1/timeline/
2021-02-22 21:22:21,542 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hdp22/192.168.128.22:8050
Exception in thread "main" java.lang.IllegalArgumentException: Unable to parse '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path
at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:443)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at com.leboop.www.wordcount.WordCountMain.main(WordCountMain.java:42)
Caused by: java.net.URISyntaxException: Illegal character in path at index 11: /hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3105)
at java.net.URI$Parser.parse(URI.java:3063)
at java.net.URI.<init>(URI.java:588)
at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:441)
... 9 more
Solution to problem 3: in mapred-site.xml, replace the unresolved ${hdp.version} placeholder with the actual version:
<property>
    <name>mapreduce.application.framework.path</name>
    <value>/hdp/apps/2.6.3.0-235/mapreduce/mapreduce.tar.gz#mr-framework</value>
</property>
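Equivalently, as a hedged alternative to editing mapred-site.xml, the resolved path can be set on the client-side Configuration before the job is submitted; the version string 2.6.3.0-235 must match the cluster's actual HDP version:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Override the unresolved ${hdp.version} placeholder with a concrete path.
conf.set("mapreduce.application.framework.path",
        "/hdp/apps/2.6.3.0-235/mapreduce/mapreduce.tar.gz#mr-framework");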
(4) Problem 4
2021-02-22 21:25:23,677 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(117)) - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
2021-02-22 21:25:24,633 INFO [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(297)) - Timeline service address: http://hdp22:8188/ws/v1/timeline/
2021-02-22 21:25:24,643 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hdp22/192.168.128.22:8050
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /user/root/.staging. Name node is in safe mode.
The reported blocks 0 needs additional 47 blocks to reach the threshold 1.0000 of total blocks 46.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
Solution to problem 4: take the NameNode out of safe mode:
hdfs dfsadmin -safemode leave
(5) Problem 5
2021-02-22 21:30:09,623 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(117)) - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
2021-02-22 21:30:10,492 INFO [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(297)) - Timeline service address: http://hdp22:8188/ws/v1/timeline/
2021-02-22 21:30:10,502 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hdp22/192.168.128.22:8050
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root/.staging":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:325)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:246)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1956)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1940)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1923)
Solution to problem 5: this is a permissions issue.
On Windows, set the system environment variable HADOOP_USER_NAME=hdfs.
Choose the user name according to your actual environment. See the article 《Java API 操作HDFS权限问题》 for details.
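If changing the system environment is inconvenient, the same effect can usually be achieved from code, provided it runs before any Hadoop client classes determine the login user; this is a sketch, not something taken from the original article:

// Must execute before the first Configuration/FileSystem/Job call,
// otherwise the already-resolved login user is used instead.
System.setProperty("HADOOP_USER_NAME", "hdfs");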