Hadoop Study Notes (7) | Debugging Hadoop Remotely from Eclipse


    1. Create a Hadoop project


    2. Create the package and class

    This walkthrough uses hdfs.WordCount as the example (package hdfs, class WordCount).
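
    All the snippets in the next step live inside this outer class; here is a minimal skeleton matching the package and class created in this step:

    package hdfs;

    public class WordCount {
    	// static class MyMapper  { ... }   (step 3)
    	// static class MyReducer { ... }   (step 3)
    	// public static void main(String[] args) { ... }   (step 3)
    }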


    3. Write the custom Mapper and Reducer
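
    The class needs the following imports (a minimal set for the Hadoop 1.x mapreduce API used below):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;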

    The MyMapper class

    static class MyMapper extends
    			Mapper<LongWritable, Text, Text, LongWritable> {
    
    	@Override
    	protected void map(LongWritable k1, Text v1, Context context)
    			throws IOException, InterruptedException {
    		// Split the line into tokens
    		StringTokenizer tokenizer = new StringTokenizer(v1.toString());
    		// Reusable Text object for the output key k2
    		Text k2 = new Text();
    		// Emit (word, 1) for every token
    		while (tokenizer.hasMoreTokens()) {
    			k2.set(tokenizer.nextToken());
    			context.write(k2, new LongWritable(1));
    		}
    
    	}
    
    }

    For an input line such as "hello world hello", the mapper emits (hello, 1), (world, 1), (hello, 1).
    

    The MyReducer class

    static class MyReducer extends
    			Reducer<Text, LongWritable, Text, LongWritable> {
    		@Override
    		protected void reduce(Text k2, Iterable<LongWritable> v2s,
    				Context context) throws IOException, InterruptedException {
    			// Sum all the 1s emitted for this word
    			long sum = 0;
    			for (LongWritable val : v2s) {
    				sum += val.get();
    			}
    			context.write(k2, new LongWritable(sum));
    		}
    	}

    The framework groups the map output by key, so for the example above the reducer receives (hello, [1, 1]) and writes (hello, 2).
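
    Because the reduce is just an associative sum, the same class can optionally double as a combiner to shrink the shuffle. This is an optimization not used in the original job setup; it would go with the other job settings in the driver below:

    	// Optional, not part of the original driver:
    	job.setCombinerClass(MyReducer.class);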
    

    Write the main driver method

    public static void main(String[] args) throws Exception {
    
    	if (args.length != 2) {
    		System.err.println("Usage: WordCount <input path> <output path>");
    		System.exit(2);
    	}
    	
    	Configuration conf = new Configuration();
    	Job job = new Job(conf, WordCount.class.getSimpleName());
    	// Running via the Eclipse plugin is equivalent to running from a jar
    	job.setJarByClass(WordCount.class);
    	// Set the mapper class
    	job.setMapperClass(MyMapper.class);
    	// Set the map output key (k2) type
    	job.setMapOutputKeyClass(Text.class);
    	// Set the map output value (v2) type
    	job.setMapOutputValueClass(LongWritable.class);
    	// Set the partitioner class
    	job.setPartitionerClass(HashPartitioner.class);
    	// Set the number of reduce tasks
    	job.setNumReduceTasks(1);
    	// Set the reducer class
    	job.setReducerClass(MyReducer.class);
    	// Set the output format
    	job.setOutputFormatClass(TextOutputFormat.class);
    	// Set the output key (k3) type
    	job.setOutputKeyClass(Text.class);
    	// Set the output value (v3) type
    	job.setOutputValueClass(LongWritable.class);
    	
    	// Input and output paths are passed in as program arguments
    	FileInputFormat.setInputPaths(job, new Path(args[0]));
    	FileOutputFormat.setOutputPath(job, new Path(args[1]));
    	// Submit the job and wait. waitForCompletion returns false on failure,
    	// in which case we exit the JVM with a nonzero status.
    	System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
    

    4. Run the MapReduce program against the remote Hadoop cluster

    First, configure the access path: enter the HDFS access paths (the input and output directories the driver reads as args[0] and args[1]) in the run configuration.
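
    For example, the two program arguments in the Eclipse Run Configurations dialog might look like this (host, port, and directory names here are placeholders; substitute your cluster's fs.default.name and your own paths):

    	hdfs://master:9000/user/Sky/input
    	hdfs://master:9000/user/Sky/output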

    Now running it with Run As > Run on Hadoop produces an error:

    14/03/11 15:58:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/03/11 15:58:22 ERROR security.UserGroupInformation: PriviledgedActionException as:Sky cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sky\mapred\staging\Sky1823204560\.staging to 0700
    Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sky\mapred\staging\Sky1823204560\.staging to 0700
    	at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
    	at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
    	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    	at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
    	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:415)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    	at hdfs.WordCount.main(WordCount.java:58)

    This is a Windows permission problem; the same code runs fine on Linux.

    The fix:

    Open F:\Software\Hadoop\hadoop-1.1.2\src\core\org\apache\hadoop\fs\FileUtil.java (the Hadoop source tree on the local machine).

    Comment out the body of the checkReturnValue method and save.

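    After the change, checkReturnValue is a no-op, so the failed chmod of the local staging directory is simply ignored. The patched method looks roughly like this (reconstructed from the Hadoop 1.x source; the exact wording in your copy may differ):

    private static void checkReturnValue(boolean rv, File p,
    		FsPermission permission) throws IOException {
    	// Body commented out so the 0700 permission check is skipped on Windows.
    	// The original body was along these lines:
    	// if (!rv) {
    	//     throw new IOException("Failed to set permissions of path: " + p
    	//             + " to " + String.format("%04o", permission.toShort()));
    	// }
    }

    Recompiling the patched class (or copying FileUtil.java into your own source folder so it shadows the one in hadoop-core) makes Eclipse pick up the change.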

    Running it again prints the counters normally and creates a new output directory. The output directory must not already exist; Hadoop creates it automatically.

    14/03/11 16:08:40 INFO mapred.JobClient:  map 100% reduce 100%
    14/03/11 16:08:41 INFO mapred.JobClient: Job complete: job_local_0001
    14/03/11 16:08:41 INFO mapred.JobClient: Counters: 19
    14/03/11 16:08:41 INFO mapred.JobClient:   File Output Format Counters
    14/03/11 16:08:41 INFO mapred.JobClient:     Bytes Written=2154020
    14/03/11 16:08:41 INFO mapred.JobClient:   FileSystemCounters
    14/03/11 16:08:41 INFO mapred.JobClient:     FILE_BYTES_READ=631320575
    14/03/11 16:08:41 INFO mapred.JobClient:     HDFS_BYTES_READ=141910490
    14/03/11 16:08:41 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=774430506
    14/03/11 16:08:41 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2154020
    14/03/11 16:08:41 INFO mapred.JobClient:   File Input Format Counters
    14/03/11 16:08:41 INFO mapred.JobClient:     Bytes Read=70955245
    14/03/11 16:08:41 INFO mapred.JobClient:   Map-Reduce Framework
    14/03/11 16:08:41 INFO mapred.JobClient:     Reduce input groups=59150
    14/03/11 16:08:41 INFO mapred.JobClient:     Map output materialized bytes=142981973
    14/03/11 16:08:41 INFO mapred.JobClient:     Combine output records=0
    14/03/11 16:08:41 INFO mapred.JobClient:     Map input records=255015
    14/03/11 16:08:41 INFO mapred.JobClient:     Reduce shuffle bytes=0
    14/03/11 16:08:41 INFO mapred.JobClient:     Reduce output records=59150
    14/03/11 16:08:41 INFO mapred.JobClient:     Spilled Records=26709860
    14/03/11 16:08:41 INFO mapred.JobClient:     Map output bytes=128572984
    14/03/11 16:08:41 INFO mapred.JobClient:     Total committed heap usage (bytes)=305004544
    14/03/11 16:08:41 INFO mapred.JobClient:     Combine input records=0
    14/03/11 16:08:41 INFO mapred.JobClient:     Map output records=7201751
    14/03/11 16:08:41 INFO mapred.JobClient:     SPLIT_RAW_BYTES=99
    14/03/11 16:08:41 INFO mapred.JobClient:     Reduce input records=7201751
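
    To double-check the result without the plugin's HDFS browser, here is a minimal sketch that prints the output part file (the NameNode address and output path are assumptions; use your own fs.default.name and the directory passed as args[1]):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CatOutput {
    	public static void main(String[] args) throws Exception {
    		Configuration conf = new Configuration();
    		// Assumed NameNode address; replace with your cluster's value.
    		conf.set("fs.default.name", "hdfs://master:9000");
    		FileSystem fs = FileSystem.get(conf);
    		// Assumed output location (args[1] of the WordCount run).
    		Path part = new Path("/user/Sky/output/part-r-00000");
    		BufferedReader in = new BufferedReader(
    				new InputStreamReader(fs.open(part)));
    		String line;
    		while ((line = in.readLine()) != null) {
    			System.out.println(line);
    		}
    		in.close();
    	}
    }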

Original post: https://www.cnblogs.com/luguoyuanf/p/3594242.html