全局变量
写MapReduce程序时候,有时候须要用到全局变量,经常使用的全局变量实现由三种方式:
- 通过作业的Configuration传递全局变量。作业初始化的时候。conf.set(),须要的时候。再用conf.get()读出来。缺点:不能共享较大的数据。
- 通过distributedcache
- 通过HDFS实现:即将全局变量写入一个文件,须要的时候,从该文件读取出来
发现问题
全局变量的代码设置例如以下,在Mapper中通过Configuration无法读出配置"deadline"。
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
//job.getCluster().getClusterStatus().getMapSlotCapacity();
conf.set("deadline", new Date().toString);
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
解决这个问题
但是同事的代码却能够,将代码粘贴出来
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
job.getConfiguration().set("deadline", new Date().toString());
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
或者public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); if (otherArgs.length != 2) { System.err.println("Usage: wordcount <in> <out>"); System.exit(2); } conf.set("deadline", new Date().toString()); Job job = new Job(conf, "word count"); job.setJarByClass(WordCount.class); job.setMapperClass(TokenizerMapper.class); job.setCombinerClass(IntSumReducer.class); job.setReducerClass(IntSumReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(otherArgs[0])); FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); System.exit(job.waitForCompletion(true) ?
0 : 1); }
问题分析
跟踪代码:
Job job = new Job(conf, "word count");
到 @Deprecated
public Job(Configuration conf, String jobName) throws IOException {
this(conf);
setJobName(jobName);
}
到 @Deprecated
public Job(Configuration conf) throws IOException {
this(new JobConf(conf));
}
这样,Job里面的conf和main()里面的conf已经不一样了,故导致问题总结
Configuration全局变量没设置成功的原因:设置參数的Configuration和读取參数的Configuration不一致。