• MapReduce Global Variables: A Bug Hunt


    Global variables

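    MapReduce programs sometimes need global variables. There are three common ways to implement them:
    • Through the job's Configuration: call conf.set() when the job is initialized, then read the value back with conf.get() where it is needed. Drawback: it cannot share large amounts of data.
    • Through the DistributedCache.
    • Through HDFS: write the global variable to a file, then read it back from that file when needed.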

    The problem

    The global variable was set with the code below, but the Mapper could not read the "deadline" setting through its Configuration.
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
          System.err.println("Usage: wordcount <in> <out>");
          System.exit(2);
        }
        
        Job job = new Job(conf, "word count");
        //job.getCluster().getClusterStatus().getMapSlotCapacity();
        // Set on conf after the Job was created, so the Job never sees this value
        conf.set("deadline", new Date().toString());
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }

    The fix

    A colleague's code, however, works. Here it is:
     public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
          System.err.println("Usage: wordcount <in> <out>");
          System.exit(2);
        }  
        Job job = new Job(conf, "word count");
        job.getConfiguration().set("deadline", new Date().toString()); 
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    Or:
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
          System.err.println("Usage: wordcount <in> <out>");
          System.exit(2);
        }
        conf.set("deadline", new Date().toString());    
        Job job = new Job(conf, "word count");  
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }

    Analysis

    Tracing the code:
    Job job = new Job(conf, "word count");
      @Deprecated
      public Job(Configuration conf, String jobName) throws IOException {
        this(conf);
        setJobName(jobName);
      }
      @Deprecated
      public Job(Configuration conf) throws IOException {
        this(new JobConf(conf));
      }
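    Job wraps the conf it receives in a new JobConf, so the conf inside the Job and the conf in main() are no longer the same object. Anything set on main()'s conf after the Job is constructed never reaches the Job, and that is the cause of the problem.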
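    The effect can be reproduced without Hadoop. The following is a minimal sketch in plain Java, not Hadoop code: SimpleConf and SimpleJob are hypothetical stand-ins for Configuration and Job, written on the assumption (suggested by the trace above) that the constructor takes a copy of the entries it is given.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point: shows which writes the "job" can and cannot see.
class ConfCopyDemo {
    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set("early", "set before the job was created");

        SimpleJob job = new SimpleJob(conf);  // the job copies conf here

        conf.set("late", "set after the job was created");
        job.getConfiguration().set("direct", "set on the job's own copy");

        // Only "early" and "direct" are visible through the job.
        System.out.println("early  = " + job.getConfiguration().get("early"));
        System.out.println("late   = " + job.getConfiguration().get("late"));
        System.out.println("direct = " + job.getConfiguration().get("direct"));
    }
}

// Hypothetical stand-in for Configuration: a plain key/value store.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    SimpleConf() {}

    // Copy constructor: snapshots the other object's entries (assumed behavior).
    SimpleConf(SimpleConf other) {
        props.putAll(other.props);
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key) {
        return props.get(key);  // null when the key was never set
    }
}

// Hypothetical stand-in for Job: keeps its own copy of the configuration.
class SimpleJob {
    private final SimpleConf conf;

    SimpleJob(SimpleConf conf) {
        this.conf = new SimpleConf(conf);  // mirrors this(new JobConf(conf))
    }

    SimpleConf getConfiguration() {
        return conf;
    }
}
```

    In this sketch the "late" lookup returns null, which matches what the Mapper saw for "deadline" in the failing version, while the two working versions correspond to the "early" and "direct" writes.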

    Summary

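    The Configuration global variable was not set successfully because the Configuration that was written to and the Configuration that was read from were not the same object. Either set the value before constructing the Job, or set it through job.getConfiguration().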


  • Original article: https://www.cnblogs.com/mqxnongmin/p/10746024.html