• Hadoop output control: writing output to specified files


    I have recently been looking into how to write Hadoop output into a specified folder.

    (To be continued)

    Taking wordcount as the example:

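    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class wordcount {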
        public static class TokenizerMapper extends
                Mapper<Object, Text, Text, IntWritable>
        {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
       
       
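        public static class IntSumReducer extends
                Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            // context and MultipleOutputs are independent: both write output
            // and neither interferes with the other.
            private MultipleOutputs<Text, IntWritable> mo;

            @Override
            protected void setup(Context context) {
                // Create one MultipleOutputs per reduce task, not one per reduce() call.
                mo = new MultipleOutputs<Text, IntWritable>(context);
            }

            @Override
            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {

                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);

                // MultipleOutputs.write can write to many files, but it cannot
                // overwrite a file that already exists.
                Text kw = new Text("this a test!sum is:");
                IntWritable content = new IntWritable(sum);

                // Works: writes to a file named after key.toString() under the output
                // directory. wordcount's own context output still contains every
                // record, while MultipleOutputs splits the records into one file per key.
                mo.write(kw, content, key.toString());
                //mo.write(key, result, "error" + key.toString()); // works
                // Failed when a new MultipleOutputs was created and closed in every
                // reduce() call: all-r-00000 is created by the first call and cannot be
                // overwritten by the next. With one instance per task the writer is shared.
                //mo.write(key, result, "all");

                //mo.write(key, result, null); // wrong: no file to write to
                //mo.write(key, result, "/user/test"); // failed
                //mo.write(null, key, result, key.toString());

                // Failed: this overload expects a named output that was registered
                // in the driver with MultipleOutputs.addNamedOutput.
                //mo.write(key.toString(), key, result);
            }

            @Override
            protected void cleanup(Context context)
                    throws IOException, InterruptedException {
                // Close once per task so every side file is flushed.
                mo.close();
            }
        }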
       
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: wordcount <in> <out>");
                System.exit(2);
            }

            // Deprecated in Hadoop 2.x in favor of Job.getInstance(conf, "word count").
            Job job = new Job(conf, "word count");
            job.setJarByClass(wordcount.class);
            job.setMapperClass(TokenizerMapper.class);
            // No combiner: IntSumReducer also writes side files through
            // MultipleOutputs, so it should not run on the map side as a combiner.
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // job.setOutputFormatClass(testOutputFormat.class);

            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
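
    The file name all-r-00000 mentioned in the reducer comments follows the pattern Hadoop uses for task output files: the base name, then m or r for a map or reduce task, then the partition number padded to five digits. The block below is a minimal plain-Java sketch of that pattern with no Hadoop dependency. OutputNameSketch and uniqueFile are hypothetical names used only for illustration; they are not part of the Hadoop API.

```java
import java.util.Locale;

// Hypothetical helper, not a Hadoop class: it only reproduces the
// "<base>-<m|r>-<5-digit partition><extension>" naming pattern.
class OutputNameSketch {

    static String uniqueFile(String base, char taskType, int partition, String extension) {
        // %05d pads the partition number with leading zeros to five digits.
        return String.format(Locale.ROOT, "%s-%c-%05d%s", base, taskType, partition, extension);
    }

    public static void main(String[] args) {
        // Base names taken from the reduce keys, as in mo.write(kw, content, key.toString()).
        String[] keys = {"hello", "world"};
        for (String key : keys) {
            System.out.println(uniqueFile(key, 'r', 0, ""));
        }
        // A fixed base name maps every record of the task to one file.
        System.out.println(uniqueFile("all", 'r', 0, ""));
    }
}
```

    Every record written under the same base name within one reduce task maps to the same file name. A base name such as key.toString() therefore gives one file per key, while a fixed base name such as "all" gives one file per task.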

  • Original article: https://www.cnblogs.com/cl1024cl/p/6205688.html