• MapReduce: Data Deduplication


    Three input files:

    2017-03-10 a
    2017-03-11 b
    2017-03-12 d
    2017-03-13 d
    2017-03-14
    2017-03-15 a

    2017-03-10 e
    2017-03-11 b
    2017-03-12 c
    2017-03-13
    2017-03-14 h
    2017-03-15 a
    2017-03-17 p

    2017-03-10
    2017-03-11 b
    2017-03-12
    2017-03-13 d
    2017-03-14
    2017-03-15 f
    2017-03-16 o


    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Deup {
        // Mapper: emit each whole input line as the key; no value is needed.
        public static class Map extends Mapper<LongWritable, Text, Text, NullWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(value, NullWritable.get());
            }
        }

        // Reducer (also used as the combiner): the shuffle groups identical lines
        // under a single key, so writing each key once removes the duplicates.
        public static class Reduce extends Reducer<Text, NullWritable, Text, NullWritable> {
            @Override
            protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                    throws IOException, InterruptedException {
                context.write(key, NullWritable.get());
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Job.getInstance replaces the deprecated Job(Configuration) constructor.
            Job job = Job.getInstance(conf, "dedup");
            job.setJarByClass(Deup.class);
            job.setMapperClass(Map.class);
            job.setCombinerClass(Reduce.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
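    To see why this job removes duplicates, you can replay its data flow on the three sample files without a cluster. The following is a minimal plain-Java sketch of that replay. It assumes one map task per input file and models the shuffle's sort-and-group step with a TreeSet. DedupSimulation, run and SAMPLE are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical plain-Java replay of the Deup job's data flow (no Hadoop classes).
public class DedupSimulation {

    // The three sample input files; assumption: one map task per file.
    static final List<List<String>> SAMPLE = Arrays.asList(
        Arrays.asList("2017-03-10 a", "2017-03-11 b", "2017-03-12 d",
                      "2017-03-13 d", "2017-03-14", "2017-03-15 a"),
        Arrays.asList("2017-03-10 e", "2017-03-11 b", "2017-03-12 c",
                      "2017-03-13", "2017-03-14 h", "2017-03-15 a", "2017-03-17 p"),
        Arrays.asList("2017-03-10", "2017-03-11 b", "2017-03-12",
                      "2017-03-13 d", "2017-03-14", "2017-03-15 f", "2017-03-16 o"));

    static List<String> run(List<List<String>> files, boolean useCombiner) {
        List<List<String>> mapOutputs = new ArrayList<>();
        for (List<String> file : files) {
            // Map: every line becomes a key (the empty value is omitted here).
            List<String> out = new ArrayList<>(file);
            if (useCombiner) {
                // Combiner: collapse duplicates within one map task's output
                // before that output is shuffled to the reducer.
                out = new ArrayList<>(new TreeSet<>(out));
            }
            mapOutputs.add(out);
        }
        // Shuffle + reduce: keys from all map tasks are sorted and grouped,
        // and the reducer writes each group key once. A TreeSet models this;
        // String order matches Text's byte order for this ASCII data.
        TreeSet<String> grouped = new TreeSet<>();
        for (List<String> out : mapOutputs) {
            grouped.addAll(out);
        }
        return new ArrayList<>(grouped);
    }

    public static void main(String[] args) {
        run(SAMPLE, true).forEach(System.out::println);
    }
}
```

    With or without the combiner, the replay produces the same 15 distinct lines. That is why reusing Reduce as the combiner is safe here. Dropping duplicates early is idempotent, so it only shrinks the data sent to the reducer, whether the framework runs the combiner zero, one or several times.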

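    Result. The shuffle groups identical lines under one key, and the reducer writes each key once, so every distinct line appears exactly once. With the default single reducer, the keys also come out sorted: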

    2017-03-10
    2017-03-10 a
    2017-03-10 e
    2017-03-11 b
    2017-03-12
    2017-03-12 c
    2017-03-12 d
    2017-03-13
    2017-03-13 d
    2017-03-14
    2017-03-14 h
    2017-03-15 a
    2017-03-15 f
    2017-03-16 o
    2017-03-17 p
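    The order of this output comes from the sort phase. Hadoop compares Text keys byte by byte as unsigned values, so a bare date such as 2017-03-10 sorts before 2017-03-10 a, because it is a prefix of that key. The following sketch imitates that comparison in plain Java. TextKeyOrder and its methods are hypothetical names, and the code only approximates Hadoop's Text comparator rather than reusing it. With more than one reducer, each output file would be sorted on its own rather than globally.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper approximating the unsigned, lexicographic byte
// comparison that Hadoop applies to Text keys during the sort phase.
public class TextKeyOrder {

    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat bytes as unsigned
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // If one key is a prefix of the other, the shorter key sorts first.
        return a.length - b.length;
    }

    static int compare(String s, String t) {
        return compareBytes(s.getBytes(StandardCharsets.UTF_8),
                            t.getBytes(StandardCharsets.UTF_8));
    }

    static List<String> sortLikeText(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(TextKeyOrder::compare);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortLikeText(
            Arrays.asList("2017-03-10 e", "2017-03-10", "2017-03-10 a")));
    }
}
```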

  • Original post: https://www.cnblogs.com/tk55/p/6557489.html