MapReduce: Output each document's lines in reverse order and rewrite the dates with 年, 月, 日

Write a MapReduce program that does the following.

input1:

2012-3-1 a
2012-3-2 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-7 c
2012-3-3 c

input2:

2012-3-1 b
2012-3-2 a
2012-3-3 b
2012-3-4 d
2012-3-5 a
2012-3-6 c
2012-3-7 d
2012-3-3 c

Expected output:

2012年3月3日 c
2012年3月7日 c
2012年3月6日 b
2012年3月5日 a
2012年3月4日 d
2012年3月3日 c
2012年3月2日 b
2012年3月1日 a
2012年3月3日 c
2012年3月7日 d
2012年3月6日 c
2012年3月5日 a
2012年3月4日 d
2012年3月3日 b
2012年3月2日 a
2012年3月1日 b
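
As a cross-check of the expected result, the transformation for a single file can be sketched in plain Java without Hadoop. The names ReverseAndFormat, formatLine and transform are made up for this illustration and are not part of the job below. The sketch assumes every line has the form year-month-day, one space, then a letter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of the original job: a plain-Java
// reference for the expected output of a single input file.
class ReverseAndFormat {

    // Rewrites one line, e.g. "2012-3-1 a" becomes "2012年3月1日 a".
    // Assumes the line has exactly the fields year, month, day, letter.
    static String formatLine(String line) {
        String[] parts = line.trim().split("[- ]+"); // 2012, 3, 1, a
        return parts[0] + "年" + parts[1] + "月" + parts[2] + "日 " + parts[3];
    }

    // Formats every line, then reverses the order of the lines.
    static List<String> transform(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(formatLine(line));
        }
        Collections.reverse(out);
        return out;
    }

    public static void main(String[] args) {
        List<String> input1 = Arrays.asList(
                "2012-3-1 a", "2012-3-2 b", "2012-3-3 c", "2012-3-4 d",
                "2012-3-5 a", "2012-3-6 b", "2012-3-7 c", "2012-3-3 c");
        for (String line : transform(input1)) {
            System.out.println(line);
        }
    }
}
```

Running main on the lines of input1 should print the first eight lines of the expected output; applying the same method to input2 gives the remaining eight.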

The code is below (my experience is limited, so I cannot guarantee it is fully correct; corrections are welcome):

package one;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TestYear {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration config = new Configuration();
        // NameNode and ResourceManager addresses of the author's cluster; adjust for your environment
        config.set("fs.defaultFS", "hdfs://192.168.0.100:9000");
        config.set("yarn.resourcemanager.hostname", "192.168.0.100");
        
        FileSystem fs = FileSystem.get(config);
        
        Job job = Job.getInstance(config);
        
        job.setJarByClass(TestYear.class);
        
        // Set the mapper class and its output key/value types
        job.setMapperClass(myMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);
        
        // Set the reducer class and the job's final output key/value types
        job.setReducerClass(myReducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        
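        // Set the input path
        FileInputFormat.addInputPath(job, new Path("/zhoukao3/"));

        Path path = new Path("/output1/");

        // Delete the output directory if it already exists, otherwise the job fails at submission
        if(fs.exists(path)){
            fs.delete(path, true);
        }

        // Set the output path
        FileOutputFormat.setOutputPath(job, path);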
        
        // Submit the job and wait for it to finish
        boolean completion = job.waitForCompletion(true);
        if(completion){
            System.out.println("Job Success!");
        }
    }
    
    public static class myMapper extends Mapper<LongWritable, Text, NullWritable , Text>{

        @Override
        protected void map(LongWritable key, Text value, Context context)throws IOException, InterruptedException {
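            String values=value.toString().trim();
            if(values.isEmpty()){
                return; // skip blank lines instead of failing on words[1]
            }
            // Split on '-' or a space: "2012-3-1 a" gives 2012, 3, 1, a
            String words[]=values.split("[-]| ");
            // A single space separates the date from the letter, matching the expected output
            String s=words[0]+"年"+words[1]+"月"+words[2]+"日"+" "+words[3];
            context.write(NullWritable.get(),new Text(s));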
        }
    }
    
    public static class myReducer extends Reducer<NullWritable , Text,NullWritable , Text>{

        @Override
        protected void reduce(NullWritable key, Iterable<Text> values,Context context)throws IOException, InterruptedException {
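            // Write every value back unchanged; with a NullWritable key only the value appears in the output file
            for (Text value : values) {
                context.write(key, value);
            }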
        }
        
    }
}

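Summary: the mapper emits each formatted line as the map output value under a single NullWritable key, so the lines are neither sorted by their content nor de-duplicated; the reducer then receives them as one value list and writes each one out with context.write(). Keep in mind that MapReduce does not guarantee the order of the values within a key, so the reversed order in the result comes from how the shuffle happened to merge the data and is not enforced by the program. On splitting: "-" is only special inside a regex character class, where it denotes a range, so escaping it with \ or wrapping it in [ ] is optional but harmless. Paying attention to small details like these is enough to get the final result.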
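
To check the splitting detail from the summary, the small sketch below splits one sample line with several patterns. SplitDemo and splitWith are hypothetical names used only for this illustration. The patterns are the one from the mapper, an unescaped hyphen, an escaped hyphen, and a single character class.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original job.
class SplitDemo {

    // Splits one input line with the given regular expression.
    static String[] splitWith(String line, String regex) {
        return line.split(regex);
    }

    public static void main(String[] args) {
        String line = "2012-3-1 a";
        // Mapper's pattern, unescaped hyphen, escaped hyphen, single character class
        String[] patterns = {"[-]| ", "-| ", "\\-| ", "[- ]"};
        for (String p : patterns) {
            System.out.println(p + " -> " + Arrays.toString(splitWith(line, p)));
        }
    }
}
```

All four patterns should produce the same four fields, because a hyphen outside a character class is an ordinary character in a Java regular expression.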

Original article: https://www.cnblogs.com/supiaopiao/p/7239327.html