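While studying Hadoop recently, I ran into a problem when learning MapReduce: find the top 10 of a given data set. As we know, MapReduce has a default sort mechanism that orders records by key in ascending order (smallest to largest).
Solving the top 10 problem, however, requires descending order. It took me a long time searching online before I got it working. The solution is as follows:
Define a custom comparator that extends the WritableComparator class. The code is as follows: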
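package Sort; // assumption: same package as the SortRunner driver, so the driver can reference this class

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.WritableComparator;

public class DescSort extends WritableComparator {

    public DescSort() {
        // Register the key type; `true` asks WritableComparator to create
        // key instances so the raw bytes can be deserialized for comparison.
        super(LongWritable.class, true);
    }

    // Comparison on the serialized bytes of two keys.
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // The minus sign flips the default ascending order into descending order.
        return -super.compare(b1, s1, l1, b2, s2, l2);
    }

    // Comparison on two already-deserialized keys.
    @Override
    public int compare(Object a, Object b) {
        // The minus sign flips the default ascending order into descending order.
        return -super.compare(a, b);
    }
}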
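To see why a single minus sign is enough, here is a small stand-alone sketch in plain Java with no Hadoop dependency. It assumes keys are serialized as 8-byte big-endian longs, which is how LongWritable writes its value. The class and method names (RawDescDemo, encode, compareAsc, compareDesc, sortDescending) are made up for this illustration and are not part of the Hadoop API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: mimics a raw (byte-level) comparator without Hadoop.
final class RawDescDemo {

    // Encode a long as 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Ascending comparison on the serialized form: decode both keys, then compare.
    static int compareAsc(byte[] b1, int s1, byte[] b2, int s2) {
        long left = ByteBuffer.wrap(b1, s1, Long.BYTES).getLong();
        long right = ByteBuffer.wrap(b2, s2, Long.BYTES).getLong();
        return Long.compare(left, right);
    }

    // Descending comparison: the same result with the sign flipped.
    static int compareDesc(byte[] b1, int s1, byte[] b2, int s2) {
        return -compareAsc(b1, s1, b2, s2);
    }

    // Serialize the values, sort the byte arrays with the descending
    // comparator, and decode them again.
    static long[] sortDescending(long... values) {
        List<byte[]> keys = new ArrayList<>();
        for (long v : values) {
            keys.add(encode(v));
        }
        keys.sort((a, b) -> compareDesc(a, 0, b, 0));
        long[] out = new long[keys.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = ByteBuffer.wrap(keys.get(i)).getLong();
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [42, 13, 5, -7]
        System.out.println(Arrays.toString(sortDescending(5, 42, -7, 13)));
    }
}
```

Swapping compareDesc for compareAsc in sortDescending gives the default ascending order back, which is the only difference between the stock behaviour and DescSort.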
The driver's main function must then register this comparator class on the job. The code is as follows:
package Sort;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SortRunner {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.252.200:9000");

        Job job = Job.getInstance(conf);
        job.setJarByClass(SortRunner.class);

        // Register the custom comparator so keys are sorted in descending order.
        job.setSortComparatorClass(DescSort.class);

        // SortMapper and SortReducer are not shown in this post.
        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);

        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        // Input and output paths
        FileInputFormat.setInputPaths(job, new Path("/sort/srcdata/"));
        FileOutputFormat.setOutputPath(job, new Path("/sort/output3"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
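Note: the line job.setSortComparatorClass(DescSort.class); is the one that registers the comparator.
With this in place, the output comes out in descending order.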
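Once the keys arrive in descending order, the top 10 is simply the first ten keys seen. The cut-off logic can be sketched in plain Java as below. This is only an illustration under the assumption that all keys pass through one place (for example a single reducer); TopNDemo and topN are hypothetical names, and this is not the SortReducer used by the job above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class: sort descending, then keep only the first n keys.
final class TopNDemo {

    static List<Long> topN(List<Long> keys, int n) {
        // Copy first so the caller's list is left untouched.
        List<Long> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.reverseOrder()); // descending order

        List<Long> result = new ArrayList<>();
        for (Long key : sorted) {
            if (result.size() >= n) {
                break; // stop as soon as n keys have been collected
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [99, 42, 15]
        System.out.println(topN(Arrays.asList(3L, 99L, 15L, 7L, 42L), 3));
    }
}
```

One difference to keep in mind: this sketch keeps duplicate values as separate entries, whereas in MapReduce equal keys are grouped into a single reduce call, so a reducer has to decide whether duplicates count once or several times.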
There are already many write-ups online about sorting output by custom key types, so I won't go into detail on that here. Hope this helps!