Because the most time-consuming parts of MapReduce are spilling to disk and inter-node communication, Hadoop uses its own Writable serialization/deserialization (converting between structured objects and a binary stream so nodes can communicate: the format is compact, so it takes less bandwidth between nodes and can be read and written quickly). Common examples inside Mapper/Reducer are LongWritable, Text, and so on. When we have more complex needs, we customize our own types, mainly by implementing the Writable interface.
The source is as follows.
First, look at the Writable interface:
```java
package org.apache.hadoop.io;

public interface Writable {
    void write(java.io.DataOutput var1) throws java.io.IOException;

    void readFields(java.io.DataInput var1) throws java.io.IOException;
}
```
Just two methods, write and readFields: write serializes, readFields deserializes.
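To see the contract in action, here's a minimal sketch using only the JDK (the Writable interface is re-declared locally so it runs without Hadoop on the classpath; PointWritable and its two long fields are made up for illustration):

```java
import java.io.*;

// Re-declared locally so the sketch is self-contained; in Hadoop this is
// org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A hypothetical two-field type: serializes two longs as raw binary,
// the same way LongWritable serializes one.
public class PointWritable implements Writable {
    private long x, y;

    public PointWritable() {}                       // no-arg ctor: the framework needs it
    public PointWritable(long x, long y) { this.x = x; this.y = y; }

    public long getX() { return x; }
    public long getY() { return y; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(x);   // serialize: a fixed 8 bytes per field, no field names
        out.writeLong(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readLong();  // deserialize: must read fields back in the same order
        y = in.readLong();
    }

    // Round-trip helper to demonstrate that write and readFields are inverses.
    public static PointWritable roundTrip(PointWritable p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        p.write(new DataOutputStream(bytes));
        PointWritable copy = new PointWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```

Note there is no schema and no field names in the stream; that's why the format is so compact, and also why readFields must mirror write exactly.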
Now look at how LongWritable implements it (line numbers kept, since the notes below refer to them):
```java
 1 import java.io.DataInput;
 2 import java.io.DataOutput;
 3 import java.io.IOException;
 4
 5 public class LongWritable implements WritableComparable<LongWritable> {
 6     private long value;
 7
 8     public LongWritable() {
 9     }
10
11     public LongWritable(long value) {
12         this.set(value);
13     }
14
15     public void set(long value) {
16         this.value = value;
17     }
18
19     public long get() {
20         return this.value;
21     }
22
23     public void readFields(DataInput in) throws IOException {
24         this.value = in.readLong();
25     }
26
27     public void write(DataOutput out) throws IOException {
28         out.writeLong(this.value);
29     }
30
31     public boolean equals(Object o) {
32         if (!(o instanceof LongWritable)) {
33             return false;
34         } else {
35             LongWritable other = (LongWritable)o;
36             return this.value == other.value;
37         }
38     }
39
40     public int hashCode() {
41         return (int)this.value;
42     }
43
44     public int compareTo(LongWritable o) {
45         long thisValue = this.value;
46         long thatValue = o.value;
47         return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
48     }
49
50     public String toString() {
51         return Long.toString(this.value);
52     }
53
54     static {
55         WritableComparator.define(LongWritable.class, new LongWritable.Comparator());
56     }
57
58     public static class DecreasingComparator extends LongWritable.Comparator {
59         public DecreasingComparator() {
60         }
61
62         public int compare(WritableComparable a, WritableComparable b) {
63             return super.compare(b, a);
64         }
65
66         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
67             return super.compare(b2, s2, l2, b1, s1, l1);
68         }
69     }
70
71     public static class Comparator extends WritableComparator {
72         public Comparator() {
73             super(LongWritable.class);
74         }
75
76         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
77             long thisValue = readLong(b1, s1);
78             long thatValue = readLong(b2, s2);
79             return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
80         }
81     }
82 }
```
WritableComparable looks like this.
WritableComparable is one of Hadoop's sorting mechanisms, and sorting is one of the most important operations in the MapReduce framework. It exists to order the data (sorted by key), which happens during the transfer between MapTask and ReduceTask (i.e., between the data leaving the map method and reaching the reduce method — the shuffle, right?):
```java
public interface WritableComparable<T> extends Writable, Comparable<T> {
}
```
Up to line 21 it's getters/setters plus simple constructors; lines 50-52 are toString; lines 23-29 implement Writable's two methods (DataOutput.writeLong & DataInput.readLong); lines 44-48 are Comparable's compareTo. equals returns true when the Object is a LongWritable with the same value, and hashCode returns the value (truncated to int).
For the simple case where the type is only used as Map output and Reduce input, writing compareTo, toString, write, and readFields is usually enough.
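So a minimal custom key might look like the sketch below — Hadoop-free so it's runnable here (the WordCountKey name, its fields, and the word-then-count-descending ordering are all my own choices for illustration, not anything from the Hadoop source):

```java
import java.io.*;

// A hypothetical composite key with exactly the four methods mentioned above.
public class WordCountKey implements Comparable<WordCountKey> {
    private String word;
    private long count;

    public WordCountKey() {}
    public WordCountKey(String word, long count) { this.word = word; this.count = count; }

    // Serialization: write fields in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeLong(count);
    }

    // ...and read them back in the same order.
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readLong();
    }

    // The sort order the shuffle would use: by word ascending, then count descending.
    @Override
    public int compareTo(WordCountKey o) {
        int c = word.compareTo(o.word);
        return c != 0 ? c : Long.compare(o.count, count);
    }

    // toString decides what the final output file shows for this key.
    @Override
    public String toString() { return word + "\t" + count; }
}
```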
Then reading further down... Comparator? What is that?
WritableComparator (lines 54-81)
The WritableComparator class acts roughly like a registry that records the set of all Comparator classes. Its comparators member is a hash map whose keys are Class objects and whose values are the registered WritableComparator instances. (PS: factory pattern.)
It implements RawComparator, which exists to compare records directly in the serialized byte stream, without first deserializing the stream into objects — thus avoiding the extra cost of creating new objects.
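The raw-comparison idea can be sketched without Hadoop: decode the big-endian long directly out of the byte array, the way WritableComparator.readLong does, and never build a LongWritable at all (the class name and the toBytes helper here are mine, for the demo):

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

public class RawLongComparator implements Comparator<byte[]> {
    // Mirrors WritableComparator.readLong: decode a big-endian long in place,
    // matching the byte layout that DataOutput.writeLong produces.
    static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) v = (v << 8) | (b[off + i] & 0xFFL);
        return v;
    }

    // Demo helper: serialize a long the way writeLong would (big-endian).
    static byte[] toBytes(long v) { return ByteBuffer.allocate(8).putLong(v).array(); }

    // Compare two serialized records without deserializing them into objects.
    @Override
    public int compare(byte[] b1, byte[] b2) {
        long x = readLong(b1, 0), y = readLong(b2, 0);
        return x < y ? -1 : (x == y ? 0 : 1);
    }
}
```

During the sort phase the framework has millions of serialized keys sitting in buffers; comparing them in place like this is why a registered raw comparator is a real speedup.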
So lines 54-56 are the static block that "registers" LongWritable, and lines 71-80 are the Comparator that the static block registers for it (1 if I'm bigger, -1 if I'm smaller, 0 if equal). The API docs say: "This base implementation uses the natural ordering. To define alternate orderings, override compare(WritableComparable, WritableComparable)." Still not entirely obvious what it's for...
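The register-in-a-static-block pattern itself can be sketched as a toy registry (this is my own simplified model of what WritableComparator.define / the comparators map do, not the actual Hadoop code):

```java
import java.util.*;

// Toy version of the registry the static block talks to:
// define(Class, comparator) stores class -> comparator, and lookup falls
// back to a default when nothing was registered.
public class ComparatorRegistry {
    private static final Map<Class<?>, Comparator<?>> comparators = new HashMap<>();

    // What LongWritable's static block effectively calls at class-load time.
    public static void define(Class<?> c, Comparator<?> cmp) {
        comparators.put(c, cmp);
    }

    @SuppressWarnings("unchecked")
    public static <T extends Comparable<T>> Comparator<T> get(Class<T> c) {
        Comparator<?> cmp = comparators.get(c);
        // Fallback: natural ordering — in real Hadoop this means
        // deserialize-and-call-compareTo instead of raw byte comparison.
        return cmp != null ? (Comparator<T>) cmp : Comparator.naturalOrder();
    }
}
```

Because the registration lives in a static block, merely loading the LongWritable class is enough to make its fast raw comparator available to the framework.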
I tried, in WordCount, making the Reduce output my own StupidIntWritable with no Comparator defined, and it still produced output just fine... which left me confused... need to think about it some more. (Presumably because the registered raw Comparator is only an optimization: when none is registered, the framework falls back to deserializing the keys and calling compareTo — and a type used only as a Reduce output value is never sorted at all.)