• Hadoop Basics [1.1]: Writable


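    The most expensive parts of a MapReduce job are writing to disk and communicating over the network, so Hadoop uses its own Writable mechanism for serialization and deserialization, that is, for converting between structured objects and binary streams so that nodes can exchange data. The format is compact, so it takes little bandwidth between nodes and is fast to read and write. Common examples in Mappers and Reducers are LongWritable, Text and so on. When our needs are more elaborate we can define our own type, mainly by implementing the Writable interface.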

    The source is shown below.

    First, the Writable interface:

    package org.apache.hadoop.io;  
    public interface Writable {  
        void write(java.io.DataOutput var1) throws java.io.IOException;  
      
        void readFields(java.io.DataInput var1) throws java.io.IOException;  
    }  

    It has just two methods, write and readFields: write serializes the object and readFields deserializes it.
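
    To make the write/readFields pair concrete, here is a minimal round-trip sketch. It is not Hadoop code: SimpleWritable is a local stand-in with the same two method signatures as org.apache.hadoop.io.Writable, and WordCount, serialize and deserialize are hypothetical names made up for this example, so that it compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableRoundTrip {

    // Stand-in for org.apache.hadoop.io.Writable (assumption: declared locally
    // only so the sketch runs without a Hadoop jar on the classpath).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type: a word and how often it occurred.
    static class WordCount implements SimpleWritable {
        String word = "";
        long count;

        // No-arg constructor so an empty instance can be filled by readFields.
        WordCount() { }

        WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(word);    // 2-byte length prefix + modified UTF-8 bytes
            out.writeLong(count);  // always 8 bytes
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read back in exactly the order they were written.
            word = in.readUTF();
            count = in.readLong();
        }
    }

    // Serialize any SimpleWritable into a byte array.
    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild a WordCount from the bytes produced by serialize.
    static WordCount deserialize(byte[] bytes) {
        WordCount result = new WordCount();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            result.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new WordCount("hadoop", 3L));
        WordCount copy = deserialize(bytes);
        // prints "16 bytes -> hadoop=3"
        System.out.println(bytes.length + " bytes -> " + copy.word + "=" + copy.count);
    }
}
```

    The record takes 16 bytes here (2 + 6 for the string, 8 for the long), with no class names or field names in the stream, which is where the compactness comes from.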

    Now let's see how LongWritable implements it:

     1 import java.io.DataInput;
     2 import java.io.DataOutput;
     3 import java.io.IOException;
     4 
     5 public class LongWritable implements WritableComparable<LongWritable> {
     6     private long value;
     7 
     8     public LongWritable() {
     9     }
    10 
    11     public LongWritable(long value) {
    12         this.set(value);
    13     }
    14 
    15     public void set(long value) {
    16         this.value = value;
    17     }
    18 
    19     public long get() {
    20         return this.value;
    21     }
    22 
    23     public void readFields(DataInput in) throws IOException {
    24         this.value = in.readLong();
    25     }
    26 
    27     public void write(DataOutput out) throws IOException {
    28         out.writeLong(this.value);
    29     }
    30 
    31     public boolean equals(Object o) {
    32         if (!(o instanceof LongWritable)) {
    33             return false;
    34         } else {
    35             LongWritable other = (LongWritable)o;
    36             return this.value == other.value;
    37         }
    38     }
    39 
    40     public int hashCode() {
    41         return (int)this.value;
    42     }
    43 
    44     public int compareTo(LongWritable o) {
    45         long thisValue = this.value;
    46         long thatValue = o.value;
    47         return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
    48     }
    49 
    50     public String toString() {
    51         return Long.toString(this.value);
    52     }
    53 
    54     static {
    55         WritableComparator.define(LongWritable.class, new LongWritable.Comparator());
    56     }
    57 
    58     public static class DecreasingComparator extends LongWritable.Comparator {
    59         public DecreasingComparator() {
    60         }
    61 
    62         public int compare(WritableComparable a, WritableComparable b) {
    63             return super.compare(b, a);
    64         }
    65 
    66         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    67             return super.compare(b2, s2, l2, b1, s1, l1);
    68         }
    69     }
    70 
    71     public static class Comparator extends WritableComparator {
    72         public Comparator() {
    73             super(LongWritable.class);
    74         }
    75 
    76         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    77             long thisValue = readLong(b1, s1);
    78             long thatValue = readLong(b2, s2);
    79             return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
    80         }
    81     }
    82 }

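    WritableComparable is described below.

    WritableComparable is one of the ways Hadoop defines sort order, and sorting is one of the most important operations in the MapReduce framework. It is what lets the data be sorted by key, which usually happens while data moves from the MapTask to the ReduceTask (that is, between the output of the map method and the input of the reduce method, in other words the shuffle).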

    public interface WritableComparable<T> extends Writable, Comparable<T> {
    }

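    In the LongWritable listing above, everything up to line 21 is the simple constructors plus a getter and setter; lines 50-52 are toString; lines 23-29 implement the two Writable methods (using DataInput.readLong and DataOutput.writeLong); lines 44-48 are Comparable's compareTo. equals returns true when the other object is a LongWritable with the same value, and hashCode returns the value cast to int.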

    For a simple type that is only used between the Map output and the Reduce input, implementing compareTo, toString, write and readFields is usually enough.
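
    As a rough illustration of those four methods on a custom key, here is a sketch of a hypothetical composite key, YearTemp, that sorts by year ascending and then by temperature descending. In a real job it would implement org.apache.hadoop.io.WritableComparable<YearTemp>; in this sketch it implements only java.lang.Comparable so that it runs without Hadoop, and the sortedAsText helper just stands in for the sort the framework would do.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CompositeKeyDemo {

    // Hypothetical key type (assumption: not part of Hadoop or the original post).
    static class YearTemp implements Comparable<YearTemp> {
        int year;
        int temp;

        YearTemp() { }

        YearTemp(int year, int temp) {
            this.year = year;
            this.temp = temp;
        }

        // Same signature as Writable.write.
        public void write(DataOutput out) throws IOException {
            out.writeInt(year);
            out.writeInt(temp);
        }

        // Same signature as Writable.readFields; read order matches write order.
        public void readFields(DataInput in) throws IOException {
            year = in.readInt();
            temp = in.readInt();
        }

        // Year ascending, then temperature descending.
        @Override
        public int compareTo(YearTemp other) {
            int byYear = Integer.compare(year, other.year);
            return byYear != 0 ? byYear : Integer.compare(other.temp, temp);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof YearTemp)) {
                return false;
            }
            YearTemp other = (YearTemp) o;
            return year == other.year && temp == other.temp;
        }

        @Override
        public int hashCode() {
            return 31 * year + temp;
        }

        // Tab-separated text form.
        @Override
        public String toString() {
            return year + "\t" + temp;
        }
    }

    // Sort a copy of the keys by compareTo and return their text form.
    static List<String> sortedAsText(List<YearTemp> keys) {
        List<YearTemp> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        List<String> lines = new ArrayList<>();
        for (YearTemp key : copy) {
            lines.add(key.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<YearTemp> keys = Arrays.asList(
                new YearTemp(1950, 0), new YearTemp(1949, 78), new YearTemp(1950, 22));
        for (String line : sortedAsText(keys)) {
            System.out.println(line);
        }
    }
}
```

    The sketch also overrides equals and hashCode, as LongWritable does. For a key type that is worth doing, since Hadoop's default HashPartitioner picks the reducer from the key's hashCode.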

    Reading further down: Comparator? What is that?

    WritableComparator (lines 54-81)

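    The WritableComparator class works roughly like a registry that records the set of all Comparator classes. Its comparators member is a hash table that stores the registrations, with the Class as key and the WritableComparator as value. (PS: factory pattern.)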

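    It implements RawComparator, which compares records directly in the byte stream without first deserializing them into objects, and so avoids the overhead of creating new objects.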
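
    The idea of comparing records without deserializing them can be sketched as below. This is a simplified imitation of what LongWritable.Comparator does, not the Hadoop source: readLong here is a hand-written helper that decodes 8 big-endian bytes (the layout DataOutput.writeLong produces), and compareRaw is a hypothetical name.

```java
import java.nio.ByteBuffer;

public class RawLongCompare {

    // Decode 8 big-endian bytes starting at offset into a long.
    static long readLong(byte[] bytes, int offset) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (bytes[offset + i] & 0xFF);
        }
        return value;
    }

    // Compare two serialized longs in place; no wrapper objects are created.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        return Long.compare(readLong(b1, s1), readLong(b2, s2));
    }

    public static void main(String[] args) {
        // One buffer holding two records back to back: 7 at offset 0, -3 at offset 8.
        // ByteBuffer is big-endian by default, matching DataOutput.writeLong.
        byte[] buffer = ByteBuffer.allocate(16).putLong(7L).putLong(-3L).array();

        System.out.println(readLong(buffer, 8));                   // the second record
        System.out.println(compareRaw(buffer, 0, buffer, 8) > 0);  // is 7 greater than -3?
    }
}
```

    The offsets s1 and s2 are the point: both records can sit in one large buffer, and the comparator only looks at the bytes it needs.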

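    So lines 54-56 are a static block that "registers" LongWritable, and lines 71-81 are the Comparator that the static block registers (1 if this value is larger, -1 if it is smaller, 0 if they are equal). The API documentation says "This base implemenation uses the natural ordering. To define alternate orderings", which did not make its purpose very clear to me...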

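    I tried changing the Reduce output type in wordcount to a StupidIntWritable of my own, written without a Comparator, and the job still produced normal output... That left me puzzled. I need to think about it some more...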
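
    One plausible explanation, as far as I understand Hadoop's behavior: reducer output is written out without being sorted, so no comparator is needed there at all, and even for map output keys WritableComparator.get falls back to a generic comparator that deserializes both keys and calls compareTo when nothing has been registered. The toy registry below sketches that fallback. Everything in it (define, get, the StupidInt and FancyInt types) is made up for illustration and only imitates the shape of the real class.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ComparatorRegistryDemo {

    // Toy registry: key = class, value = the comparator registered for it.
    private static final Map<Class<?>, Comparator<?>> REGISTRY = new ConcurrentHashMap<>();

    // Register a comparator for a type (what the static block does for LongWritable).
    static <T> void define(Class<T> type, Comparator<? super T> comparator) {
        REGISTRY.put(type, comparator);
    }

    // Look up the comparator for a type; fall back to compareTo if none is registered.
    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> Comparator<T> get(Class<T> type) {
        Comparator<T> registered = (Comparator<T>) REGISTRY.get(type);
        if (registered != null) {
            return registered;
        }
        return Comparator.naturalOrder();
    }

    // Hypothetical type with no registered comparator, only compareTo.
    static class StupidInt implements Comparable<StupidInt> {
        final int value;

        StupidInt(int value) {
            this.value = value;
        }

        @Override
        public int compareTo(StupidInt other) {
            return Integer.compare(value, other.value);
        }
    }

    // Hypothetical type that gets an alternate (descending) ordering registered.
    static class FancyInt implements Comparable<FancyInt> {
        final int value;

        FancyInt(int value) {
            this.value = value;
        }

        @Override
        public int compareTo(FancyInt other) {
            return Integer.compare(value, other.value);
        }
    }

    // Nothing registered: the natural ordering from compareTo is used.
    static int compareWithFallback(int a, int b) {
        return get(StupidInt.class).compare(new StupidInt(a), new StupidInt(b));
    }

    // Descending comparator registered: it overrides the natural ordering.
    static int compareWithRegistered(int a, int b) {
        define(FancyInt.class, (x, y) -> Integer.compare(y.value, x.value));
        return get(FancyInt.class).compare(new FancyInt(a), new FancyInt(b));
    }

    public static void main(String[] args) {
        System.out.println(compareWithFallback(1, 2) < 0);    // natural order: 1 before 2
        System.out.println(compareWithRegistered(1, 2) > 0);  // registered order: 2 before 1
    }
}
```

    Seen this way, registering a Comparator looks like an optimization and a hook for alternate orderings rather than a requirement: a type without one would still sort correctly through compareTo, just with the cost of deserializing each key.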

  • Original post: https://www.cnblogs.com/tillnight1996/p/12317072.html