• 5.3.3 Custom Writable and RawComparator


    (1) Building an employee Writable

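    Hadoop already ships a number of very useful Writable implementations, and combining them covers many cases. For more complex structures, though, you can define your own Writable. Take an employee Writable made of two Text fields, name and role, that has to be sorted by name and role together. To build it, define a class that implements the WritableComparable interface and provide: constructors, getters and setters for the fields, the readFields and write methods, a compareTo method for comparison, and a toString() method for string output.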

    https://blog.csdn.net/lzm1340458776/article/details/42675433

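    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    /**
     * A custom Writable normally implements the Writable interface.
     * If instances have to be compared, it is better to implement
     * the WritableComparable interface.
     * time: 2015-01-13 13:39:12
     */
    public class EmployeeWritable implements WritableComparable<EmployeeWritable> {
        // Name
        private Text name;
        // Role
        private Text role;

        // A no-argument constructor is required. It must create the name and
        // role objects, otherwise readFields() throws a NullPointerException.
        public EmployeeWritable() {
            name = new Text();
            role = new Text();
        }

        // Constructor that takes both fields
        public EmployeeWritable(Text name, Text role) {
            this.name = name;
            this.role = role;
        }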

        public Text getName() {
            return name;
        }

        public void setName(Text name) {
            this.name = name;
        }

        public Text getRole() {
            return role;
        }

        public void setRole(Text role) {
            this.role = role;
        }

          

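        /**
         * Deserializes each member from the input stream by calling
         * the member's own readFields() method.
         */
        @Override
        public void readFields(DataInput dataInput) throws IOException {
            name.readFields(dataInput);
            role.readFields(dataInput);
        }

        /**
         * Serializes each member to the output stream by calling
         * the member's own write() method, in the same order as readFields().
         */
        @Override
        public void write(DataOutput dataOutput) throws IOException {
            name.write(dataOutput);
            role.write(dataOutput);
        }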

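        /**
         * A class that implements WritableComparable must implement compareTo.
         * It compares deserialized objects, that is, their Text members.
         */
        @Override
        public int compareTo(EmployeeWritable employeeWritable) {
            int cmp = name.compareTo(employeeWritable.name);
            // Names differ: the name decides the order
            if (cmp != 0) {
                return cmp;
            }
            // Names are equal: compare the roles
            return role.compareTo(employeeWritable.role);
        }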

          

        /**
         * MapReduce uses a Partitioner to split the map output into chunks
         * that are fed to the reducers.
         * The default is HashPartitioner, which partitions by the key's hashCode().
         * The quality of hashCode() therefore decides how evenly the data is
         * spread, which makes it a key method.
         */
        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((name == null) ? 0 : name.hashCode());
            result = prime * result + ((role == null) ? 0 : role.hashCode());
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            EmployeeWritable other = (EmployeeWritable) obj;
            if (name == null) {
                if (other.name != null)
                    return false;
            } else if (!name.equals(other.name))
                return false;
            if (role == null) {
                if (other.role != null)
                    return false;
            } else if (!role.equals(other.role))
                return false;
            return true;
        }

        /**
         * Custom string representation used for output.
         */
        @Override
        public String toString() {
            return "EmployeeWritable [name=" + name + ", role=" + role + "]";
        }
    }
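The pairing of write and readFields, the name-then-role ordering, and the role of hashCode in partitioning can be tried out without a Hadoop installation. The following is only a minimal sketch under stated assumptions: `EmployeeDemo` is a hypothetical stand-in that uses `String` fields and `writeUTF`/`readUTF` instead of `Text`, and `partitionFor` is an illustrative helper that applies the same formula as the default HashPartitioner (mask the sign bit, then take the remainder by the number of reduce tasks).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stdlib-only stand-in for EmployeeWritable (not a Hadoop class).
class EmployeeDemo implements Comparable<EmployeeDemo> {
    String name = "";
    String role = "";

    // No-argument constructor: fields are initialized so readFields() can fill them.
    EmployeeDemo() { }

    EmployeeDemo(String name, String role) {
        this.name = name;
        this.role = role;
    }

    // Serialize the fields in a fixed order: name, then role.
    void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeUTF(role);
    }

    // Deserialize in exactly the same order as write().
    void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        role = in.readUTF();
    }

    // Order by name first, then by role.
    @Override
    public int compareTo(EmployeeDemo other) {
        int cmp = name.compareTo(other.name);
        return cmp != 0 ? cmp : role.compareTo(other.role);
    }

    @Override
    public int hashCode() {
        return 31 * (31 + name.hashCode()) + role.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof EmployeeDemo)) {
            return false;
        }
        EmployeeDemo other = (EmployeeDemo) obj;
        return name.equals(other.name) && role.equals(other.role);
    }

    // Write an instance to a byte array and read it back into a fresh instance.
    static EmployeeDemo roundTrip(EmployeeDemo in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            in.write(new DataOutputStream(bytes));
            EmployeeDemo out = new EmployeeDemo();
            out.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Illustrative helper: clear the sign bit, then take the modulus.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        EmployeeDemo a = new EmployeeDemo("alice", "engineer");
        System.out.println(a.equals(roundTrip(a)));
        System.out.println(a.compareTo(new EmployeeDemo("alice", "manager")) < 0);
        System.out.println(partitionFor(a, 4));
    }
}
```

Because equal keys produce equal hash codes, they always land in the same partition, which is why hashCode and equals should be overridden together on a key type.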

    (2) Custom RawComparator

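    When the EmployeeWritable above is used as a key in MapReduce and keys have to be compared, each key is first deserialized into an object and then compareTo is called on the objects. That is inefficient. Comparing the serialized bytes directly avoids the deserialization: we only need to split the serialized form of EmployeeWritable into its member fields and compare those fields one by one. The code: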

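    // Nested inside EmployeeWritable. Additionally requires:
    //   import org.apache.hadoop.io.WritableComparator;
    //   import org.apache.hadoop.io.WritableUtils;
    public static class Comparator extends WritableComparator {
        private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

        protected Comparator() {
            super(EmployeeWritable.class);
        }

        // b1 holds the serialized bytes of the first object, s1 is the offset at
        // which it (and therefore its name field) starts, l1 is its total length.
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            try {
                /*
                 * A serialized Text is a variable-length integer (VInt) holding the
                 * number of UTF-8 bytes, followed by those bytes.
                 * decodeVIntSize returns the size of the VInt itself and readVInt
                 * returns the byte count it encodes; their sum is the length of
                 * the first member, name.
                 */
                int nameL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int nameL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);

                // As in compareTo, compare name first
                int cmp = TEXT_COMPARATOR.compare(b1, s1, nameL1, b2, s2, nameL2);
                if (cmp != 0) {
                    return cmp;
                }
                // Then compare role
                return TEXT_COMPARATOR.compare(b1, s1 + nameL1, l1 - nameL1,
                                               b2, s2 + nameL2, l2 - nameL2);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }
    }

    // Register the raw comparator (in effect a binding), so that MapReduce calls
    // Comparator directly whenever EmployeeWritable is the key type. The block
    // belongs in EmployeeWritable itself, not in the nested class, so that it
    // runs when EmployeeWritable is initialized.
    static {
        WritableComparator.define(EmployeeWritable.class, new Comparator());
    }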
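The idea of comparing serialized records field by field, without rebuilding objects, can also be shown with the standard library alone. This is a sketch under simplifying assumptions, not the Hadoop implementation: the hypothetical `RawCompareDemo` prefixes each field with a fixed 4-byte length instead of the VInt that Text uses, and `serialize`, `readInt`, and `compareBytes` are illustrative helpers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: compare two serialized (name, role) records byte-wise.
class RawCompareDemo {
    // Layout: [4-byte length][name UTF-8 bytes][4-byte length][role UTF-8 bytes].
    static byte[] serialize(String name, String role) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String field : new String[] {name, role}) {
                byte[] utf8 = field.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read a big-endian 4-byte integer starting at offset s.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
             | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    // Compare name first, then role, working only on the serialized bytes.
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        int nameLen1 = readInt(b1, s1);
        int nameLen2 = readInt(b2, s2);
        int cmp = compareBytes(b1, s1 + 4, nameLen1, b2, s2 + 4, nameLen2);
        if (cmp != 0) {
            return cmp;
        }
        // The role field starts right after the name's length prefix and bytes.
        int roleStart1 = s1 + 4 + nameLen1;
        int roleStart2 = s2 + 4 + nameLen2;
        return compareBytes(b1, roleStart1 + 4, readInt(b1, roleStart1),
                            b2, roleStart2 + 4, readInt(b2, roleStart2));
    }

    public static void main(String[] args) {
        byte[] a = serialize("alice", "engineer");
        byte[] b = serialize("alice", "manager");
        System.out.println(compare(a, 0, b, 0) < 0);
    }
}
```

As in the Comparator above, the length prefix is what makes it possible to find where name ends and role begins without decoding the text.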

  • Original article: https://www.cnblogs.com/bclshuai/p/11771301.html