Custom Writable, RawComparatorWritable, and comparators (repost)

Custom Writable

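Hadoop already ships with a number of very useful Writable implementations, and combining them gets you a long way. If you need a more complex structure, though, you can write your own Writable. The explanation below is given as comments in the code (please don't accuse me of padding the post with code; the real content is in the comments):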
package com.sweetop.styhadoop;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Created with IntelliJ IDEA.
 * User: lastsweetop
 * Date: 13-7-17
 * Time: 8:50 PM
 */
public class EmploeeWritable implements WritableComparable<EmploeeWritable> {

    private Text name;
    private Text role;

    /**
     * A no-argument constructor is mandatory: MapReduce creates the object with it
     * and then fills it from the serialized stream by calling readFields.
     */
    public EmploeeWritable() {
        set(new Text(), new Text());
    }

    public EmploeeWritable(Text name, Text role) {
        set(name, role);
    }

    public void set(Text name, Text role) {
        this.name = name;
        this.role = role;
    }

    public Text getName() {
        return name;
    }

    public Text getRole() {
        return role;
    }

    /**
     * Serializes each member to the output stream by calling the member's own write method.
     */
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        name.write(dataOutput);
        role.write(dataOutput);
    }

    /**
     * Likewise, deserializes each member from the input stream by calling the member's own
     * readFields method, in the same order that write used.
     */
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        name.readFields(dataInput);
        role.readFields(dataInput);
    }

    /**
     * Required by WritableComparable; defines the sort order (name first, then role).
     */
    @Override
    public int compareTo(EmploeeWritable emploeeWritable) {
        int cmp = name.compareTo(emploeeWritable.name);
        if (cmp != 0) {
            return cmp;
        }
        return role.compareTo(emploeeWritable.role);
    }

    /**
     * MapReduce uses a Partitioner to split the map output into chunks, one for each reducer.
     * The default is HashPartitioner, which partitions by the key's hashCode, so the quality
     * of hashCode decides how evenly the data is spread. This makes it a critical method.
     */
    @Override
    public int hashCode() {
        return name.hashCode() * 163 + role.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof EmploeeWritable) {
            EmploeeWritable emploeeWritable = (EmploeeWritable) o;
            return name.equals(emploeeWritable.name) && role.equals(emploeeWritable.role);
        }
        return false;
    }

    /**
     * Override toString to control what is written when TextOutputFormat is the output format.
     */
    @Override
    public String toString() {
        return name + " " + role;
    }
}
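The hashCode Javadoc above says the default partitioner splits map output by key hash. As far as I know, Hadoop's HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The JDK-only sketch below applies that formula to a hypothetical `EmployeeKey` class that combines its fields the same way as `EmploeeWritable.hashCode()`; it uses `String` in place of `Text`, so the concrete partition numbers will not match a real job.

```java
import java.util.Objects;

// Hypothetical stand-in for EmploeeWritable: String replaces Text, so the hash
// values differ from a real job, but the combining scheme is the same.
final class EmployeeKey {
    final String name;
    final String role;

    EmployeeKey(String name, String role) {
        this.name = Objects.requireNonNull(name);
        this.role = Objects.requireNonNull(role);
    }

    @Override
    public int hashCode() {
        return name.hashCode() * 163 + role.hashCode(); // same formula as EmploeeWritable
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EmployeeKey)) {
            return false;
        }
        EmployeeKey other = (EmployeeKey) o;
        return name.equals(other.name) && role.equals(other.role);
    }
}

final class PartitionSketch {
    // HashPartitioner-style routing: clear the sign bit so a negative hash code
    // still maps to a valid reducer index, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Equal keys always land on the same reducer; how evenly distinct keys spread depends entirely on how well hashCode mixes its fields.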
Writable objects are mutable and are frequently reused, so avoid allocating objects inside write and readFields whenever possible.
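A JDK-only sketch of what that reuse means in practice: the framework typically hands you the same key/value instance again and again (for example while iterating a reducer's values), and readFields overwrites it in place. `ReusableEmployee` and `ReuseDemo` are hypothetical, and `writeUTF`/`readUTF` stand in for Text's encoding. Unlike Text, which refills its own byte buffer, the String fields here still allocate on every read, so the sketch illustrates only instance reuse and the bug it invites.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical JDK-only analogue of EmploeeWritable. writeUTF/readUTF stand in
// for Text's encoding (they use a 2-byte length prefix, not Hadoop's VInt).
final class ReusableEmployee {
    private String name = "";
    private String role = "";

    void set(String name, String role) {
        this.name = name;
        this.role = role;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeUTF(role);
    }

    // Overwrites this instance in place instead of creating a new object.
    void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        role = in.readUTF();
    }

    @Override
    public String toString() {
        return name + " " + role;
    }
}

final class ReuseDemo {
    // Serializes every row through a single reused instance.
    static byte[] serialize(String[][] rows) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            ReusableEmployee e = new ReusableEmployee();
            for (String[] row : rows) {
                e.set(row[0], row[1]);
                e.write(out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Reads `count` records into ONE instance and returns two views:
    // index 0 = what you see if you kept references (the pitfall),
    // index 1 = what you see if you copied each value when it was read.
    static List<List<String>> readBack(byte[] data, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            ReusableEmployee e = new ReusableEmployee();
            List<ReusableEmployee> kept = new ArrayList<>();
            List<String> copied = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                e.readFields(in);
                kept.add(e);              // the same object every time
                copied.add(e.toString()); // snapshot of the current contents
            }
            List<String> keptView = new ArrayList<>();
            for (ReusableEmployee k : kept) {
                keptView.add(k.toString());
            }
            List<List<String>> views = new ArrayList<>();
            views.add(keptView);
            views.add(copied);
            return views;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Keeping references shows the last record twice, while copying at read time preserves both records. With real Writables the same rule applies: copy what you need (for instance `new Text(reusedText)`) before storing it.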

Custom RawComparatorWritable

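The EmploeeWritable above already works well, but it can still be optimized. When it is used as a key in MapReduce, keys have to be compared while they are still in serialized form. Comparing them through compareTo means first deserializing each key into an object, which is inefficient. Can the serialized bytes be compared directly? Yes, they can.
All we need to do is split the serialized EmploeeWritable into the bytes of its member fields and compare those field by field. Here is the code (again, the explanation is in the comments):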
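// Nested inside EmploeeWritable. Also needs:
// import org.apache.hadoop.io.WritableComparator;
// import org.apache.hadoop.io.WritableUtils;
public static class Comparator extends WritableComparator {

    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

    protected Comparator() {
        super(EmploeeWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            /*
             * name is a Text, and Text is serialized as standard UTF-8 bytes preceded by a
             * variable-length integer (VInt) holding the byte length of the text.
             * decodeVIntSize returns how many bytes the VInt itself occupies, and readVInt
             * returns the byte length of the text; their sum is the total serialized length
             * of the first member, name.
             */
            int nameL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
            int nameL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
            // As in compareTo, compare name first
            int cmp = TEXT_COMPARATOR.compare(b1, s1, nameL1, b2, s2, nameL2);
            if (cmp != 0) {
                return cmp;
            }
            // Then compare role, which occupies the rest of each record
            return TEXT_COMPARATOR.compare(b1, s1 + nameL1, l1 - nameL1,
                                           b2, s2 + nameL2, l2 - nameL2);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}

// Register the raw comparator (really a binding), so that MapReduce calls Comparator
// directly whenever it compares EmploeeWritable keys. Put this static block in
// EmploeeWritable itself, not inside the nested Comparator class: a static block in
// Comparator would only run once Comparator is loaded, which nothing else triggers.
static {
    WritableComparator.define(EmploeeWritable.class, new Comparator());
}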
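The length computation in compare depends on how Text is serialized. As a hedged illustration, the JDK-only sketch below re-implements, from my understanding of Hadoop's zero-compressed VInt format, just enough of `writeVInt`, `decodeVIntSize` and `readVInt` to compute a field's serialized length the same way the comparator does (`decodeVIntSize(b[s]) + readVInt(b, s)`). `TextLayoutSketch` and its methods are hypothetical stand-ins rather than Hadoop's classes, and only the non-negative lengths that Text actually writes are encoded.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical re-implementation of the parts of Hadoop's VInt format that
// Text relies on; for illustration only, not the Hadoop classes themselves.
final class TextLayoutSketch {

    // Values 0..127 take one byte; larger values get a marker byte
    // (-113 for 1 data byte, -114 for 2, ...) followed by big-endian data bytes.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch only encodes non-negative values");
        }
        if (value <= 127) {
            out.write(value);
            return;
        }
        int dataBytes = 0;
        for (int tmp = value; tmp != 0; tmp >>>= 8) {
            dataBytes++;
        }
        out.write(-112 - dataBytes);
        for (int idx = dataBytes - 1; idx >= 0; idx--) {
            out.write((value >>> (idx * 8)) & 0xFF);
        }
    }

    // Total size of the VInt (marker byte included), derived from its first byte.
    static int decodeVIntSize(byte first) {
        if (first >= -112) {
            return 1;
        }
        if (first < -120) {
            return -119 - first; // negative values; never produced by this sketch
        }
        return -111 - first;
    }

    // Decodes a non-negative VInt starting at `start`.
    static int readVInt(byte[] b, int start) {
        int size = decodeVIntSize(b[start]);
        if (size == 1) {
            return b[start];
        }
        int value = 0;
        for (int i = 1; i < size; i++) {
            value = (value << 8) | (b[start + i] & 0xFF);
        }
        return value;
    }

    // Text-style encoding: VInt byte count, then the UTF-8 bytes.
    static void writeText(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Serialized length of the Text field starting at `start`, computed like nameL1 above.
    static int fieldLength(byte[] b, int start) {
        return decodeVIntSize(b[start]) + readVInt(b, start);
    }
}
```

For example, a two-character Chinese name such as 张三 takes 6 UTF-8 bytes plus a 1-byte VInt, while a 200-byte string needs a 2-byte VInt (marker plus one data byte), because only lengths up to 127 fit in a single byte.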

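We don't implement RawComparator directly; instead we extend WritableComparator, because it provides many convenience methods and a default implementation of compare (the default byte-level compare deserializes both keys and then compares the objects). Be careful when writing compare, since everything happens at the byte level; the Comparator implementations of the Writables in the Hadoop source are good references. It is also worth reading WritableUtils, which contains many handy helper methods.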
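As a check on the byte-level logic, here is a JDK-only sketch (the hypothetical class `RawCompareSketch`, not Hadoop code) that mirrors what the Comparator above does: skip each Text's VInt header, compare the UTF-8 payload as unsigned bytes in the manner of `WritableComparator.compareBytes`, name first and then role. It also shows one of the traps that make byte-level code risky: comparing Java's signed bytes directly sorts non-ASCII characters before ASCII ones.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of field-by-field raw comparison; not Hadoop code.
final class RawCompareSketch {

    // Serializes (name, role) as two Text-style fields: VInt length + UTF-8 bytes.
    static byte[] encode(String name, String role) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : new String[] {name, role}) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            writeVInt(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    // Non-negative subset of Hadoop's VInt format (lengths are never negative).
    static void writeVInt(ByteArrayOutputStream out, int v) {
        if (v <= 127) { out.write(v); return; }
        int n = 0;
        for (int t = v; t != 0; t >>>= 8) n++;
        out.write(-112 - n);
        for (int i = n - 1; i >= 0; i--) out.write((v >>> (i * 8)) & 0xFF);
    }

    static int decodeVIntSize(byte first) {
        return first >= -112 ? 1 : (first < -120 ? -119 - first : -111 - first);
    }

    static int readVInt(byte[] b, int s) {
        int size = decodeVIntSize(b[s]);
        if (size == 1) return b[s];
        int v = 0;
        for (int i = 1; i < size; i++) v = (v << 8) | (b[s + i] & 0xFF);
        return v;
    }

    // Unsigned lexicographic comparison, then shorter-first on a common prefix.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        for (int i = 0; i < Math.min(l1, l2); i++) {
            int a = b1[s1 + i] & 0xFF;
            int b = b2[s2 + i] & 0xFF;
            if (a != b) return a - b;
        }
        return l1 - l2;
    }

    // Like Text.Comparator: skip the VInt header, compare the payload bytes.
    static int compareText(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n1 = decodeVIntSize(b1[s1]);
        int n2 = decodeVIntSize(b2[s2]);
        return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
    }

    // Same steps as EmploeeWritable.Comparator.compare: name first, then role.
    static int compareRecords(byte[] b1, byte[] b2) {
        int nameL1 = decodeVIntSize(b1[0]) + readVInt(b1, 0);
        int nameL2 = decodeVIntSize(b2[0]) + readVInt(b2, 0);
        int cmp = compareText(b1, 0, nameL1, b2, 0, nameL2);
        if (cmp != 0) return cmp;
        return compareText(b1, nameL1, b1.length - nameL1, b2, nameL2, b2.length - nameL2);
    }

    // The pitfall: using Java's signed bytes as the sort key.
    static int compareSigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            if (a[i] != b[i]) return a[i] - b[i];
        }
        return a.length - b.length;
    }
}
```

With unsigned comparison, "é" (UTF-8 0xC3 0xA9) sorts after "z" (0x7A), matching code-point order; the signed version reads 0xC3 as -61 and gets it backwards.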
Custom comparators

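Sometimes, besides the default comparator, you need a custom comparator to produce a different sort order. The example below compares only name; its two compare methods do the same thing, ordering by name, one on serialized bytes and the other on deserialized objects: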
// Also nested inside EmploeeWritable (it reads the private name field);
// needs the same WritableComparator/WritableUtils imports as Comparator.
public static class NameComparator extends WritableComparator {

    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

    protected NameComparator() {
        super(EmploeeWritable.class);
    }

    // Raw comparison: compare only the serialized name field
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            int nameL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
            int nameL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
            return TEXT_COMPARATOR.compare(b1, s1, nameL1, b2, s2, nameL2);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Object comparison: same ordering, used when the keys are already deserialized
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        if (a instanceof EmploeeWritable && b instanceof EmploeeWritable) {
            return ((EmploeeWritable) a).name.compareTo(((EmploeeWritable) b).name);
        }
        return super.compare(a, b);
    }
}
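To see how the name-only order differs from the full one, here is a JDK-only sketch with `java.util.Comparator` on a hypothetical `Emp` value class (not the Hadoop types). Under the name-only comparator, records that differ only in role compare as equal, so consecutive sorted records can be grouped by name. In a real job such a comparator would be registered through the job, for example with `Job.setSortComparatorClass` or `Job.setGroupingComparatorClass`; the `groupByName` helper is only an illustration of that grouping effect, not Hadoop's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class standing in for EmploeeWritable.
final class Emp {
    final String name;
    final String role;

    Emp(String name, String role) {
        this.name = name;
        this.role = role;
    }

    @Override
    public String toString() {
        return name + " " + role;
    }
}

final class NameOrderSketch {
    // Full order, like EmploeeWritable.compareTo: name, then role.
    // (String order stands in for Text's byte order; they agree for ASCII.)
    static final Comparator<Emp> FULL =
            Comparator.comparing((Emp e) -> e.name).thenComparing(e -> e.role);

    // Name-only order, like NameComparator.
    static final Comparator<Emp> NAME_ONLY = Comparator.comparing((Emp e) -> e.name);

    // Sorts with the full order, then starts a new group whenever the
    // name-only comparator reports a difference between neighbours.
    static List<List<String>> groupByName(List<Emp> input) {
        List<Emp> sorted = new ArrayList<>(input);
        sorted.sort(FULL);
        List<List<String>> groups = new ArrayList<>();
        Emp previous = null;
        for (Emp e : sorted) {
            if (previous == null || NAME_ONLY.compare(previous, e) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(e.toString());
            previous = e;
        }
        return groups;
    }
}
```

Within each name group the records still arrive in role order, because the full comparator did the sorting; the name-only comparator only decides where one group ends and the next begins.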
Original article: https://www.cnblogs.com/xuepei/p/3665470.html
