• Custom data types in MapReduce


    Built-in data types (all of them implement the Writable interface)

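    BooleanWritable		boolean value
    ByteWritable		single byte
    DoubleWritable		double-precision (8-byte) floating-point number
    FloatWritable		single-precision (4-byte) floating-point number
    IntWritable			32-bit integer
    LongWritable		64-bit long integer
    Text				text stored as UTF-8
    NullWritable		empty placeholder type (serializes to zero bytes)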
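    To make the sizes in the list above concrete, the sketch below counts how many bytes each primitive write produces, using only java.io.DataOutputStream. It assumes, as in Hadoop's own sources, that each fixed-width type serializes its value with the matching DataOutput method (for example IntWritable with writeInt() and DoubleWritable with writeDouble()). The class PrimitiveSizes, its Write interface and bytesWritten() are hypothetical names for illustration, and no Hadoop jar is needed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: measures the serialized size of single DataOutput writes.
class PrimitiveSizes {
    // One write against a DataOutputStream, e.g. out -> out.writeInt(1)
    interface Write {
        void apply(DataOutputStream out) throws IOException;
    }

    // Runs one write into an in-memory buffer and returns the number of bytes produced.
    static int bytesWritten(Write w) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.apply(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println("boolean " + bytesWritten(out -> out.writeBoolean(true))); // 1
        System.out.println("byte    " + bytesWritten(out -> out.writeByte(7)));       // 1
        System.out.println("int     " + bytesWritten(out -> out.writeInt(42)));       // 4
        System.out.println("float   " + bytesWritten(out -> out.writeFloat(1.5f)));   // 4
        System.out.println("long    " + bytesWritten(out -> out.writeLong(42L)));     // 8
        System.out.println("double  " + bytesWritten(out -> out.writeDouble(1.5)));   // 8
    }
}
```

    The byte counts are fixed regardless of the value written, which is why these types are cheap to compare and sort.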
    

    Because the shuffle phase sorts records by key, a custom type used as a key must implement both Writable and Comparable, that is, the WritableComparable interface.
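    A rough in-memory picture of what the sort step does with keys: it orders them purely through compareTo(). This is a sketch, not Hadoop's actual code path; in a real job the framework sorts serialized keys through a WritableComparator, which by default deserializes them and calls compareTo(). SortKey is a hypothetical stand-in that implements plain Comparable so the example runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in key; a real Hadoop key would implement
// org.apache.hadoop.io.WritableComparable<SortKey> instead of Comparable<SortKey>.
class SortKey implements Comparable<SortKey> {
    final int ip;
    final String name;

    SortKey(int ip, String name) {
        this.ip = ip;
        this.name = name;
    }

    // Order by ip first, then by name: the order a reducer would receive these keys in.
    @Override
    public int compareTo(SortKey o) {
        int comp = Integer.compare(ip, o.ip);
        return comp != 0 ? comp : name.compareTo(o.name);
    }

    @Override
    public String toString() {
        return ip + "\t" + name;
    }

    // Sorts a copy of the keys using only compareTo(), like the sort step would.
    static List<SortKey> sorted(List<SortKey> keys) {
        List<SortKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<SortKey> keys = Arrays.asList(
                new SortKey(2, "bob"), new SortKey(1, "zoe"), new SortKey(2, "amy"));
        for (SortKey k : sorted(keys)) {
            System.out.println(k); // 1 zoe, then 2 amy, then 2 bob
        }
    }
}
```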

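    Writable

    write()       serializes the object's fields to the output stream (DataOutput)
    readFields()  deserializes bytes from the input stream (DataInput) back into the object's fields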
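    The write()/readFields() contract can be exercised without a cluster by writing an object into a byte buffer and reading the bytes back into another instance. The sketch assumes a local SimpleWritable interface with the same two method signatures as org.apache.hadoop.io.Writable; SimpleWritable, CounterWritable, roundTrip() and echo() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical Writable-style holder for a single long counter.
class CounterWritable implements SimpleWritable {
    long count;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(count); // serialize the field to the output stream
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        count = in.readLong(); // overwrite the field with bytes from the input stream
    }

    // Serializes src, then deserializes those bytes into the existing object dst.
    static void roundTrip(SimpleWritable src, SimpleWritable dst) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            src.write(out);
            out.flush();
            dst.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a value and reads it back through a second instance.
    static long echo(long value) {
        CounterWritable a = new CounterWritable();
        a.count = value;
        CounterWritable b = new CounterWritable();
        roundTrip(a, b);
        return b.count;
    }

    public static void main(String[] args) {
        System.out.println(echo(42L)); // 42
    }
}
```

    readFields() fills in an already-constructed object rather than returning a new one, which lets the framework reuse a single instance across many records.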
    

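    Writing a custom data type: an example

    1. Declare private fields
    2. Add getter and setter methods
    3. Add a no-argument constructor and a constructor that takes arguments (Hadoop creates Writables through reflection, so the no-argument constructor is required)
    4. Add a set() method that the constructor uses to initialize the fields (a style Hadoop code favors)
    5. Override hashCode() and equals() (the default HashPartitioner uses hashCode() to choose a reducer)
    6. Override toString() (it controls how the object appears in text output)
    7. Implement Writable with its write() and readFields() methods
    8. To use the type as a key, implement WritableComparable (which extends Writable) and add compareTo()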
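    Step 5 matters for keys because Hadoop's default HashPartitioner chooses a reducer from hashCode(), computing (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below reproduces that formula with a hypothetical PartitionDemo.Key class whose hashCode() depends only on its field values. If hashCode() were left as the identity-based Object default, equal keys emitted by different map tasks could be sent to different reducers.

```java
import java.util.Objects;

// Hypothetical demo of hash partitioning; not Hadoop's HashPartitioner class itself.
class PartitionDemo {
    // Same formula as Hadoop's default HashPartitioner: clear the sign bit, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical key whose equals()/hashCode() depend only on field values (step 5).
    static final class Key {
        final int ip;
        final String name;

        Key(int ip, String name) {
            this.ip = ip;
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Key)) return false;
            Key other = (Key) obj;
            return ip == other.ip && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(ip, name);
        }
    }

    public static void main(String[] args) {
        // Two separately built but equal keys land in the same partition.
        Key a = new Key(10, "tom");
        Key b = new Key(10, "tom");
        System.out.println(partitionFor(a, 4) == partitionFor(b, 4)); // true
    }
}
```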

    package com.cenzhongman.io;

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    
    import org.apache.hadoop.io.WritableComparable;
    
    public class UserWritable implements WritableComparable<UserWritable> {
    	private int ip;
    	private String name;
    
    	public UserWritable() {
    	}
    
    	public UserWritable(int ip, String name) {
    		this.set(ip, name);
    	}
    
    	@Override
    	public int hashCode() {
    		final int prime = 31;
    		int result = 1;
    		result = prime * result + ip;
    		result = prime * result + ((name == null) ? 0 : name.hashCode());
    		return result;
    	}
    
    	@Override
    	public String toString() {
    		return ip + "\t" + name;
    	}
    
    	@Override
    	public boolean equals(Object obj) {
    		if (this == obj)
    			return true;
    		if (obj == null)
    			return false;
    		if (getClass() != obj.getClass())
    			return false;
    		UserWritable other = (UserWritable) obj;
    		if (ip != other.ip)
    			return false;
    		if (name == null) {
    			if (other.name != null)
    				return false;
    		} else if (!name.equals(other.name))
    			return false;
    		return true;
    	}
    
    	public void set(int ip, String name) {
    		this.setIp(ip);
    		this.setName(name);
    	}
    
    	public int getIp() {
    		return ip;
    	}
    
    	public void setIp(int ip) {
    		this.ip = ip;
    	}
    
    	public String getName() {
    		return name;
    	}
    
    	public void setName(String name) {
    		this.name = name;
    	}
    
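    	// readFields() must read the fields in exactly the same order write() wrote them
    	@Override
    	public void readFields(DataInput in) throws IOException {
    		this.ip = in.readInt();
    		this.name = in.readUTF();
    	}
    
    	@Override
    	public void write(DataOutput out) throws IOException {
    		out.writeInt(ip);
    		// writeUTF() throws NullPointerException for null, so name must be set before writing
    		out.writeUTF(name);
    	}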
    
    	// Sort by ip first, then by name
    	@Override
    	public int compareTo(UserWritable o) {
    		int comp = Integer.compare(this.ip, o.ip);
    		if (comp != 0) {
    			return comp;
    		}
    		return this.name.compareTo(o.name);
    	}
    }
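    To see why the order in write() and readFields() must match, the sketch below repeats the two calls UserWritable.write() makes, writeInt(ip) followed by writeUTF(name), and inspects the resulting bytes. UserLayout and serialize() are hypothetical helpers that use only java.io, so the layout can be checked without Hadoop on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper mirroring the byte layout produced by UserWritable.write().
class UserLayout {
    // Same two calls, same order: writeInt(ip), then writeUTF(name).
    static byte[] serialize(int ip, String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(ip);   // 4 bytes, big-endian
            out.writeUTF(name); // 2-byte length prefix, then modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(1, "tom");
        System.out.println(bytes.length); // 9 = 4 (int) + 2 (length) + 3 (ASCII chars)
    }
}
```

    For ip = 1 and name = "tom" the bytes are 00 00 00 01 (the int), 00 03 (the string length) and then t, o, m. The stream carries no field names or markers, so readFields() can only decode it correctly by reading an int first and a UTF string second.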
  • Original article: https://www.cnblogs.com/cenzhongman/p/7133904.html
Copyright © 2020-2023  润新知