• MapReduce: Analyzing Mobile Internet Logs (Sorting)


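    1. Background

    1.1 Workflow

      This post implements sorting; grouping was already implemented with a Partitioner in the previous post.

      The steps: implement the interface and let the IDE generate its methods, write the fields, generate the getters and setters, serialize and deserialize the fields, write the comparison method, and override toString. To make assignment convenient I first wrote a constructor that takes arguments, but with such a constructor the map method has to keep creating new objects. LongWritable has a set method, and so does Text, so the same approach works here: add a set method and generate the default (no-argument) constructor.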

    	public void set(String account,double income,double expense) {
    		this.account = account;
    		this.income = income;
    		this.expense = expense;
    		this.surplus = income-expense;
    	}
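The serialize and deserialize step can be tried without a cluster. The sketch below is not from the original post: it uses plain `java.io` streams (which implement `DataOutput` and `DataInput`) and a hypothetical `RoundTripDemo` class with made-up sample values, only to show that the fields have to be read back in the same order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-alone demo; it mirrors the write/readFields pattern without Hadoop.
class RoundTripDemo {
    String account;
    double income;
    double expense;
    double surplus;

    // Same idea as the bean's set method: surplus is derived, not passed in.
    void set(String account, double income, double expense) {
        this.account = account;
        this.income = income;
        this.expense = expense;
        this.surplus = income - expense;
    }

    // Serialize the fields in a fixed order.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(account);
            out.writeDouble(income);
            out.writeDouble(expense);
            out.writeDouble(surplus);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialize by reading the fields back in exactly the same order.
    static RoundTripDemo fromBytes(byte[] bytes) {
        RoundTripDemo bean = new RoundTripDemo();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            bean.account = in.readUTF();
            bean.income = in.readDouble();
            bean.expense = in.readDouble();
            bean.surplus = in.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bean;
    }

    public static void main(String[] args) {
        RoundTripDemo original = new RoundTripDemo();
        original.set("13800000000", 300.0, 120.0); // made-up sample values
        RoundTripDemo copy = fromBytes(original.toBytes());
        System.out.println(copy.account + " " + copy.income + " " + copy.expense + " " + copy.surplus);
    }
}
```

Swapping two of the read calls in `fromBytes` would silently put values into the wrong fields, which is why the read order and the write order are kept identical.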
    

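    1.2 Dataset

    To build on the previous post, the dataset has been replaced, though its name is unchanged.

      In the job output, MapReduce already sorts the records by key automatically, but string keys are sorted in dictionary order.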
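To make the difference between dictionary order and numeric order concrete, here is a small stand-alone sketch. It is not from the original post; the class name `SortOrderDemo` and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo: the same values sorted as text and as numbers.
class SortOrderDemo {
    // Sort the amounts as plain strings (dictionary order).
    static List<String> sortAsText(List<String> amounts) {
        List<String> copy = new ArrayList<>(amounts);
        Collections.sort(copy);
        return copy;
    }

    // Sort the same strings by their numeric value.
    static List<String> sortAsNumbers(List<String> amounts) {
        List<String> copy = new ArrayList<>(amounts);
        copy.sort(Comparator.comparingDouble(Double::parseDouble));
        return copy;
    }

    public static void main(String[] args) {
        List<String> amounts = Arrays.asList("20", "100", "3"); // made-up values
        System.out.println(sortAsText(amounts));    // [100, 20, 3]
        System.out.println(sortAsNumbers(amounts)); // [3, 20, 100]
    }
}
```

This is the reason a custom comparable key is needed when the records should be ordered by amount rather than by the text of the key.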

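    2. Theory

      String concatenation: I wrote about this a while ago and am revisiting it now: http://www.cnblogs.com/hxsyl/archive/2012/10/18/2729112.html

      A short summary with some extensions: String is a final class and its instances are immutable, so a String can be neither modified nor subclassed. Every "change" to a String therefore creates a new String object and repoints the reference to it. Text that changes frequently is better not kept in a String, because every new object costs performance, and once many unreferenced objects pile up in memory the JVM's GC starts working, which can slow the program down considerably.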

     

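      Suppose a for loop runs 10,000 times. Each execution of string += "hello"; takes the content of the object that string points to, concatenates "hello" to it, stores the result in a new String object, and then repoints string to that new object. The decompiled bytecode shows this clearly: every iteration creates a new StringBuilder, calls append on it, and returns a String through toString. So by the time the loop finishes it has created 10,000 StringBuilder objects. If they are not collected, that wastes memory, and doing this repeatedly could bring the system to a halt. The bytecode also shows that string += "hello" is in effect rewritten at compile time (this is what javac emits up to Java 8) into: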

      StringBuilder str = new StringBuilder(string);

      str.append("hello");

      string = str.toString();

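      If a single StringBuilder is created before the loop and only append is called inside it, the builder is allocated once, which is much more efficient.

      StringBuffer, in contrast, is thread-safe: its methods carry the synchronized keyword, so under multithreading the threads access the buffer one at a time.

     Reference: http://blog.csdn.net/loveyaozu/article/details/47037957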
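The two approaches can be compared side by side. The following is a minimal sketch rather than a rigorous benchmark: the class name `ConcatDemo` is made up, and the printed timings depend on the machine and the JVM, so no particular numbers are claimed.

```java
// Hypothetical demo comparing repeated String concatenation with a reused StringBuilder.
class ConcatDemo {
    // Each += allocates a new String, because String objects are immutable.
    static String concatWithString(int times) {
        String s = "";
        for (int i = 0; i < times; i++) {
            s += "hello";
        }
        return s;
    }

    // A single StringBuilder is created once and reused for every append.
    static String concatWithBuilder(int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append("hello");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int times = 10000;
        long t0 = System.nanoTime();
        String a = concatWithString(times);
        long t1 = System.nanoTime();
        String b = concatWithBuilder(times);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.println("String +=     : " + (t1 - t0) / 1_000 + " us");
        System.out.println("StringBuilder : " + (t2 - t1) / 1_000 + " us");
    }
}
```

Both methods produce the same text; only the number of intermediate objects differs.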

    3. Entity class

      Records are sorted by income from high to low; when two incomes are equal, by expense from low to high.

    package cn.app.hadoop.mr.sort;
    
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    
    import org.apache.hadoop.io.WritableComparable;
    
    // Writable is Hadoop's serialization interface.
    // The type parameter is InfoBean: as when comparing student records (score, gender, ...), the fields are wrapped in one bean.
    // WritableComparable extends Writable, so it already covers serialization and deserialization.
    public class InfoBean implements WritableComparable<InfoBean>{
    	
    	
    	private String account;
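    	// Monetary amounts should really use BigDecimal because double loses precision. I was not sure how to serialize that type, so double is used for now; writeUTF on its string form would probably work.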
    	private double income;
    	private double expense;
    	private double surplus;
    	
    	
    	public String getAccount() {
    		return account;
    	}
    	public void setAccount(String account) {
    		this.account = account;
    	}
    	public double getIncome() {
    		return income;
    	}
    	public void setIncome(double income) {
    		this.income = income;
    	}
    	public double getExpense() {
    		return expense;
    	}
    	public void setExpense(double expense) {
    		this.expense = expense;
    	}
    	public double getSurplus() {
    		return surplus;
    	}
    	public void setSurplus(double surplus) {
    		this.surplus = surplus;
    	}
    	@Override
    	public void readFields(DataInput in) throws IOException {
    		// Deserialize: read the fields in exactly the order write() emits them
    		this.account = in.readUTF();
    		this.income = in.readDouble();
    		this.expense = in.readDouble();
    		this.surplus = in.readDouble();
    	}
    	@Override
    	public void write(DataOutput out) throws IOException {
    		// Serialize: the order of these calls defines the layout readFields() expects
    		out.writeUTF(account);
    		out.writeDouble(income);
    		out.writeDouble(expense);
    		out.writeDouble(surplus);
    	}
    	
    	public void set(String account,double income,double expense) {
    		this.account = account;
    		this.income = income;
    		this.expense = expense;
    		this.surplus = income - expense;
    	}
    	
    
    	// Hadoop creates the bean by reflection during deserialization, so a no-argument constructor is required
    	public InfoBean() {
    		super();
    	}
    	@Override
    	public String toString() {
    		return "InfoBean [income=" + income + ", expense=" + expense
    				+ ", surplus=" + surplus + "]";
    	}
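    	@Override
    	public int compareTo(InfoBean o) {
    		// Income from high to low
    		int byIncome = Double.compare(o.getIncome(), this.income);
    		if (byIncome != 0) {
    			return byIncome;
    		}
    		// Same income: expense from low to high
    		int byExpense = Double.compare(this.expense, o.getExpense());
    		if (byExpense != 0) {
    			return byExpense;
    		}
    		// Tie-break on account so that 0 is returned only for identical records;
    		// the original never returned 0, which breaks the compareTo contract
    		return this.account.compareTo(o.getAccount());
    	}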
    }
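The ordering rule can be checked locally before running a job. The sketch below is an illustration, not the author's code: `OrderingDemo` and its nested `Row` class are hypothetical stand-ins for InfoBean, the rows are made up, and the tie-break on account is an assumption added so that distinct accounts never compare as equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo of "income high to low, then expense low to high".
class OrderingDemo {
    // Minimal stand-in for InfoBean with only the fields that matter for ordering.
    static final class Row {
        final String account;
        final double income;
        final double expense;

        Row(String account, double income, double expense) {
            this.account = account;
            this.income = income;
            this.expense = expense;
        }
    }

    // Income descending, then expense ascending, then account as a tie-breaker (assumption).
    static final Comparator<Row> ORDER =
            Comparator.comparingDouble((Row r) -> r.income).reversed()
                    .thenComparingDouble((Row r) -> r.expense)
                    .thenComparing((Row r) -> r.account);

    // Returns the accounts in sorted order without modifying the input list.
    static List<String> sortedAccounts(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(ORDER);
        List<String> accounts = new ArrayList<>();
        for (Row r : copy) {
            accounts.add(r.account);
        }
        return accounts;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", 100, 50),
                new Row("b", 200, 80),
                new Row("c", 200, 30),
                new Row("d", 100, 50));
        System.out.println(sortedAccounts(rows)); // [c, b, a, d]
    }
}
```

Rows b and c share the highest income, so the lower expense (c) comes first; rows a and d are equal on both amounts and fall back to the account comparison.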

    4. First implementation

    4.1 Mapper

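    import java.io.IOException;
    
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    // For text input the first type (the input key) is usually LongWritable or Object
    // Each line of text arrives as a Text value
    // The output key is the phone number (the account), typed as Text
    // The output value is an InfoBean, which must implement the Writable interface
    public class InfoSortMapper extends Mapper<LongWritable, Text, Text, InfoBean> {
    
    	private InfoBean v = new InfoBean();
    	private Text k = new Text();
    	
    	@Override
    	public void map(LongWritable key, Text value, Context context)
    			throws IOException, InterruptedException {
    		String line = value.toString();
    		// Fields are tab-separated: account, income, expense
    		String[] fields = line.split("\t");
    		String account = fields[0];
    		double in = Double.parseDouble(fields[1]);
    		double out = Double.parseDouble(fields[2]);
    		
    		// Reuse k and v instead of calling new for every record, which would waste resources
    		k.set(account);
    		v.set(account, in, out);
    		
    		context.write(k, v);
    	}
    }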
    

      4.2 Reducer

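    import java.io.IOException;
    
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    public class InfoSortReducer extends Reducer<Text, InfoBean, Text, InfoBean> {
    
    	// The incoming key is written out unchanged, so no separate k field is needed
    	private InfoBean v = new InfoBean();
    
    	@Override
    	public void reduce(Text key, Iterable<InfoBean> value, Context context)
    			throws IOException, InterruptedException {
    		// Sum income and expense over all records of this account
    		double incomeSum = 0;
    		double expenseSum = 0;
    		for (InfoBean o : value) {
    			incomeSum += o.getIncome();
    			expenseSum += o.getExpense();
    		}
    		v.set(key.toString(), incomeSum, expenseSum);
    		// InfoBean's toString is called automatically when the record is written
    		context.write(key,v);
    	}
    }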
    

    5. Second implementation

    5.1 Mapper

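    import java.io.IOException;
    
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    // The records are sorted by InfoBean, so InfoBean is the map output key (k2)
    public class SortMapper extends Mapper<LongWritable, Text, InfoBean, NullWritable> {
    
    	private InfoBean k = new InfoBean();
    
    	@Override
    	public void map(LongWritable key, Text value, Context context)
    			throws IOException, InterruptedException {
    		String line = value.toString();
    		// Fields are tab-separated: account, income, expense
    		String[] fields = line.split("\t");
    		String account = fields[0];
    		double in = Double.parseDouble(fields[1]);
    		double out = Double.parseDouble(fields[2]);
    		
    		// Reuse k instead of calling new for every record, which would waste resources
    		k.set(account, in, out);
    		// The value must be NullWritable.get(); the bare name NullWritable is a type, not a value, and does not compile
    		context.write(k, NullWritable.get());
    	}
    }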
    

      5.2 Reducer

    // Reconstructed listing: the source repeated the SortMapper code under this heading.
    // The types follow the driver settings in section 6 (map output InfoBean/NullWritable,
    // job output Text/InfoBean); the class name SortReducer and the body are assumptions.
    public class SortReducer extends Reducer<InfoBean, NullWritable, Text, InfoBean> {
    
    	private Text k = new Text();
    
    	@Override
    	public void reduce(InfoBean bean, Iterable<NullWritable> values, Context context)
    			throws IOException, InterruptedException {
    		// Keys arrive already sorted by InfoBean.compareTo; write the account followed by the bean
    		k.set(bean.getAccount());
    		context.write(k, bean);
    	}
    }

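    6. Closing remarks

      If (k2, v2) and (k4, v4) differ, that is, if the mapper's output types are not the same as the reducer's output types, the mapper's output types must also be set in Main. The second implementation above is such a case.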

    		job.setMapOutputKeyClass(InfoBean.class);
    		job.setMapOutputValueClass(NullWritable.class);
    		
    		job.setOutputKeyClass(Text.class);
    		job.setOutputValueClass(InfoBean.class);
    

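      Otherwise the Java code compiles without any error, and the type mismatch only shows up in the logs at run time once log4j is added.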

  • Original article: https://www.cnblogs.com/hxsyl/p/6165176.html