• Hadoop Serialization


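    1. Serialization overview

      1.1 What is serialization

        Serialization converts an in-memory object into a byte sequence (or another data transfer format) so that it can be stored on disk (persisted) or sent over the network.

        Deserialization converts a received byte sequence (or another data transfer format), or data persisted on disk, back into an in-memory object.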

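      1.2 Why serialize

        Generally, a "live" object exists only in memory and is lost when the machine powers off. A "live" object can also be used only by the local process and cannot be sent to another computer on the network. Serialization makes it possible to store a "live" object and to send it to a remote computer.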

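      1.3 Why not use Java serialization

        Java serialization (Serializable) is a heavyweight framework: a serialized object carries a lot of extra information (validation data, headers, the inheritance hierarchy, and so on), which makes it inefficient to transmit over the network. Hadoop therefore developed its own serialization mechanism (Writable).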
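
        To make the overhead concrete, here is a minimal sketch that serializes the same three long values twice: once with standard Java serialization and once by writing only the field values, the way a Writable does. The class name SerializationSizeDemo and the nested Flow record are hypothetical and exist only for this comparison; no Hadoop classes are involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSizeDemo {

    // Hypothetical record with three long fields, used only for this comparison.
    static class Flow implements Serializable {
        private static final long serialVersionUID = 1L;
        long upFlow = 100L;
        long downFlow = 200L;
        long sumFlow = 300L;
    }

    // Number of bytes produced by standard Java serialization.
    static int javaSerializedSize(Flow flow) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(flow);
        }
        return buffer.size();
    }

    // Number of bytes produced when only the field values are written.
    static int rawFieldSize(Flow flow) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(flow.upFlow);
            out.writeLong(flow.downFlow);
            out.writeLong(flow.sumFlow);
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        Flow flow = new Flow();
        System.out.println("Java serialization: " + javaSerializedSize(flow) + " bytes");
        System.out.println("Raw field values:   " + rawFieldSize(flow) + " bytes");
    }
}
```

        The raw form is exactly 24 bytes (three 8-byte longs). The Java-serialized form is larger because the stream also describes the class and its fields.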

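      1.4 Features of Hadoop serialization

        1.4.1 Compact: uses storage space efficiently.

        1.4.2 Fast: little overhead when reading and writing data.

        1.4.3 Extensible: can be upgraded as the communication protocol evolves.

        1.4.4 Interoperable: supports interaction across multiple languages.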

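    2. Implementing the serialization interface (Writable) in a custom bean

      In enterprise development the commonly used basic serializable types often cannot cover every requirement. For example, to pass a bean object around inside the Hadoop framework, that object must implement the serialization interface.

      2.1 The class must implement the Writable interface.

      2.2 Deserialization calls the no-argument constructor via reflection, so the class must have one.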

    public FlowwBean() {
    }

      2.3 Override the serialization method

        /* Serialization method.
         * dataOutput: the output sink supplied by the framework.
         */
        @Override
        public void write(DataOutput dataOutput) throws IOException {
            dataOutput.writeLong(upFlow);
            dataOutput.writeLong(downFlow);
            dataOutput.writeLong(sumFlow);
        }

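      2.4 Override the deserialization method

        /* Deserialization method. Fields must be read in the same order they were written.
         * dataInput: the data source supplied by the framework.
         */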
        @Override
        public void readFields(DataInput dataInput) throws IOException {
            upFlow=dataInput.readLong();
            downFlow=dataInput.readLong();
            sumFlow=dataInput.readLong();
        }
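
      The write/readFields pair can be exercised outside a cluster. The following is a minimal sketch, assuming a stand-in class (the hypothetical Flow below) that has the same two methods but does not implement Hadoop's Writable, so it runs with the JDK alone. It writes the fields into a byte array and reads them back into a fresh instance created through the no-argument constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTripDemo {

    // Stand-in for the bean: same write/readFields shape as a Writable,
    // but without the Hadoop dependency so the sketch is self-contained.
    static class Flow {
        long upFlow;
        long downFlow;
        long sumFlow;

        Flow() {
        }

        void write(DataOutput out) throws IOException {
            out.writeLong(upFlow);
            out.writeLong(downFlow);
            out.writeLong(sumFlow);
        }

        // Reads the fields in exactly the order write() emitted them.
        void readFields(DataInput in) throws IOException {
            upFlow = in.readLong();
            downFlow = in.readLong();
            sumFlow = in.readLong();
        }
    }

    static byte[] serialize(Flow flow) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        flow.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    static Flow deserialize(byte[] bytes) throws IOException {
        Flow flow = new Flow(); // an empty instance is created first, then filled in
        flow.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return flow;
    }

    public static void main(String[] args) throws IOException {
        Flow original = new Flow();
        original.upFlow = 10L;
        original.downFlow = 20L;
        original.sumFlow = 30L;

        Flow copy = deserialize(serialize(original));
        System.out.println(copy.upFlow + " " + copy.downFlow + " " + copy.sumFlow);
    }
}
```

      Because the byte stream carries no field names, swapping two reads in readFields would silently assign values to the wrong fields, which is why the order must match.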

    3. Example

      3.1 Write FlowwBean

    package com.wn.flow;
    
    import org.apache.hadoop.io.Writable;
    
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    
    public class FlowwBean implements Writable {
        private long upFlow;
        private long downFlow;
        private long sumFlow;
    
        public FlowwBean() {
        }
    
        @Override
        public String toString() {
            return "FlowwBean{" +
                    "upFlow=" + upFlow +
                    ", downFlow=" + downFlow +
                    ", sumFlow=" + sumFlow +
                    '}';
        }
    
        public void set(long upFlow, long downFlow){
            this.upFlow=upFlow;
            this.downFlow=downFlow;
            this.sumFlow=upFlow+downFlow;
        }
    
        public long getDownFlow() {
            return downFlow;
        }
    
        public void setDownFlow(long downFlow) {
            this.downFlow = downFlow;
        }
    
        public long getSumFlow() {
            return sumFlow;
        }
    
        public void setSumFlow(long sumFlow) {
            this.sumFlow = sumFlow;
        }
    
        public long getUpFlow() {
            return upFlow;
        }
    
        public void setUpFlow(long upFlow) {
            this.upFlow = upFlow;
        }
    
        /* Serialization method.
         * dataOutput: the output sink supplied by the framework.
         */
        @Override
        public void write(DataOutput dataOutput) throws IOException {
            dataOutput.writeLong(upFlow);
            dataOutput.writeLong(downFlow);
            dataOutput.writeLong(sumFlow);
        }
    
        /* The read order must match the write order exactly. */
    
        /* Deserialization method.
         * dataInput: the data source supplied by the framework.
         */
        @Override
        public void readFields(DataInput dataInput) throws IOException {
            upFlow=dataInput.readLong();
            downFlow=dataInput.readLong();
            sumFlow=dataInput.readLong();
        }
    }

      3.2 Write FlowMapper

    package com.wn.flow;
    
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    import java.io.IOException;
    
    public class FlowMapper extends Mapper<LongWritable, Text,Text,FlowwBean> {
    
        private Text phone=new Text();
        private FlowwBean flow=new FlowwBean();
    
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Input fields are tab-separated
            String[] split = value.toString().split("\t");
            phone.set(split[1]);
            long upFlow = Long.parseLong(split[split.length - 3]);
            long downFlow = Long.parseLong(split[split.length - 2]);
            flow.set(upFlow,downFlow);
            context.write(phone,flow);
        }
    }
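
      The field indexing in map() can be tried on its own. This is a minimal sketch, assuming a made-up tab-separated line in which the phone number is the second field and the upstream and downstream counts are the third-to-last and second-to-last fields, as the mapper's indices imply. The sample line, the class name FlowLineParseDemo and both helper methods are hypothetical.

```java
public class FlowLineParseDemo {

    // Returns the phone number: the second tab-separated field.
    static String parsePhone(String line) {
        return line.split("\t")[1];
    }

    // Returns {upFlow, downFlow}, counted from the end of the line
    // so that a varying number of middle columns does not matter.
    static long[] parseFlows(String line) {
        String[] fields = line.split("\t");
        long upFlow = Long.parseLong(fields[fields.length - 3]);
        long downFlow = Long.parseLong(fields[fields.length - 2]);
        return new long[] {upFlow, downFlow};
    }

    public static void main(String[] args) {
        // Made-up record: id, phone, one middle column, upFlow, downFlow, trailing column.
        String line = "1\t13800000000\texample.com\t100\t200\t200";
        long[] flows = parseFlows(line);
        System.out.println(parsePhone(line) + " up=" + flows[0] + " down=" + flows[1]);
    }
}
```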

      3.3 Write FlowReducer

    package com.wn.flow;
    
    import org.apache.hadoop.mapreduce.Reducer;
    
    import org.apache.hadoop.io.Text;
    import java.io.IOException;
    
    public class FlowReducer extends Reducer<Text,FlowwBean,Text,FlowwBean> {
    
        private FlowwBean sumFlow=new FlowwBean();
    
        @Override
        protected void reduce(Text key, Iterable<FlowwBean> values, Context context) throws IOException, InterruptedException {
            long sumUpFlow=0;
            long sumDownFlow=0;
            for (FlowwBean value:values){
                sumUpFlow+=value.getUpFlow();
                sumDownFlow+=value.getDownFlow();
            }
            sumFlow.set(sumUpFlow,sumDownFlow);
            context.write(key,sumFlow);
        }
    }

      3.4 Write FlowDriver

    package com.wn.flow;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    import java.io.IOException;
    
    public class FlowDriver {
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            // Get a Job instance
            Job job = Job.getInstance(new Configuration());
    
            // Locate the job jar from the driver class
            job.setJarByClass(FlowDriver.class);
    
            // Set the mapper and reducer classes
            job.setMapperClass(FlowMapper.class);
            job.setReducerClass(FlowReducer.class);
    
            // Set the mapper output types and the final output types
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(FlowwBean.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(FlowwBean.class);
    
            // Set the input and output paths
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
            // Submit the job and wait for it to finish
            boolean b = job.waitForCompletion(true);
            System.exit(b ? 0 : 1);
        }
    
    }

     

  • Original article: https://www.cnblogs.com/wnwn/p/12600161.html