• Hadoop in Action: Chaining Multiple MapReduce Jobs


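    Chaining MapReduce jobs sequentially

    Form: mapreduce-1 | mapreduce-2 | mapreduce-3

    • In the run function, configure each new job in turn and submit it with JobClient.runJob(), which blocks until the job completes, so job2 starts only after job1 has finished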
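    @Override
    public int run(String[] args) throws Exception {
    	// First job: runJob() returns only after the job completes and throws IOException if it fails
    	JobConf job1 = new JobConf(getConf(), getClass());
    	JobClient.runJob(job1);
    	
    	// Second job: typically configured to read the output directory written by job1
    	JobConf job2 = new JobConf(getConf(), getClass());
    	JobClient.runJob(job2);
    	return 0;
    }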
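The handoff that makes a sequential chain work is that each job's output location becomes the next job's input. The sketch below models that handoff with plain Java and temporary files rather than Hadoop: `countWords`, `keepRepeated`, and the file names are hypothetical stand-ins for two jobs and their directories, chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "mapreduce-1 | mapreduce-2"; no Hadoop classes are involved.
class SequentialChainSketch {

    // Stage 1 (hypothetical): count word occurrences, writing one "word<TAB>count" line per word.
    static void countWords(Path in, Path out) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : Files.readAllLines(in)) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((word, n) -> lines.add(word + "\t" + n));
        Files.write(out, lines);
    }

    // Stage 2 (hypothetical): keep only the words that occur more than once.
    static void keepRepeated(Path in, Path out) throws IOException {
        List<String> kept = new ArrayList<>();
        for (String line : Files.readAllLines(in)) {
            String[] fields = line.split("\t");
            if (Integer.parseInt(fields[1]) > 1) {
                kept.add(fields[0]);
            }
        }
        Files.write(out, kept);
    }

    // Runs both stages in order; if stage 1 throws, stage 2 never starts.
    static List<String> runChain(List<String> input) {
        try {
            Path dir = Files.createTempDirectory("chain-sketch");
            Path in = dir.resolve("input.txt");
            Path intermediate = dir.resolve("job1-output.txt");
            Path out = dir.resolve("job2-output.txt");
            Files.write(in, input);
            countWords(in, intermediate);     // job 1 writes the intermediate data
            keepRepeated(intermediate, out);  // job 2 reads what job 1 wrote
            return Files.readAllLines(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runChain(List.of("a b a", "c b a")));
    }
}
```

In a real driver the intermediate location would be a directory that job1 sets as its output path and job2 sets as its input path; the temporary file here plays that role.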
    

    Chaining MapReduce jobs with complex dependencies

    • Managed through the Job and JobControl classes
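    // For Job objects x and y
    x.addDependingJob(y)	// add a dependency: x will not start until y has completed
    
    jobControl.addJob(x)	// Job objects x and y are managed by the JobControl object
    jobControl.addJob(y)	
    
    
    jobControl.allFinished()	// monitoring methods of the JobControl object
    jobControl.getFailedJobs()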
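To make the dependency rule concrete, here is a small plain-Java model of dependency-ordered execution. It is a sketch, not Hadoop's implementation: `JobControlSketch` and `SketchJob` are hypothetical classes whose method names merely echo `addDependingJob`, `addJob`, `allFinished`, and `getFailedJobs`, and the assumption that a failed dependency also fails the dependent job is modeled explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Plain-Java model of dependency-ordered job execution; these are not Hadoop's classes.
class JobControlSketch {

    enum State { WAITING, SUCCESS, FAILED }

    static class SketchJob {
        final String name;
        final BooleanSupplier work;   // returns true when the job succeeds
        final List<SketchJob> dependsOn = new ArrayList<>();
        State state = State.WAITING;

        SketchJob(String name, BooleanSupplier work) {
            this.name = name;
            this.work = work;
        }

        // x.addDependingJob(y): x will not start until y has completed.
        void addDependingJob(SketchJob other) {
            dependsOn.add(other);
        }
    }

    private final List<SketchJob> jobs = new ArrayList<>();
    final List<String> runOrder = new ArrayList<>();

    void addJob(SketchJob job) {
        jobs.add(job);
    }

    boolean allFinished() {
        return jobs.stream().noneMatch(j -> j.state == State.WAITING);
    }

    List<String> getFailedJobs() {
        List<String> failed = new ArrayList<>();
        for (SketchJob job : jobs) {
            if (job.state == State.FAILED) {
                failed.add(job.name);
            }
        }
        return failed;
    }

    // Sweeps the job list repeatedly, running each waiting job whose dependencies have all succeeded.
    void run() {
        boolean progressed = true;
        while (!allFinished() && progressed) {
            progressed = false;
            for (SketchJob job : jobs) {
                if (job.state != State.WAITING) {
                    continue;
                }
                if (job.dependsOn.stream().anyMatch(d -> d.state == State.FAILED)) {
                    job.state = State.FAILED;   // modeled assumption: a failed dependency fails the dependent
                    progressed = true;
                } else if (job.dependsOn.stream().allMatch(d -> d.state == State.SUCCESS)) {
                    runOrder.add(job.name);
                    job.state = job.work.getAsBoolean() ? State.SUCCESS : State.FAILED;
                    progressed = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        SketchJob y = new SketchJob("y", () -> true);
        SketchJob x = new SketchJob("x", () -> true);
        x.addDependingJob(y);
        JobControlSketch control = new JobControlSketch();
        control.addJob(x);   // added first, but still runs after y
        control.addJob(y);
        control.run();
        System.out.println(control.runOrder + " failed=" + control.getFailedJobs());
    }
}
```

The order in which jobs are added does not matter; only the declared dependencies decide when each one may start.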
    

    Chaining preprocessing and postprocessing steps

    Form: MAP+ | REDUCE | MAP*
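The shape of this pattern can be sketched in plain Java: one or more mappers run before the reduce, zero or more run after it, and records pass from step to step in memory. This is only a model under simplifying assumptions: `ChainPatternSketch` and its `Step` interface are hypothetical, every step is one-to-one, and keys and values are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.UnaryOperator;

// Plain-Java model of a MAP+ | REDUCE | MAP* chain; records stay in memory between steps.
class ChainPatternSketch {

    // One chained mapper; one-to-one here for brevity, whereas a real Mapper may emit any number of records.
    interface Step extends UnaryOperator<Map.Entry<String, String>> {}

    // Passes a single record through every step in order.
    static Map.Entry<String, String> applyAll(List<Step> steps, Map.Entry<String, String> input) {
        Map.Entry<String, String> current = input;
        for (Step step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    static List<Map.Entry<String, String>> run(
            List<Map.Entry<String, String>> input,
            List<Step> preMappers,
            BiFunction<String, List<String>, Map.Entry<String, String>> reducer,
            List<Step> postMappers) {
        // MAP+: run the preprocessing mappers, then group the results by key (the shuffle).
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> entry : input) {
            Map.Entry<String, String> mapped = applyAll(preMappers, entry);
            grouped.computeIfAbsent(mapped.getKey(), k -> new ArrayList<>()).add(mapped.getValue());
        }
        // REDUCE, then MAP*: each reducer output flows straight into the postprocessing mappers.
        List<Map.Entry<String, String>> output = new ArrayList<>();
        grouped.forEach((key, values) ->
                output.add(applyAll(postMappers, reducer.apply(key, values))));
        return output;
    }

    static List<Map.Entry<String, String>> demo() {
        List<Map.Entry<String, String>> input = List.of(
                Map.entry("1", " Apple"), Map.entry("2", "pear "), Map.entry("3", "apple"));
        List<Step> pre = List.of(
                r -> Map.entry(r.getKey(), r.getValue().trim()),       // Map1: trim whitespace
                r -> Map.entry(r.getValue().toLowerCase(), "1"));      // Map2: re-key by the word
        List<Step> post = List.of(
                r -> Map.entry(r.getKey().toUpperCase(), r.getValue())); // Map3: format the key
        return run(input, pre,
                (key, values) -> Map.entry(key, String.valueOf(values.size())), // Reduce: count
                post);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the pre and post steps run inside the same pass as the reduce, nothing needs to be written out between them, which is the saving the pattern is after.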

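    • ChainMapper/ChainReducer: run the chained steps inside a single job, reducing the intermediate results that have to be written out

    • The addMapper/setReducer interface

      • job, mapperConf: the global and local JobConf objects
      • kclass: the Mapper class
      • inputKeyClass, inputValueClass, outputKeyClass, outputValueClass: the input and output key/value types
      • byValue: whether the output key and value (MapOutputKey and MapOutputValue) are passed to the next step by value
        • true: pass by value
        • false: pass by reference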
    public static <K1, V1, K2, V2> void 
    						addMapper(JobConf job,
    								  Class<? extends Mapper<K1, V1, K2, V2>> kclass,
    								  Class<? extends K1> inputKeyClass,
    								  Class<? extends V1> inputValueClass,
    								  Class<? extends K2> outputKeyClass,
    								  Class<? extends V2> outputValueClass,
    								  boolean byValue,
    								  JobConf mapperConf)
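Why byValue matters is easiest to see when an upstream step reuses one mutable object for every record it emits. The following plain-Java sketch models that situation; `ByValueSketch` is hypothetical and `StringBuilder` merely stands in for a reusable key or value object.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the byValue flag; StringBuilder stands in for a reusable key/value object.
class ByValueSketch {

    // The upstream step reuses one mutable object per record; the downstream step buffers what it receives.
    static List<String> runChain(List<String> inputs, boolean byValue) {
        StringBuilder reused = new StringBuilder();
        List<StringBuilder> buffered = new ArrayList<>();
        for (String input : inputs) {
            reused.setLength(0);
            reused.append(input.toUpperCase());
            // byValue=true: downstream gets its own copy; byValue=false: it gets the shared object.
            buffered.add(byValue ? new StringBuilder(reused) : reused);
        }
        List<String> seen = new ArrayList<>();
        for (StringBuilder value : buffered) {
            seen.add(value.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runChain(List.of("a", "b", "c"), true));
        System.out.println(runChain(List.of("a", "b", "c"), false));
    }
}
```

Passing by reference avoids the copy, but it is only safe when no step keeps or modifies a key or value after handing it on; when in doubt, true is the conservative choice.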
    
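    Example: a MapReduce driver with preprocessing and postprocessing
    • Map1 | Map2 | Reduce | Map3 | Map4
      • ChainMapper.addMapper: adds each step that runs before the Reduce
      • ChainReducer.addMapper: adds the steps that run after it
      • Settings in a local JobConf object take precedence over the global one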
        @Override
        public int run(String[] args) throws Exception {
            JobConf job = new JobConf(getConf(), getClass());
    
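            job.setJobName("ChainJob");
            job.setInputFormat(TextInputFormat.class);
            job.setOutputFormat(TextOutputFormat.class);
            // Input and output locations, assumed here to arrive as the first two arguments
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));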
    
            JobConf map1Conf = new JobConf(false);  // loadDefaults=false: creates a local configuration object without loading the defaults
            ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
                    Text.class, Text.class, true, map1Conf);
            JobConf map2Conf = new JobConf(false);
            ChainMapper.addMapper(job, Map2.class, Text.class, Text.class,
                    LongWritable.class, Text.class, true, map2Conf);
    
            JobConf reduceConf = new JobConf(false);    
            ChainReducer.setReducer(job, ReducerClass.class, LongWritable.class, Text.class,
                    Text.class, Text.class, true, reduceConf);
    
            JobConf map3Conf = new JobConf(false);
            ChainReducer.addMapper(job, Map3.class, Text.class, Text.class,
                    LongWritable.class, Text.class, true, map3Conf);
            JobConf map4Conf = new JobConf(false);
            ChainReducer.addMapper(job, Map4.class, LongWritable.class, Text.class,
                    LongWritable.class, Text.class, true, map4Conf);
            
            JobClient.runJob(job);
            return 0;
        }
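The precedence rule for local JobConf objects amounts to a layered lookup: a step sees the global settings, with any key it sets locally overriding the global value. A minimal sketch of that rule with plain maps follows; `LocalConfSketch` and the property names are hypothetical and are not real Hadoop configuration keys.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of local-over-global configuration precedence; the keys are made up.
class LocalConfSketch {

    // Returns the settings one chained step would see: global values, overridden by its local ones.
    static Map<String, String> effectiveConf(Map<String, String> global, Map<String, String> local) {
        Map<String, String> merged = new HashMap<>(global);
        merged.putAll(local);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> global = Map.of("sketch.separator", ",", "sketch.encoding", "UTF-8");
        Map<String, String> map1Local = Map.of("sketch.separator", "\t");
        System.out.println(effectiveConf(global, map1Local));
    }
}
```

Each step gets its own merged view, so an override set for Map1 does not leak into Map2 or the reducer.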
    
  • Original article: https://www.cnblogs.com/vvlj/p/14101858.html