• Combiner+GroupingComparator+shuffle原理+Reduce原理


    1、Combiner

     Combiner的输入输出对象必须一样。

    2、GroupingComparator

    运行代码:

    map

    package groupcompartor;
    
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    
    import org.apache.hadoop.mapreduce.Mapper;
    
    
    import java.io.IOException;
    
    public class OrderMapper extends Mapper<LongWritable, Text,OrderBean, NullWritable> {
    
        private OrderBean orderbean=new OrderBean();
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] fieds=value.toString().split("	");
            orderbean.setOrderId(fieds[0]);
            orderbean.setProductId(fieds[1]);
            orderbean.setPrice(Double.parseDouble(fieds[2]));
            context.write(orderbean,NullWritable.get());
        }
    }
    

      reduce

    package groupcompartor;
    
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Reducer;
    
    import java.io.IOException;
    
    public class OrderReducer extends Reducer<OrderBean, NullWritable,OrderBean, NullWritable> {
    
        @Override
        protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
            context.write(key,NullWritable.get());
        }
    }
    

      driver

    package groupcompartor;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.logging.log4j.core.config.OrderComparator;
    
    import java.io.IOException;
    
    public class OrderDriver {
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Job job = Job.getInstance(new Configuration());
            job.setJarByClass(OrderDriver.class);
            job. setMapperClass(OrderMapper.class);
            job.setReducerClass(OrderReducer.class);
            job.setMapOutputKeyClass(OrderBean.class);
            job.setMapOutputValueClass(NullWritable.class);
            job.setOutputKeyClass(OrderBean.class);
            job.setGroupingComparatorClass(OderCompartor.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.setInputPaths(job,new Path("d:\linput"));
            FileOutputFormat.setOutputPath(job,new Path("d:\loutput"));
            boolean b = job.waitForCompletion(true);
            System.exit(b ? 0 : 1);
        }
    }
    

      comparator

    package groupcompartor;
    
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    
    public class OderCompartor extends WritableComparator {
        protected OderCompartor() {
            super(OrderBean.class,true);
        }
    
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            OrderBean oa=(OrderBean)a;
            OrderBean ob=(OrderBean)b;
            return oa.getOrderId().compareTo(ob.getOrderId());
        }
    }
    

      

    原本结果:

    预期结果:

    3.3.10 GroupingComparator分组案例实操

    1.需求

    有如下订单数据

    4-2 订单数据

    订单id

    商品id

    成交金额

    0000001

    Pdt_01

    222.8

    Pdt_02

    33.8

    0000002

    Pdt_03

    522.8

    Pdt_04

    122.4

    Pdt_05

    722.4

    0000003

    Pdt_06

    232.8

    Pdt_02

    33.8

    现在需要求出每一个订单中最贵的商品。

    1)输入数据

    2)期望输出数据

    1 222.8

    2 722.4

    3 232.8

    3、shuffle原理

     

  • 相关阅读:
    HDU-6801 2020HDU多校第三场T11 (生成函数)
    [HDU-6791] 2020HDU多校第三场T1(回文自动机)
    回文自动机 (PAM,Palindrome Automaton)
    字符串的Period(周期),Border
    「APIO2019」路灯 (K-D Tree / 树套树 / CDQ + 树状数组)
    「APIO2019」桥梁(询问分块+并查集)
    「APIO2019」奇怪装置
    「APIO2018」选圆圈(K-D Tree/CDQ+Set)
    堆小结
    【[HNOI/AHOI2018]毒瘤】
  • 原文地址:https://www.cnblogs.com/dazhi151/p/13537796.html
Copyright © 2020-2023  润新知