MapReduce Programming Series — 4: Sorting


    1. Project name: Sort (package com.sort)

    2. Program code:

    package com.sort;

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class Sort {
        // The map phase parses each input line into an IntWritable and emits it as
        // the output key; the framework sorts these keys during the shuffle.
        public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
            private static final IntWritable data = new IntWritable();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                System.out.println("Mapper.................");
                System.out.println("key:" + key + "  value:" + value);

                String line = value.toString();
                data.set(Integer.parseInt(line));
                // The value (1) is a placeholder; a duplicate number simply shows up
                // as an extra element in the reducer's value list.
                context.write(data, new IntWritable(1));
                System.out.println("data:" + data + " context:" + context);
            }
        }

        // The reduce phase copies the input key to the output value and emits it once
        // per element of the value list, so duplicates keep their multiplicity.
        // The global counter linenum supplies each key's rank.
        public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
            private static IntWritable linenum = new IntWritable(1);

            @Override
            public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                System.out.println("Reducer.................");
                System.out.println("key:" + key + "  value:" + values);

                for (IntWritable val : values) {
                    context.write(linenum, key);
                    System.out.println("linenum:" + linenum + "  key:" + key + " context:" + context);
                    linenum = new IntWritable(linenum.get() + 1);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: sort <in> <out>");
                System.exit(2);
            }
            // new Job(...) is the Hadoop 1.x API; on Hadoop 2+ use Job.getInstance(conf, "sort").
            Job job = new Job(conf, "sort");
            job.setJarByClass(Sort.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);

            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
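
    Note that the program never compares numbers itself: the order comes entirely from the shuffle phase, which sorts all map output keys before they reach the reducer. This yields a total order only because the job runs with a single reduce task (the default in local mode); with several reducers, each output file would be sorted independently. Below is a minimal sketch, not part of the original post, of how the order could be inverted with a hypothetical DescendingIntComparator registered via job.setSortComparatorClass().

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Hypothetical comparator (an assumption, not from the original post) that
    // inverts IntWritable's natural order so the job emits numbers descending.
    // Register it in main() before submission:
    //     job.setSortComparatorClass(DescendingIntComparator.class);
    // On a real cluster, also call job.setNumReduceTasks(1) so the shuffle
    // still produces one globally ordered output file.
    public class DescendingIntComparator extends WritableComparator {
        public DescendingIntComparator() {
            super(IntWritable.class, true); // compare deserialized IntWritable instances
        }

        @Override
        @SuppressWarnings("rawtypes")
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b); // flip ascending into descending
        }
    }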
    3. Test data:
    file1:
    2
    32
    654
    32
    15
    756
    65223
     
    file2:
    5956
    22
    650
    92
     
    file3:
    26
    54
    6
     
    4. Execution log:
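    In the trace below, the mapper's input key is the byte offset of each line within its split (the default behavior of TextInputFormat), which is why it advances by the length of the previous line plus one for the newline: 0, 2, 5, 9, and so on.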
    14/09/21 17:44:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/09/21 17:44:27 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    14/09/21 17:44:28 INFO input.FileInputFormat: Total input paths to process : 3
    14/09/21 17:44:28 WARN snappy.LoadSnappy: Snappy native library not loaded
    14/09/21 17:44:28 INFO mapred.JobClient: Running job: job_local_0001
    14/09/21 17:44:28 INFO util.ProcessTree: setsid exited with exit code 0
    14/09/21 17:44:28 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@365f3cec
    14/09/21 17:44:28 INFO mapred.MapTask: io.sort.mb = 100
    14/09/21 17:44:28 INFO mapred.MapTask: data buffer = 79691776/99614720
    14/09/21 17:44:28 INFO mapred.MapTask: record buffer = 262144/327680
    Mapper.................
    key:0  value:2
    data:2 context:org.apache.hadoop.mapreduce.Mapper$Context@40804be
    Mapper.................
    key:2  value:32
    data:32 context:org.apache.hadoop.mapreduce.Mapper$Context@40804be
    Mapper.................
    key:5  value:654
    data:654 context:org.apache.hadoop.mapreduce.Mapper$Context@40804be
    Mapper.................
    key:9  value:32
    data:32 context:org.apache.hadoop.mapreduce.Mapper$Context@40804be
    Mapper.................
    key:12  value:15
    data:15 context:org.apache.hadoop.mapreduce.Mapper$Context@40804be
    Mapper.................
    key:15  value:756
    data:756 context:org.apache.hadoop.mapreduce.Mapper$Context@40804be
    Mapper.................
    key:19  value:65223
    data:65223 context:org.apache.hadoop.mapreduce.Mapper$Context@40804be
    14/09/21 17:44:28 INFO mapred.MapTask: Starting flush of map output
    14/09/21 17:44:28 INFO mapred.MapTask: Finished spill 0
    14/09/21 17:44:28 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    14/09/21 17:44:29 INFO mapred.JobClient:  map 0% reduce 0%
    14/09/21 17:44:31 INFO mapred.LocalJobRunner:
    14/09/21 17:44:31 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
    14/09/21 17:44:31 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5c72877c
    14/09/21 17:44:31 INFO mapred.MapTask: io.sort.mb = 100
    14/09/21 17:44:31 INFO mapred.MapTask: data buffer = 79691776/99614720
    14/09/21 17:44:31 INFO mapred.MapTask: record buffer = 262144/327680
    Mapper.................
    key:0  value:5956
    data:5956 context:org.apache.hadoop.mapreduce.Mapper$Context@5c0134fb
    Mapper.................
    key:5  value:22
    data:22 context:org.apache.hadoop.mapreduce.Mapper$Context@5c0134fb
    Mapper.................
    key:8  value:650
    data:650 context:org.apache.hadoop.mapreduce.Mapper$Context@5c0134fb
    Mapper.................
    key:12  value:92
    data:92 context:org.apache.hadoop.mapreduce.Mapper$Context@5c0134fb
    14/09/21 17:44:31 INFO mapred.MapTask: Starting flush of map output
    14/09/21 17:44:31 INFO mapred.MapTask: Finished spill 0
    14/09/21 17:44:31 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
    14/09/21 17:44:32 INFO mapred.JobClient:  map 100% reduce 0%
    14/09/21 17:44:34 INFO mapred.LocalJobRunner:
    14/09/21 17:44:34 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
    14/09/21 17:44:34 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5c88c5d3
    14/09/21 17:44:34 INFO mapred.MapTask: io.sort.mb = 100
    14/09/21 17:44:34 INFO mapred.MapTask: data buffer = 79691776/99614720
    14/09/21 17:44:34 INFO mapred.MapTask: record buffer = 262144/327680
    Mapper.................
    key:0  value:26
    data:26 context:org.apache.hadoop.mapreduce.Mapper$Context@36a05d78
    Mapper.................
    key:3  value:54
    data:54 context:org.apache.hadoop.mapreduce.Mapper$Context@36a05d78
    Mapper.................
    key:6  value:6
    data:6 context:org.apache.hadoop.mapreduce.Mapper$Context@36a05d78
    14/09/21 17:44:34 INFO mapred.MapTask: Starting flush of map output
    14/09/21 17:44:34 INFO mapred.MapTask: Finished spill 0
    14/09/21 17:44:34 INFO mapred.Task: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
    14/09/21 17:44:37 INFO mapred.LocalJobRunner:
    14/09/21 17:44:37 INFO mapred.Task: Task 'attempt_local_0001_m_000002_0' done.
    14/09/21 17:44:37 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3c521e5d
    14/09/21 17:44:37 INFO mapred.LocalJobRunner:
    14/09/21 17:44:37 INFO mapred.Merger: Merging 3 sorted segments
    14/09/21 17:44:37 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 146 bytes
    14/09/21 17:44:37 INFO mapred.LocalJobRunner:
    Reducer.................
    key:2  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:1  key:2 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:6  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:2  key:6 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:15  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:3  key:15 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:22  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:4  key:22 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:26  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:5  key:26 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:32  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:6  key:32 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    linenum:7  key:32 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:54  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:8  key:54 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:92  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:9  key:92 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:650  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:10  key:650 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:654  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:11  key:654 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:756  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:12  key:756 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:5956  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:13  key:5956 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    Reducer.................
    key:65223  value:org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@38839cf7
    linenum:14  key:65223 context:org.apache.hadoop.mapreduce.Reducer$Context@23475bbf
    14/09/21 17:44:37 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
    14/09/21 17:44:37 INFO mapred.LocalJobRunner:
    14/09/21 17:44:37 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
    14/09/21 17:44:37 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/user/hadoop/sort_output
    14/09/21 17:44:40 INFO mapred.LocalJobRunner: reduce > reduce
    14/09/21 17:44:40 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
    14/09/21 17:44:41 INFO mapred.JobClient:  map 100% reduce 100%
    14/09/21 17:44:41 INFO mapred.JobClient: Job complete: job_local_0001
    14/09/21 17:44:41 INFO mapred.JobClient: Counters: 22
    14/09/21 17:44:41 INFO mapred.JobClient:   Map-Reduce Framework
    14/09/21 17:44:41 INFO mapred.JobClient:     Spilled Records=28
    14/09/21 17:44:41 INFO mapred.JobClient:     Map output materialized bytes=158
    14/09/21 17:44:41 INFO mapred.JobClient:     Reduce input records=14
    14/09/21 17:44:41 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
    14/09/21 17:44:41 INFO mapred.JobClient:     Map input records=14
    14/09/21 17:44:41 INFO mapred.JobClient:     SPLIT_RAW_BYTES=345
    14/09/21 17:44:41 INFO mapred.JobClient:     Map output bytes=112
    14/09/21 17:44:41 INFO mapred.JobClient:     Reduce shuffle bytes=0
    14/09/21 17:44:41 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
    14/09/21 17:44:41 INFO mapred.JobClient:     Reduce input groups=13
    14/09/21 17:44:41 INFO mapred.JobClient:     Combine output records=0
    14/09/21 17:44:41 INFO mapred.JobClient:     Reduce output records=14
    14/09/21 17:44:41 INFO mapred.JobClient:     Map output records=14
    14/09/21 17:44:41 INFO mapred.JobClient:     Combine input records=0
    14/09/21 17:44:41 INFO mapred.JobClient:     CPU time spent (ms)=0
    14/09/21 17:44:41 INFO mapred.JobClient:     Total committed heap usage (bytes)=1325400064
    14/09/21 17:44:41 INFO mapred.JobClient:   File Input Format Counters
    14/09/21 17:44:41 INFO mapred.JobClient:     Bytes Read=48
    14/09/21 17:44:41 INFO mapred.JobClient:   FileSystemCounters
    14/09/21 17:44:41 INFO mapred.JobClient:     HDFS_BYTES_READ=161
    14/09/21 17:44:41 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=162878
    14/09/21 17:44:41 INFO mapred.JobClient:     FILE_BYTES_READ=3682
    14/09/21 17:44:41 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=81
    14/09/21 17:44:41 INFO mapred.JobClient:   File Output Format Counters
    14/09/21 17:44:41 INFO mapred.JobClient:     Bytes Written=81
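    The counters confirm the expected behavior: Map input records=14 matches the fourteen numbers spread across the three files, Reduce input groups=13 reflects the thirteen distinct values (the two 32s fall into one group), and Reduce output records=14 shows that the duplicate is written once per occurrence.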
     
    5. Result:
    1    2
    2    6
    3    15
    4    22
    5    26
    6    32
    7    32
    8    54
    9    92
    10    650
    11    654
    12    756
    13    5956
    14    65223
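
    Because the reducer emits the key once per element of its value list, the duplicated 32 occupies two ranks (6 and 7). If duplicates should instead collapse to a single ranked entry, a minimal sketch of an alternative reduce method for the Reduce class above (an assumption, not the original code):

    // Hypothetical variant: write each distinct key exactly once,
    // ignoring how many times it occurred in the input.
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        context.write(linenum, key);                  // one output line per distinct value
        linenum = new IntWritable(linenum.get() + 1); // advance the rank counter
    }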