• MapReduce Programming: Average Score


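    Problem description

    There are three files, each holding the students' scores for one subject. Write a program that computes each student's average score.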

                       

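    Approach

    The map function emits each student's name as the key and the score as the value. The reduce function then receives all scores that share a name, adds them up, and divides the sum by the number of scores to get the average.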
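
    The map, group-by-key, and reduce steps described above can be tried out without a cluster. The following is a minimal sketch in plain Java, not Hadoop code: the class `AverageSketch`, its method names, and the sample names and scores are all made up for illustration, and it assumes each input line has the form "name score" separated by whitespace.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class AverageSketch {
    // "Map" step: turn one line ("name score") into a (name, score) pair.
    static Map.Entry<String, Integer> mapLine(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        String name = tokens.nextToken();
        int score = Integer.parseInt(tokens.nextToken());
        return new AbstractMap.SimpleEntry<>(name, score);
    }

    // "Reduce" step: integer average of all scores collected for one name.
    static int reduceScores(List<Integer> scores) {
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return sum / scores.size();
    }

    // Runs map on every line, groups the pairs by name (the "shuffle"),
    // then reduces each group to its average.
    static Map<String, Integer> averages(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = mapLine(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduceScores(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical lines, as if read from three score files.
        List<String> lines = Arrays.asList(
            "Alice 80", "Bob 70",
            "Alice 90", "Bob 75",
            "Alice 85", "Bob 80");
        System.out.println(averages(lines)); // prints {Alice=85, Bob=75}
    }
}
```

    In a real job the grouping is done by the framework between map and reduce; here a `TreeMap` stands in for it.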

    Code

    package org.apache.hadoop.examples;

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class calcGPA {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Input and output locations are hard-coded here instead of being read from args.
            String fileAddress = "hdfs://localhost:9000/user/hadoop/";

            //String[] otherArgs = (new GenericOptionsParser(conf, args)).getRemainingArgs();
            String[] otherArgs = new String[]{fileAddress + "score1.txt", fileAddress + "score2.txt", fileAddress + "score3.txt", fileAddress + "output"};
            if (otherArgs.length < 2) {
                System.err.println("Usage: calcGPA <in> [<in>...] <out>");
                System.exit(2);
            }

            Job job = Job.getInstance(conf, "calc GPA");
            job.setJarByClass(calcGPA.class);
            job.setMapperClass(TokenizerMapper.class);
            // No combiner is set: IntSumReducer emits an average, and an average of
            // partial averages is not, in general, the overall average.
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Every argument except the last is an input path; the last is the output path.
            for (int i = 0; i < otherArgs.length - 1; ++i) {
                FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
            }

            FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }

        // Receives all scores for one student and writes their average.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
                int sum = 0;
                int count = 0;

                for (IntWritable val : values) {
                    sum += val.get();
                    count++;
                }

                // Integer division: the fractional part of the average is dropped.
                int average = sum / count;
                context.write(key, new IntWritable(average));
            }
        }

        // Parses "name score" lines and emits (name, score) pairs.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

            @Override
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                // Split the input value into lines, then each line into name and score.
                StringTokenizer itr = new StringTokenizer(value.toString(), "\n");

                while (itr.hasMoreTokens()) {
                    StringTokenizer iitr = new StringTokenizer(itr.nextToken());
                    String name = iitr.nextToken();
                    String score = iitr.nextToken();
                    context.write(new Text(name), new IntWritable(Integer.parseInt(score)));
                }
            }
        }
    }
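
    A point worth checking with any reducer that computes an average is whether it can also serve as a combiner. A combiner may be applied to only part of the values for a key, and averaging partial averages generally gives a different answer than averaging all values at once. The sketch below shows the effect in plain Java; the class `CombinerPitfall`, the `average` helper, and the scores are hypothetical and chosen only to make the difference visible.

```java
import java.util.Arrays;
import java.util.List;

class CombinerPitfall {
    // Integer average, the same arithmetic the reducer performs.
    static int average(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        // All three scores reach the reducer directly: (60 + 90 + 90) / 3 = 80.
        int direct = average(Arrays.asList(60, 90, 90));

        // A combiner first averages two of the scores: (60 + 90) / 2 = 75.
        int partial = average(Arrays.asList(60, 90));
        // The reducer then averages that partial result with the third score:
        // (75 + 90) / 2 = 82 with integer division.
        int viaCombiner = average(Arrays.asList(partial, 90));

        System.out.println(direct + " vs " + viaCombiner); // prints 80 vs 82
    }
}
```

    When each student has exactly one score per input file, a combiner only ever sees one value per key and the two results coincide, so the problem stays hidden. A common way to keep a combiner is to have it pass along partial sums and counts instead of averages.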

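    Question

    While writing this I ran into one thing I could not explain: why is the default location for input and output files user/hadoop/? I looked through the configuration files and that path does not seem to appear anywhere. If anyone can explain this, I would be very grateful.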

  • Original article: https://www.cnblogs.com/zyb993963526/p/10468981.html