Computing averages with MapReduce

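An e-commerce site has a data file of product click records named goods_click. It contains two fields (product category, click count) separated by whitespace. Because the full data set is very large, only part of it is used here to keep the statistics manageable. The contents are as follows: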

52127   5
52120   93
52092   93
52132   38
52006   462
52109   28
52109   43
52132   0
52132   34
52132   9
52132   30
52132   45
52132   24
52009   2615
52132   25
52090   13
52132   6
52136   0
52090   10
52024   347


The task is to use MapReduce to compute the average click count for each product category.
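Before turning to Hadoop, the computation itself can be sketched in plain Java: group the click counts by category, keep a running sum and count for each category, then divide. The class name LocalAverage and its method average are hypothetical names used only for this illustration. The sketch assumes the fields are separated by whitespace and uses integer division, so any fractional part is dropped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalAverage {
    // Returns category -> average clicks (integer division), sorted by category.
    public static Map<String, Integer> average(List<String> lines) {
        Map<String, int[]> acc = new TreeMap<>(); // value = {sum, count}
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                continue; // skip blank or malformed lines
            }
            int[] sumAndCount = acc.computeIfAbsent(fields[0], k -> new int[2]);
            sumAndCount[0] += Integer.parseInt(fields[1]);
            sumAndCount[1]++;
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, int[]> e : acc.entrySet()) {
            result.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Four records taken from the sample data above.
        List<String> sample = Arrays.asList(
                "52090   13", "52090   10", "52109   28", "52109   43");
        System.out.println(average(sample)); // prints {52090=11, 52109=35}
    }
}
```

In the MapReduce version the grouping step is done by the framework's shuffle, so the mapper only has to emit (category, clicks) pairs and the reducer only has to sum, count and divide.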

Source code:

package mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyAverage {
    // Mapper: emits (category, clicks) for every input line.
    public static class Map extends Mapper<Object, Text, Text, IntWritable> {
        private final Text newKey = new Text();
        private final IntWritable click = new IntWritable();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // TextInputFormat delivers one line per call; split it on any run of whitespace.
            String[] arr = value.toString().trim().split("\\s+");
            if (arr.length != 2) {
                return; // skip blank or malformed lines
            }
            newKey.set(arr[0]);
            click.set(Integer.parseInt(arr[1]));
            context.write(newKey, click);
        }

    }

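    // Reducer: averages the click counts collected for each category.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0; // long, so that large totals cannot overflow
            int count = 0;
            for (IntWritable val : values) {
                sum += val.get();
                count++;
            }
            // Integer division: the fractional part of the average is discarded.
            int avg = (int) (sum / count);
            context.write(key, new IntWritable(avg));
        }
    }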

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        System.out.println("start");
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "MyAverage");
        job.setJarByClass(MyAverage.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // Map and reduce outputs share the same types: (Text, IntWritable).
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path in = new Path("hdfs://localhost:9000/mymapreduce3/in/goods_click");
        Path out = new Path("hdfs://localhost:9000/mymapreduce3/out");

        // The output directory must not already exist when the job starts.
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Results:

52006    462
52009    2615
52024    347
52090    11
52092    93
52109    35
52120    93
52127    5
52132    23
52136    0
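
The averages for 52090, 52109 and 52132 come out as whole numbers because the reducer divides one integer by another, which discards the fractional part. A minimal sketch of that arithmetic for category 52090, whose two records are 13 and 10 (the class name TruncationDemo and its methods are hypothetical):

```java
public class TruncationDemo {
    // Integer division, as in the reducer: the fractional part is dropped.
    public static int intAverage(int sum, int count) {
        return sum / count;
    }

    // Floating-point division keeps the fraction.
    public static double exactAverage(int sum, int count) {
        return (double) sum / count;
    }

    public static void main(String[] args) {
        int sum = 13 + 10; // the two click counts recorded for 52090
        System.out.println(intAverage(sum, 2));   // prints 11
        System.out.println(exactAverage(sum, 2)); // prints 11.5
    }
}
```

If fractional averages were wanted, the reducer would need to emit a DoubleWritable instead of an IntWritable, and the job would then have to declare the map output value type separately with setMapOutputValueClass.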

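Problems encountered:

1. The data as originally provided began with a header row, "商家ID 点击次数" (merchant ID, click count), whose second field the program cannot convert from a string to an int. I removed that row from the source data.

2. The click count could not be converted from String to int. I first noticed that the click-count values were surrounded by spaces and tried String.trim() to strip them, but it had no effect. I then removed the spaces from the source data.

My guess is that the values were already stored with the extra spaces in the database they were exported from.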
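
One possible explanation for trim() having no effect, offered as an assumption rather than a diagnosis of this data set: String.trim() only strips characters with code points up to U+0020, so a non-breaking space (U+00A0) or a full-width space (U+3000) in exported data survives it and makes Integer.parseInt throw a NumberFormatException. The sketch below shows a more defensive way to parse one record, which also skips a header row instead of failing on it. ClickRecordParser and parse are hypothetical names.

```java
import java.util.Arrays;

public class ClickRecordParser {
    // Ordinary regex whitespace plus non-breaking and full-width spaces.
    private static final String SEPARATORS = "[\\s\\u00A0\\u3000]+";

    // Returns {category, clicks}, or null for blank lines, header rows
    // and any other line whose second field is not an integer.
    public static String[] parse(String line) {
        String cleaned = line.replaceAll(SEPARATORS, " ").trim();
        String[] fields = cleaned.split(" ");
        if (fields.length != 2) {
            return null;
        }
        try {
            Integer.parseInt(fields[1]);
        } catch (NumberFormatException e) {
            return null;
        }
        return fields;
    }

    public static void main(String[] args) {
        // A record padded with non-breaking spaces, which trim() alone would not clean.
        System.out.println(Arrays.toString(parse("52132\u00A0\u00A09\u00A0"))); // prints [52132, 9]
        // The header row is rejected rather than crashing the job.
        System.out.println(parse("商家ID 点击次数") == null); // prints true
    }
}
```

A mapper could call such a helper and simply return when it yields null, so that neither the header row nor stray spacing requires editing the source data by hand.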

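Author: 我是一个粉刷匠

Source: https://www.cnblogs.com/wl2017/

The copyright of this article is shared by the author and 博客园 (cnblogs). Reposting is welcome, but unless the author agrees otherwise this notice must be kept and a clearly visible link to the original article must be given on the page; otherwise the author reserves the right to pursue legal liability.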

Original article: https://www.cnblogs.com/wl2017/p/9978165.html