• A Simple MapReduce Development Example: WordCount


    I have more or less finished learning basic MapReduce programming, so here is a short summary.

      WordCount, as the name suggests, counts how many times each keyword occurs in a text. MapReduce makes this straightforward to implement.

    The input text is as follows:

    [Screenshot of the input file buyer_favorite1]

     The goal is to count how many times each buyer ID (the first field of each line) appears.

    How it works:

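    The general idea is to use a text file on HDFS as the input. Through its InputFormat (TextInputFormat by default), MapReduce splits the file and turns each line into an input key/value pair: the key is the byte offset of the line's first character from the beginning of the file, and the value is the line's text. The map function processes each pair and emits intermediate results of the form <word,1>. The framework then sorts and groups these pairs by key, so each call to the reduce function receives one word together with all of its 1s and sums them to get that word's count. The program therefore consists of two main parts: the Mapper and the Reducer.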
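    The data flow above can be simulated in plain Java, without a cluster, to see what each phase produces. This is a minimal sketch and not Hadoop code: the class and method names (WordCountSimulation, map, shuffle, reduce, run) are hypothetical, a TreeMap stands in for the framework's sort-and-group step, and the input lines are made-up tab-separated records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the WordCount data flow (no Hadoop involved).
public class WordCountSimulation {

    // Map phase: emit (first field, 1) for every line that has at least one token.
    public static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line, " \t");
            if (tokenizer.hasMoreTokens()) {
                pairs.add(Map.entry(tokenizer.nextToken(), 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group values by key; TreeMap keeps keys sorted, as Hadoop sorts map output.
    public static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum all values that belong to the same key.
    public static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> groups) {
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            result.put(group.getKey(), sum);
        }
        return result;
    }

    public static TreeMap<String, Integer> run(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        // Made-up tab-separated records: ID, item, date. The empty line is skipped.
        List<String> lines = List.of(
                "1001\titemA\t2020-01-01",
                "1002\titemB\t2020-01-02",
                "1001\titemC\t2020-01-03",
                "");
        run(lines).forEach((id, count) -> System.out.println(id + "\t" + count));
    }
}
```

    With the sample lines, main prints 1001 with a count of 2 and then 1002 with a count of 1, each key and count separated by a tab, which is the same shape as the job's output file.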

     Implementation:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class mapreduce {

        public static void main(String[] args)
                throws IOException, ClassNotFoundException, InterruptedException {
            Job job = Job.getInstance();
            job.setJobName("WordCount");
            job.setJarByClass(mapreduce.class);
            job.setMapperClass(doMapper.class);
            // Summing is associative, so the reducer can also act as a combiner
            // that pre-aggregates map output and cuts shuffle traffic.
            job.setCombinerClass(doReducer.class);
            job.setReducerClass(doReducer.class);
            // Output key/value types (here they are also the map output types).
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            Path in = new Path("hdfs://192.168.146.131:9000/mymapreduce1/in/buyer_favorite1");
            // The output directory must not exist yet, otherwise the job fails.
            Path out = new Path("hdfs://192.168.146.131:9000/mymapreduce1/out");
            FileInputFormat.addInputPath(job, in);
            FileOutputFormat.setOutputPath(job, out);
            // Submit the job, wait for it to finish, and exit with 0 on success.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }

        // Mapper: key = byte offset of the line, value = line text;
        // emits <first field of the line, 1>.
        public static class doMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Split on spaces and tabs; only the first field (the ID) is counted.
                StringTokenizer tokenizer = new StringTokenizer(value.toString(), " \t");
                if (tokenizer.hasMoreTokens()) { // skip blank lines instead of throwing
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer: receives each key with all of its values and emits <key, total>.
        public static class doReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get(); // add up all the values for this key
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    }
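    Because the mapper counts only the first field of each line, the delimiter string passed to StringTokenizer matters: every character in that string is treated as a separator, so a delimiter made only of spaces will not split a tab-separated line, and nextToken() then returns the whole line. The sketch below checks this key-extraction step without Hadoop. The helper firstField and the record are hypothetical, and whether buyer_favorite1 is actually tab-separated is an assumption.

```java
import java.util.StringTokenizer;

// Hypothetical standalone check of the mapper's key extraction.
public class FirstFieldCheck {

    // Returns the first token of the line, or null if the line has no tokens.
    public static String firstField(String line, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(line, delimiters);
        return tokenizer.hasMoreTokens() ? tokenizer.nextToken() : null;
    }

    public static void main(String[] args) {
        String line = "1001\titemA\t2020-01-01"; // made-up tab-separated record
        System.out.println(firstField(line, " "));   // prints the whole line: it contains no space
        System.out.println(firstField(line, " \t")); // prints 1001
        System.out.println(firstField("", " \t"));   // prints null instead of throwing
    }
}
```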
    

      

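     Finally, look at the output file part-r-00000 in the HDFS output directory (/mymapreduce1/out in this example), for instance with hadoop fs -cat /mymapreduce1/out/part-r-00000. Each line holds an ID and its count, separated by a tab.

    The result is:

    [Screenshot of the contents of part-r-00000]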

  • Original article: https://www.cnblogs.com/liuleliu/p/14038505.html