• Wordcount on YARN: a MapReduce example


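    Hadoop YARN version: 2.2.0

    For setting up a Hadoop YARN environment, see this post: Installing Hadoop 2.0 and adding a DataNode without stopping the cluster

    HDFS and YARN are running in pseudo-distributed mode, with the following processes: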

    1320 DataNode
    1665 ResourceManager
    1771 NodeManager
    1195 NameNode
    1487 SecondaryNameNode

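    Let's write a MapReduce example and run it on YARN: the classic wordcount job.

    The code is on GitHub: https://github.com/huahuiyang/yarn-demo

    Step 1

    The input we want to process is shown below. Each line contains one or more words separated by spaces. You can use hadoop fs -put ... to copy the local file to HDFS so that the MapReduce program can read it.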

    hadoop yarn
    mapreduce
    hello redis
    java hadoop
    hello world
    here we go

    The wordcount program counts how often each word occurs. The output format is <word  count>
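
    To make the expected result concrete before Hadoop is involved, here is a minimal plain-Java sketch that counts the words of the sample input locally. The class name LocalWordCount and its count method are made up for illustration and are not part of the yarn-demo project; the MapReduce job should arrive at the same counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original project: counts words in memory.
class LocalWordCount {

    // Returns word -> number of occurrences; TreeMap keeps the keys sorted.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "hadoop yarn", "mapreduce", "hello redis",
                "java hadoop", "hello world", "here we go");
        // Print one "<word><TAB><count>" pair per line.
        for (Map.Entry<String, Integer> e : count(input).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

    For the six sample lines this yields ten distinct words, with hadoop and hello counted twice and every other word once.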

    Step 2

    Create a new project with the structure below. It is a Maven-managed project.

    The source code is as follows:

    pom.xml file
    
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>hadoop-yarn</groupId>
        <artifactId>hadoop-demo</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-core</artifactId>
                <version>2.1.1-beta</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.1.1-beta</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-common</artifactId>
                <version>2.1.1-beta</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
                <version>2.1.1-beta</version>
            </dependency>
        </dependencies>
    </project>
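
    WordCountMapper.java file

    package com.yhh.mapreduce.wordcount;

    import java.io.IOException;

    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    // Mapper for the old (org.apache.hadoop.mapred) API: emits (word, 1) for every word in a line.
    public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Split on runs of whitespace so that repeated spaces or blank lines
            // do not produce empty "words".
            for (String token : value.toString().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    output.collect(word, ONE);
                }
            }
        }

    }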
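
    One detail of the map step worth checking is tokenization: String.split(" ") keeps an empty string between two consecutive spaces, while splitting the trimmed line on runs of whitespace does not. The standalone sketch below shows the difference; the class name TokenizeSketch and its tokenize helper are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone demo of line tokenization; not part of the original project.
class TokenizeSketch {

    // Splits on runs of whitespace and drops empty tokens.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "hello  redis"; // note the two spaces
        // Splitting on a single space keeps an empty token between the spaces.
        System.out.println(Arrays.asList(line.split(" "))); // [hello, , redis]
        // Splitting on whitespace runs yields only the two words.
        System.out.println(tokenize(line));                  // [hello, redis]
        // A line containing only spaces produces no tokens at all.
        System.out.println(tokenize("   "));                 // []
    }
}
```

    An empty token would otherwise be emitted as a key of its own and show up in the result as a word with an empty name.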

    WordCountReducer.java file

    package com.yhh.mapreduce.wordcount;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    // Reducer: adds up the counts emitted for each word.
    public class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int count = 0;
            while (values.hasNext()) {
                // Sum the values rather than counting them, so the total stays correct
                // even if a combiner has already merged some of the 1s.
                count += values.next().get();
            }
            output.collect(key, new IntWritable(count));
        }

    }
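
    A reduce step can either count the incoming values or add them up. The two agree only as long as every value is 1, which stops being true once a combiner pre-aggregates the map output. The sketch below imitates the reduce loop with plain integers; the class ReduceSketch and the combiner scenario are assumptions for illustration, since the job in this post does not configure a combiner.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical standalone demo; it mimics a reduce loop without Hadoop types.
class ReduceSketch {

    // Counts how many values arrive, ignoring what they contain.
    static int countValues(Iterator<Integer> values) {
        int count = 0;
        while (values.hasNext()) {
            values.next();
            count++;
        }
        return count;
    }

    // Adds up the values themselves.
    static int sumValues(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Without a combiner, "hadoop" arrives as two separate 1s.
        List<Integer> raw = Arrays.asList(1, 1);
        // Assumed scenario: a combiner has already merged them into a single 2.
        List<Integer> combined = Arrays.asList(2);

        System.out.println(countValues(raw.iterator()) + " " + sumValues(raw.iterator()));           // 2 2
        System.out.println(countValues(combined.iterator()) + " " + sumValues(combined.iterator())); // 1 2
    }
}
```

    Summing gives 2 in both cases, whereas counting drops to 1 once the values have been combined.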

    WordCount.java file

    package com.yhh.mapreduce.wordcount;

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;

    // Driver: configures the job and submits it to the cluster.
    public class WordCount {
        public static void main(String[] args) throws IOException {
            // Expect exactly two arguments: the input path and the output path.
            if (args.length != 2) {
                System.err.println("Usage: WordCount <input path> <output path>");
                System.exit(1);
            }

            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("word count mapreduce demo");

            conf.setMapperClass(WordCountMapper.class);
            conf.setReducerClass(WordCountReducer.class);
            // Key and value types written by the reducer (and by the mapper here).
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Submit the job and block until it finishes.
            JobClient.runJob(conf);
        }

    }
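
    The three classes correspond to the phases of the job: the mapper emits (word, 1) pairs, the framework groups and sorts them by key, and the reducer produces one total per word. The following in-memory model traces that data flow on two input lines. It is only a sketch: PipelineSketch and its map, shuffle, and reduce methods are hypothetical names, and the real shuffle runs across processes rather than in a TreeMap.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the job's data flow; no Hadoop classes involved.
class PipelineSketch {

    // Map phase: emit a (word, 1) pair for every word of every line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle phase: group the values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello redis", "hello world");
        System.out.println(reduce(shuffle(map(lines)))); // {hello=2, redis=1, world=1}
    }
}
```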

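    Step 3

    Package the project as a jar: right-click the Java project, choose Export..., then choose JAR file and the output directory. Here the jar is named wordcount.jar and is then uploaded to the Hadoop cluster.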

    [root@hadoop-namenodenew ~]# ll wordcount.jar 
    -rw-r--r--. 1 root root 4401 6月   1 22:05 wordcount.jar

    Run the MapReduce job with the following command:

    hadoop jar ~/wordcount.jar com.yhh.mapreduce.wordcount.WordCount data.txt /wordcount/result

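    You can check the job's status with hadoop job -list (in Hadoop 2.x this form is deprecated in favor of mapred job -list). A successful run prints output similar to the following: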

    14/06/01 22:06:25 INFO mapreduce.Job: The url to track the job: http://hadoop-namenodenew:8088/proxy/application_1401631066126_0003/
    14/06/01 22:06:25 INFO mapreduce.Job: Running job: job_1401631066126_0003
    14/06/01 22:06:33 INFO mapreduce.Job: Job job_1401631066126_0003 running in uber mode : false
    14/06/01 22:06:33 INFO mapreduce.Job:  map 0% reduce 0%
    14/06/01 22:06:40 INFO mapreduce.Job:  map 50% reduce 0%
    14/06/01 22:06:41 INFO mapreduce.Job:  map 100% reduce 0%
    14/06/01 22:06:47 INFO mapreduce.Job:  map 100% reduce 100%
    14/06/01 22:06:48 INFO mapreduce.Job: Job job_1401631066126_0003 completed successfully
    14/06/01 22:06:49 INFO mapreduce.Job: Counters: 43

    The job's output is shown below, with the words sorted in lexicographic order:

    hadoop fs -cat /wordcount/result/part-00000
    
    go    1
    hadoop    2
    hello    2
    here    1
    java    1
    mapreduce    1
    redis    1
    we    1
    world    1
    yarn    1
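
    The order of the result comes from the framework sorting the Text keys, which compares their UTF-8 bytes rather than applying a locale-aware dictionary order. For plain ASCII words, Java's natural String ordering gives the same sequence, so the effect can be sketched without Hadoop. The class KeyOrderSketch and the capitalized sample words are assumptions added for illustration; they do not appear in the input above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical standalone demo of key ordering for ASCII words.
class KeyOrderSketch {

    // Returns the distinct keys in natural String order.
    static List<String> sorted(String... keys) {
        return new ArrayList<>(new TreeSet<>(Arrays.asList(keys)));
    }

    public static void main(String[] args) {
        // Uppercase letters have smaller code points than lowercase ones,
        // so capitalized words sort ahead of all lowercase words.
        System.out.println(sorted("go", "hadoop", "Hadoop", "Zoo")); // [Hadoop, Zoo, go, hadoop]
    }
}
```

    With mixed-case input, "Hadoop" and "hadoop" are therefore counted as separate words and end up far apart in the output.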
  • Original post: https://www.cnblogs.com/yanghuahui/p/3763820.html