• Writing a Simple MapReduce Program and Deploying It on Hadoop 2.2.0


    This post covers how to run a finished MapReduce program on a distributed Hadoop 2.2.0 deployment.

    You can write the program in Eclipse, then package it into a jar file either with Export or with the Fat Jar plugin.

    First, here are the Maven dependencies the program relies on:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>Temperature</groupId>
      <artifactId>Temperature</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
          <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
              <source>1.7</source>
              <target>1.7</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
      <dependencies>
    
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-core</artifactId>
          <version>2.2.0</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>2.2.0</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-common</artifactId>
          <version>2.2.0</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
          <version>2.2.0</version>
      </dependency>
      </dependencies>
    </project>
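
    With this pom in place, the jar can also be built from the command line instead of Eclipse (assuming Maven is installed). A plain package build is enough, since the cluster already provides the Hadoop jars at runtime:

    mvn clean package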

    Now for the program itself. The code is as follows:

    Mapper:

    package org.ccnt.mr;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    
    public class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
    
        // NCDC encodes a missing temperature reading as 9999
        private static final int MISSING = 9999;
    
        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            // NCDC records are fixed-width: the four-digit year sits at columns 15-18
            String year = line.substring(15, 19);
            int airTemperature;
            // The signed temperature (in tenths of a degree Celsius) occupies
            // columns 87-91; skip a leading '+' so Integer.parseInt accepts it
            if (line.charAt(87) == '+')
                airTemperature = Integer.parseInt(line.substring(88, 92));
            else
                airTemperature = Integer.parseInt(line.substring(87, 92));
            // Column 92 is the quality code; 0, 1, 4, 5 and 9 mark a valid reading
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                output.collect(new Text(year), new IntWritable(airTemperature));
            }
        }
    
    }

    Reducer:

    package org.ccnt.mr;
    
    import java.io.IOException;
    import java.util.Iterator;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    
    public class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {
    
        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Track the highest temperature seen across this year's values
            int maxValue = Integer.MIN_VALUE;
            while (values.hasNext()) {
                maxValue = Math.max(maxValue, values.next().get());
            }
            output.collect(key, new IntWritable(maxValue));
        }
    
    }

    Main:

    package org.ccnt.mr;
    
    import java.io.IOException;
    
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    
    public class MaxTemperature {
        
        public static void main(String[] args) throws IOException {
            // Echo the command-line arguments for debugging
            System.out.println(args.length);
            for (String string : args) {
                System.out.println(string);
            }
            if (args.length != 2) {
                System.err.println("Error");
                System.exit(1);
            }
            
            JobConf conf = new JobConf(MaxTemperature.class);
            conf.setJobName("Max Temperature");
            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            JobClient.runJob(conf);
        }
    
    }
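
    Note that FileOutputFormat requires the output directory to not exist yet; Hadoop refuses to overwrite it, so remove it first when re-running the job:

    bin/hdfs dfs -rm -r result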

    Compile and package the program above into a jar file, then deploy it on Hadoop 2.2.0 (this post assumes Hadoop 2.2.0 is already installed). Deployment works as follows:
    First, start Hadoop 2.2.0 with these commands:

    sbin/start-dfs.sh
    sbin/start-yarn.sh
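
    Next, the test data must be available on HDFS. Assuming the downloaded file sits at ~/Download/data.txt locally (the local path is illustrative), it can be uploaded with:

    bin/hdfs dfs -mkdir -p input
    bin/hdfs dfs -put ~/Download/data.txt input/data.txt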

    There are two ways to build and package the jar file:

    1) Export the jar directly from Eclipse. This generates a default MANIFEST.MF that does not record the main class, so the main class must be given on the command line:

    bin/hadoop jar ~/Download/MaxTemperature.jar org.ccnt.mr.MaxTemperature input/data.txt result

    2) Export the jar with the Fat Jar plugin. The Hadoop dependencies do not need to be bundled (the cluster environment already provides them); the practical difference is that MANIFEST.MF now records the main class, so the command can omit it:

    bin/hadoop jar ~/Download/Temperature.jar input/data.txt result2
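
    For reference, the difference comes down to a single entry in META-INF/MANIFEST.MF; a manifest that lets the main class be omitted would contain roughly the following line (assuming the driver class above):

    Main-Class: org.ccnt.mr.MaxTemperature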

    The result is the same either way.
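
    To inspect the output, the result files can be printed straight from HDFS; with the default single reducer, the output file is typically named part-00000:

    bin/hdfs dfs -cat result/part-00000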

    The test data used by the program can be downloaded from: http://pan.baidu.com/s/1iSacM

     
