• Hadoop 1.2.1 MapReduce Example


    Teaching yourself Hadoop is genuinely hard, mainly because the versioning is chaotic and compatibility between versions is poor. Worse, many of the MapReduce Java examples online omit the import statements and give only bare class names. Hadoop has plenty of classes with identical simple names, so without the imports you cannot tell which class is meant — and the examples usually don't state the Hadoop version either, which makes them nearly impossible to follow.

    So here I write down everything to watch out for, above all which class each import actually refers to.

    Environment: Hadoop 1.2.1 + JDK 1.7 + Eclipse 4.5 + Maven

    The Maven pom file is as follows (if you don't know Maven, it's worth reading up on it briefly first):

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.howso</groupId>
      <artifactId>hadoopmaven</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <name>hadoopmaven</name>
      <properties>
          <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
          <hadoop.version>1.2.1</hadoop.version>
      </properties>
      <dependencies>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-client</artifactId>
              <version>${hadoop.version}</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-core</artifactId>
              <version>${hadoop.version}</version>
          </dependency>
          <dependency>
              <groupId>org.hamcrest</groupId>
              <artifactId>hamcrest-all</artifactId>
              <version>1.1</version>
              <scope>test</scope>
          </dependency>
          <dependency>
              <groupId>junit</groupId>
              <artifactId>junit</artifactId>
              <version>4.11</version>
              <scope>test</scope>
          </dependency>
          <dependency>
              <groupId>org.apache.mrunit</groupId>
              <artifactId>mrunit</artifactId>
              <version>1.1.0</version>
              <classifier>hadoop2</classifier>
              <scope>test</scope>
          </dependency>
          
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-minicluster</artifactId>
              <version>${hadoop.version}</version>
              <scope>test</scope>
          </dependency>
          
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-test</artifactId>
              <version>${hadoop.version}</version>
          </dependency>
          <dependency>
              <groupId>com.sun.jersey</groupId>
              <artifactId>jersey-core</artifactId>
              <version>1.8</version>
              <scope>test</scope>
          </dependency>
          
      </dependencies>
      <build>
          <finalName>hadoopx</finalName>
          <plugins>
              <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                  <version>3.1</version>
                  <configuration>
                      <source>1.6</source>
                      <target>1.6</target>
                  </configuration>
              </plugin>
              <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                  <artifactId>maven-jar-plugin</artifactId>
                  <version>2.5</version>
                  <configuration>
                      <outputDirectory>basedir</outputDirectory>
                      <archive>
                          <manifest>
                              <mainClass>hadoopmaven.Driver</mainClass>
                          </manifest>
                      </archive>
                  </configuration>
              </plugin>
          </plugins>
      </build>
    </project>
    pom.xml

    Some of the dependencies in there exist only for writing Hadoop tests: mrunit and hadoop-test.

    There are three classes in total: Driver, MaxMapper, and MaxReducer. Together they compute the maximum temperature for each year. All three live in the hadoopmaven package.

    Be very careful which class you import: Hadoop has quite a few classes sharing the same simple name, above all Mapper and Reducer — the old API in org.apache.hadoop.mapred and the new API in org.apache.hadoop.mapreduce each define classes with exactly these names. This example uses the new org.apache.hadoop.mapreduce API throughout.

    The Driver class:

    package hadoopmaven;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    public class Driver extends Configured implements Tool{
        
        //this runs successfully
        public static void main(String[] args) throws Exception {
            int k = ToolRunner.run(new Driver(), args);
            System.out.println("k is : " + k);
            System.exit(k);
        }
        
        
        public int run(String[] arg0) throws Exception {
            
            Job job = new Job(getConf(), "max temperature");
            job.setJarByClass(Driver.class);
            job.setMapperClass(MaxMapper.class);
            //max is associative and commutative, so the reducer can double as the combiner
            job.setCombinerClass(MaxReducer.class);
            job.setReducerClass(MaxReducer.class);
            
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            
            FileInputFormat.addInputPath(job, new Path("/input/temp.txt"));
            
            FileOutputFormat.setOutputPath(job, new Path("/output4"));
            
            return job.waitForCompletion(true)?0:1;
        }
    
    }
    hadoopmaven.Driver

    The MaxMapper class:

    package hadoopmaven;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    public  class MaxMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        //input lines look like:
        // 1991,90
        // 1991,91
        // 1993,98
        @Override
        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            String[] line=value.toString().split(",");
            context.write(new Text(line[0]), new IntWritable(Integer.parseInt(line[1])));
        }
        
    }
    hadoopmaven.MaxMapper

    The MaxReducer class:

    package hadoopmaven;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    public  class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
    
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable v : values) {
                max = Math.max(max, v.get());
            }
            context.write(key, new IntWritable(max));
        }
        }
        
        
    }
    hadoopmaven.MaxReducer
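
    Since the pom above pulls in mrunit and junit for testing, the mapper and reducer can be exercised without a cluster. Below is a minimal sketch using MRUnit's new-API drivers from org.apache.hadoop.mrunit.mapreduce; the MapDriver/ReduceDriver calls are standard MRUnit, but the test class itself is my addition, not part of the original post:

    ```java
    package hadoopmaven;

    import java.util.Arrays;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.junit.Test;

    public class MaxTest {

        @Test
        public void mapperSplitsYearAndValue() throws Exception {
            // one input line should produce one (year, value) pair
            MapDriver.newMapDriver(new MaxMapper())
                    .withInput(new LongWritable(0), new Text("1991,90"))
                    .withOutput(new Text("1991"), new IntWritable(90))
                    .runTest();
        }

        @Test
        public void reducerPicksMaximum() throws Exception {
            // all values for one year collapse to the maximum
            ReduceDriver.newReduceDriver(new MaxReducer())
                    .withInput(new Text("1992"),
                            Arrays.asList(new IntWritable(94), new IntWritable(85), new IntWritable(5)))
                    .withOutput(new Text("1992"), new IntWritable(94))
                    .runTest();
        }
    }
    ```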

    This MapReduce job reads /input/temp.txt from HDFS (the file format is shown below), computes the maximum value for each year, and writes the result into the /output4 directory.

    1991,33
    1991,45
    1992,94
    1992,85
    1992,5
    1993,78
    1993,75
    /input/temp.txt
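
    The aggregation the job performs can be sanity-checked in plain Java with no Hadoop at all. This local helper is my illustration, not part of the original post; it applies the same split-and-max logic as the mapper/reducer pair:

    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class MaxTempLocal {

        // computes the per-year maximum, mirroring what MaxMapper + MaxReducer do
        static Map<String, Integer> maxByYear(String[] lines) {
            Map<String, Integer> max = new LinkedHashMap<String, Integer>();
            for (String line : lines) {
                String[] parts = line.split(",");       // "1991,33" -> ["1991", "33"]
                int value = Integer.parseInt(parts[1]);
                Integer old = max.get(parts[0]);
                max.put(parts[0], old == null ? value : Math.max(old, value));
            }
            return max;
        }

        public static void main(String[] args) {
            String[] input = {"1991,33", "1991,45", "1992,94",
                              "1992,85", "1992,5", "1993,78", "1993,75"};
            System.out.println(maxByYear(input));  // prints {1991=45, 1992=94, 1993=78}
        }
    }
    ```

    Running it on the sample data above gives 1991→45, 1992→94, 1993→78, which is exactly what the cluster job should write to /output4.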

    Finally, build the jar with Maven's clean package. Maven writes the main class into the jar's manifest automatically (because the main class name is configured in the pom), and the finished jar lands in the basedir directory under the project root, named hadoopx.jar (all of this is configured in the pom).

    Put temp.txt into HDFS (for example with bin/hadoop fs -put temp.txt /input/temp.txt), copy hadoopx.jar into the Hadoop root directory, cd into that directory, and run the job with: bin/hadoop jar hadoopx.jar

  • Original article: https://www.cnblogs.com/formyjava/p/5219191.html