Building a Hadoop Project with Maven


    1. Install Maven

    See the earlier post on installing Maven for Eclipse 3.6.1 on Linux.

    2. The Official Dependency Repository

      We can look up the POM snippet for each dependency we need directly on the official site and add it to the project; an example is shown below.

      Official site: http://mvnrepository.com/
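
      For example, searching the site for hadoop-common and selecting version 2.5.2 yields a ready-to-copy snippet along these lines (the URL comment is how mvnrepository typically annotates it):

    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.5.2</version>
    </dependency>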

    3. Hadoop Dependencies

      Which Hadoop jars do we need?

      For a simple project, the following are likely enough:

    hadoop-common
    hadoop-hdfs
    hadoop-mapreduce-client-core
    hadoop-mapreduce-client-jobclient
    hadoop-mapreduce-client-common

    4. Configuration

      Open the project's pom.xml. Look up each of the packages above on the site, matching the version you need; here I am using 2.5.2.

      Modify pom.xml as follows:

    <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.5.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>2.5.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-core</artifactId>
                <version>2.5.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
                <version>2.5.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-common</artifactId>
                <version>2.5.2</version>
            </dependency>
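            <!-- jdk.tools makes the JDK's tools.jar visible to the build;
                 a common workaround when building Hadoop projects in Eclipse -->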
            <dependency>
                <groupId>jdk.tools</groupId>
                <artifactId>jdk.tools</artifactId>
                <version>1.7</version>
                <scope>system</scope>
                <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
            </dependency>
        </dependencies>
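      To double-check from the command line that everything resolves (these are standard Maven commands, not part of the original post):

    mvn dependency:tree
    mvn clean compile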

    5. Build Complete

      Click save, and you will see Maven start pulling in the required dependencies for us.

      Wait for the build to finish.

    6. Create the WordCountEx Class

      Create a WordCountEx class under src/main/java:

    package firstExample;
    
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordCountEx {
        static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // split the line into tokens
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    // skip words with fewer than 5 letters
                    String tmp = itr.nextToken();
                    if (tmp.length() < 5) {
                        continue;
                    }
                    word.set(tmp);
                    context.write(word, one);
                }
            }
        }

        static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();
            private Text keyEx = new Text();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    // sum the counts; note this example doubles each count
                    sum += val.get() * 2;
                }
                result.set(sum);
                // customize the output key with a prefix ("输出:" means "output:")
                keyEx.set("输出:" + key.toString());
                context.write(keyEx, result);
            }
        }

        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            // configuration
            Configuration conf = new Configuration();

            // create the job and give it a name
            Job job = Job.getInstance(conf, "mywordcount");

            job.setJarByClass(WordCountEx.class);
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // input and output paths
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // exit with 0 on success, 1 on failure
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
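
      For quick testing before going to the cluster, the same mapper and reducer can be run in Hadoop's local mode. This is a minimal sketch, not part of the original post: the local paths are hypothetical, and it assumes the standard Hadoop 2.x properties mapreduce.framework.name and fs.defaultFS.

    package firstExample;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountLocal {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // run the job in-process instead of submitting it to YARN
            conf.set("mapreduce.framework.name", "local");
            // read from and write to the local filesystem instead of HDFS
            conf.set("fs.defaultFS", "file:///");

            Job job = Job.getInstance(conf, "mywordcount-local");
            job.setJarByClass(WordCountEx.class);
            job.setMapperClass(WordCountEx.MyMapper.class);
            job.setReducerClass(WordCountEx.MyReduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // hypothetical local paths; adjust to your machine
            FileInputFormat.addInputPath(job, new Path("input/words_01.txt"));
            FileOutputFormat.setOutputPath(job, new Path("output"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }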
    

      

    7. Export the Jar

      Right-click the project -> Export and follow the wizard. (The original post illustrates this step with a screenshot.)
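
      As an alternative to the Eclipse wizard (not covered in the original post), the jar can also be built with standard Maven; the artifact lands in target/:

    mvn clean package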

    8. Run

      Put the exported jar under C:\Users\hadoop\Desktop, then upload it to /home/hadoop/workspace/ on the Linux machine.

         Upload words_01.txt: hadoop fs -put  /home/hadoop/workspace/words_01.txt   /user/hadoop

      Run the command; it succeeds without a hitch:

     hadoop jar /home/hadoop/workspace/first.jar firstExample.WordCountEx  /user/hadoop/words_01.txt  /user/hadoop/out
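
      To inspect the output afterwards (this command is not in the original post; part-r-00000 is Hadoop's default output file name for a single reducer):

     hadoop fs -cat /user/hadoop/out/part-r-00000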

    The result: each word of five or more letters, with its doubled count, prefixed by the custom key. (The original post shows a screenshot of the output here.)

    Example download

     GitHub: https://github.com/sinodzh/HadoopExample/tree/master/2015/first
