Setting up Hadoop on Windows requires JDK 1.7 or later. Once the JDK is in place, you can proceed with the Hadoop setup.
First, download the required packages:
1. Hadoop distribution: hadoop-2.5.2.tar.gz
2. Eclipse plugin: hadoop-eclipse-plugin-2.5.2.jar
3. Windows runtime support package for Hadoop: hadooponwindows-master.zip
4. Test data: weather record files for 1901 and 1902
Download link for the above files: https://pan.baidu.com/s/1R9qFdFDWHN1NnCW83VQiJg password: lkpp
Once all of the files above are downloaded, proceed with the Hadoop installation.
Step 1: Install Hadoop
1. Extract the downloaded hadoop-2.5.2.tar.gz to a directory of your choice; in my case it is C:\hadoop, and all the examples below assume this directory.
2. Configure the Hadoop environment variables, as sketched below.
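Concretely, this means creating a HADOOP_HOME variable and putting its bin and sbin directories on the Path; a minimal sketch, assuming the extraction directory from step 1:

HADOOP_HOME = C:\hadoop\hadoop-2.5.2
Path        = %Path%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin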
3. Edit the Hadoop configuration files.
3.1 Edit core-site.xml under %HADOOP_HOME%\etc\hadoop and add the following content:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/C:/hadoop/hadoop-2.5.2/workplace/tmp</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/C:/hadoop/hadoop-2.5.2/workplace/name</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
3.2 Edit mapred-site.xml under %HADOOP_HOME%\etc\hadoop and add the following content:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://localhost:9001</value>
    </property>
</configuration>
3.3 Edit hdfs-site.xml under %HADOOP_HOME%\etc\hadoop and add the following content:
<configuration>
    <!-- Set replication to 1 because this is a single-node Hadoop setup -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/C:/hadoop/hadoop-2.5.2/workplace/data</value>
    </property>
</configuration>
3.4 Edit yarn-site.xml under %HADOOP_HOME%\etc\hadoop and add the following content:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
3.5 Edit hadoop-env.cmd under %HADOOP_HOME%\etc\hadoop: comment out the existing JAVA_HOME line with @rem, set JAVA_HOME to your local JDK path, and save.
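For example (the JDK path below is only an illustration; use your own installation path, and note that a path containing spaces, such as C:\Program Files\Java\..., can cause trouble, in which case the 8.3 short form C:\PROGRA~1\Java\... may help):

@rem set JAVA_HOME=%JAVA_HOME%
set JAVA_HOME=C:\Java\jdk1.8.0_65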
4. Set up the Windows runtime environment for Hadoop.
Extract the downloaded hadooponwindows-master.zip and copy everything in its bin directory into %HADOOP_HOME%\bin, replacing the existing files.
5. In a command prompt window, run the following command:
hdfs namenode -format
6. In the command prompt, switch to the %HADOOP_HOME%\sbin directory, from which Hadoop can be started and stopped:
Start: start-all.cmd
Stop: stop-all.cmd
6.1 Run start-all.cmd; if the daemon windows open and keep running without errors, Hadoop has been deployed on Windows successfully.
7. Per the configuration in core-site.xml, HDFS can now be accessed through hdfs://localhost:9000.
7.1 Create the input directory:
hadoop fs -mkdir hdfs://localhost:9000/user/
hadoop fs -mkdir hdfs://localhost:9000/user/input
7.2 Upload the test data to the directory:
hadoop fs -put C:\hadoop\data\1901 hdfs://localhost:9000/user/input
hadoop fs -put C:\hadoop\data\1902 hdfs://localhost:9000/user/input
7.3 List the uploaded files:
hadoop fs -ls hdfs://localhost:9000/user/input
If both files appear in the listing, the upload succeeded.
8. Install the Eclipse plugin
8.1 Copy the downloaded hadoop-eclipse-plugin-2.5.2.jar into the plugins directory of your Eclipse installation, then restart Eclipse.
8.2 Open Window –> Preferences from the menu bar; if the plugin was installed successfully, a Hadoop Map/Reduce entry appears there.
8.3 Configure the Hadoop installation directory.
8.4 Open the Map/Reduce perspective.
8.5 In the Map/Reduce Locations view, right-click the empty area and choose New Hadoop location.
8.6 Fill in the connection parameters as sketched below, then click Finish.
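Based on the configuration files above, the connection parameters would look something like this (the location name itself is arbitrary):

Location name:      localhost
Map/Reduce Master:  Host localhost, Port 9001  (mapred.job.tracker in mapred-site.xml)
DFS Master:         Host localhost, Port 9000  (fs.default.name in core-site.xml)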
9. Write the test class:
9.1 Create a Map/Reduce project:
Right-click –> New –> Other –> Map/Reduce Project
9.2 Write the test code:
package hadoop.code01.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.log4j.BasicConfigurator;

public class MaxTemperature {

    public static class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        // 9999 marks a missing temperature reading in the NCDC records
        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            // In the NCDC record format, the year occupies columns 15-18
            String year = line.substring(15, 19);
            // The temperature starts at column 87 with an explicit sign character
            int air;
            if (line.charAt(87) == '+') {
                air = Integer.parseInt(line.substring(88, 92));
            } else {
                air = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            // Emit only valid readings: not missing, and with an acceptable quality code
            if (air != MISSING && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(air));
            }
        }
    }

    public static class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Track the maximum temperature seen for this year
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }

    public static void main(String[] args) throws Exception {
        BasicConfigurator.configure(); // quickly set up a default log4j console appender
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(MaxTemperature.class);
        job.setJobName("maxTemperature");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion may only be called once per Job instance
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
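Besides running it from Eclipse, the job could also be submitted from a command prompt once the project is exported as a jar; a sketch, assuming a hypothetical jar name maxtemperature.jar:

hadoop jar maxtemperature.jar hadoop.code01.maxtemperature.MaxTemperature hdfs://localhost:9000/user/input hdfs://localhost:9000/user/output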
9.3 Run the test:
Run As –> Run Configurations
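In the Arguments tab of the run configuration, supply the input and output paths that the code reads as args[0] and args[1]; with the input directory created in step 7.1, for example:

hdfs://localhost:9000/user/input hdfs://localhost:9000/user/output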
9.4 Click Run to execute; afterwards, check the output from the command prompt:
hadoop fs -ls hdfs://localhost:9000/user/output
9.5 Run hadoop fs -cat hdfs://localhost:9000/user/output/part-r-00000 to view the job's result data.
With that, the first Hadoop example has run successfully.