First, I wrote a Java program to generate a txt file of temperature data.
import java.io.*;
import java.util.Random;

public class CreateFile {
    public static void main(String[] args) {
        Random ran = new Random();
        int id = 0;
        double temp = 0;
        long count = 10000;  // number of records to generate
        BufferedWriter bw = null;
        try {
            bw = new BufferedWriter(new FileWriter(new File("sensor.txt")));
            for (long i = 0; i < count; i++) {
                id = ran.nextInt(1000);         // sensor id in [0, 1000)
                temp = ran.nextDouble() * 100;  // temperature in [0, 100)
                bw.write(id + " " + temp + "\n");
            }
            bw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
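Each line of sensor.txt is a sensor id followed by a temperature reading, separated by a single space. The values are random, so the lines below only illustrate the format, not actual output:

12 37.20558920986801
845 3.2280823085550316
97 99.04858710890259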
The file generated this way is fairly small, which makes it suitable as a demo. Earlier I had generated a 2 GB txt file, and the machine froze when I tried to open it, so to verify that the generated file contained the right number of records, I wrote another small program to count the lines in the file.
import java.io.*;

public class CalcLineNum {
    public static void main(String[] args) {
        BufferedReader br = null;
        long count = 0;
        try {
            br = new BufferedReader(new FileReader(new File("sensor.txt")));
            // Count lines without loading the whole file into memory.
            while (br.readLine() != null) {
                count++;
            }
            br.close();
            System.out.println(count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
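As a sanity check outside of Java, the same count can be obtained on Linux with a shell one-liner (assuming the file sits in the current directory):

wc -l sensor.txt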
With that, the sample file should be fine. Next, write the MapReduce job in Eclipse.
package com.xioyaozi;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class MaxTemp {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, IntWritable, DoubleWritable> {
        public void map(LongWritable key, Text value,
                OutputCollector<IntWritable, DoubleWritable> output, Reporter reporter)
                throws IOException {
            // Each input line is "<id> <temperature>".
            String line = value.toString();
            String[] str = line.split(" ");
            int id = Integer.parseInt(str[0]);
            double temp = Double.parseDouble(str[1]);
            // Filter out malformed records; the generator emits ids in [0, 1000)
            // and temperatures in [0, 100).
            if (id >= 0 && id < 1000 && temp >= 0 && temp < 100)
                output.collect(new IntWritable(id), new DoubleWritable(temp));
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<IntWritable, DoubleWritable, IntWritable, DoubleWritable> {
        public void reduce(IntWritable key, Iterator<DoubleWritable> values,
                OutputCollector<IntWritable, DoubleWritable> output, Reporter reporter)
                throws IOException {
            // Emit the maximum temperature seen for this sensor id.
            // Starting from 0 is safe because the mapper filters out negative readings.
            double maxTemp = 0;
            while (values.hasNext()) {
                maxTemp = Math.max(maxTemp, values.next().get());
            }
            output.collect(key, new DoubleWritable(maxTemp));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MaxTemp.class);
        conf.setJobName("maxTemp");

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(DoubleWritable.class);

        conf.setMapperClass(Map.class);
        // conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
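To run the job outside of Eclipse, the class can also be packaged into a jar and submitted from the command line. The jar name and paths here are assumptions for illustration, not from the original setup:

hadoop jar maxtemp.jar com.xioyaozi.MaxTemp /user/hadoop/sensor.txt /user/hadoop/maxtemp-out

TextOutputFormat writes each record as the key, a tab, then the value, so the result is one "<id>	<maxTemp>" line per sensor id.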
The output file is correct. At this point, I have a basic initial grasp of MapReduce.
Note: when using Eclipse, remember to set the run arguments (input and output paths), and remember to delete the output directory before running the program a second time, since Hadoop refuses to start a job whose output path already exists.
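Instead of deleting the old results by hand, main() could clear the output path before submitting the job. This is a minimal sketch, assuming the output path is args[1] as above; it is an addition of mine, not part of the original program:

// Hypothetical addition to main(), placed before JobClient.runJob(conf):
// remove a leftover output directory so repeated runs from Eclipse do not abort.
org.apache.hadoop.fs.FileSystem fs = org.apache.hadoop.fs.FileSystem.get(conf);
Path outPath = new Path(args[1]);
if (fs.exists(outPath)) {
    fs.delete(outPath, true);  // true = delete recursively
}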