Using IDEA to run MapReduce computations
This builds on the earlier article: HADOOP之HDFS用idea操作(五) (operating HDFS from IDEA, part five).
Add mapred-site.xml and yarn-site.xml to the project's resources.
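The real files should be copied from the cluster set up in the earlier article. As a minimal sketch, MapReduce-on-YARN needs at least these two properties; anything beyond them here would be an assumption about that cluster:

mapred-site.xml (minimal sketch):
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml (minimal sketch):
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>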
Because the cluster is started as root, hdfs-site.xml also needs this change:
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
</property>
Add to pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.5</version>
</dependency>
Write the MyWordCount class:
package com.xiaoke.mapreduce.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyWordCount {
    public static void main(String[] args) throws Exception {
        // true: load the *-site.xml resources from the classpath
        Configuration configuration = new Configuration(true);

        // Open the Job class and follow the example in its Javadoc
        Job job = Job.getInstance(configuration);
        job.setJarByClass(MyWordCount.class);

        // Specify various job-specific parameters
        job.setJobName("xiaokeke1");

        Path inputPath = new Path("/data/wc/input");
        TextInputFormat.setInputPaths(job, inputPath);

        // Delete the output path if it already exists, otherwise the job fails
        Path outputPath = new Path("/data/wc/output");
        if (outputPath.getFileSystem(configuration).exists(outputPath))
            outputPath.getFileSystem(configuration).delete(outputPath, true);
        TextOutputFormat.setOutputPath(job, outputPath);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(MyReducer.class);

        job.waitForCompletion(true);
    }
}
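One optional addition, not in the original driver: the stock Hadoop WordCount example also registers a combiner and declares the final output key/value types. A sketch of the equivalent lines, placed alongside the other job.set* calls:

// Optional, following the stock WordCount example (not in the original driver).
// MyReducer can double as the combiner because summing is associative
// and its input and output types are identical.
job.setCombinerClass(MyReducer.class);
job.setOutputKeyClass(Text.class);          // final (reduce) output key type
job.setOutputValueClass(IntWritable.class); // final (reduce) output value type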
The MyMapper class:
package com.xiaoke.mapreduce.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    /*
    Sample input lines:
    hello hadoop 1
    hello hadoop 2
    hello hadoop 3
    */
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on whitespace and emit (word, 1) for each token
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
The MyReducer class:
package com.xiaoke.mapreduce.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    /*
    Input arrives grouped by key; reduce() is called once per group:
    hello 1
    hello 1
    hello 1
    hadoop 1
    hadoop 1
    hadoop 1
    1 1
    2 1
    3 1
    */
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the counts for this key and emit (word, total)
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
Package:
Maven: clean, then package (mvn clean package)
Upload the jar to a cluster node:
hadoop-hdfs-1.0-SNAPSHOT.jar
Run the program (the fully qualified class name, including the package, must be given):
hadoop jar hadoop-hdfs-1.0-SNAPSHOT.jar com.xiaoke.mapreduce.wc.MyWordCount
View the computed result:
hdfs dfs -cat /data/wc/output/part-r-00000
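As an illustration (derived from the sample lines in the MyMapper comment, not captured from a real run): TextOutputFormat writes tab-separated word/count pairs, sorted by key, so the file would look like:

1	1
2	1
3	1
hadoop	3
hello	3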
Note:
- After changing Windows environment variables, IDEA must be restarted to pick them up.
The above is the deployment workflow for the production (on-cluster) environment.
--------------------------------------------------------------------------------
Running locally requires the following setup:
1. Deploy Hadoop on Windows, e.g. C:\usr\hadoop-2.6.5\hadoop-2.6.5
2. Overwrite the deployed bin directory with the files from hadoop-install\soft\bin in the Hadoop materials, and copy hadoop.dll to C:\Windows\System32
3. Set the environment variable HADOOP_HOME = C:\usr\hadoop-2.6.5\hadoop-2.6.5
4. Restart IDEA
--------------------------------------------------------------------------------
Cluster testing from local IDEA:
// Tell the framework this is a cross-platform submission from Windows
configuration.set("mapreduce.app-submission.cross-platform", "true");
// The jar must be built (mvn package) first; note the escaped backslashes
job.setJar("D:\\code\\mayun_hadoop\\test\\hadoop\\target\\hadoop-hdfs-1.0-SNAPSHOT.jar");
Standalone (local) testing from IDEA: the fastest way to run, and the results still land directly on HDFS.
1. Comment out the setJar(...) line
2. configuration.set("mapreduce.app-submission.cross-platform", "true"); // still needed on Windows
3. configuration.set("mapreduce.framework.name", "local"); // run in-process instead of on YARN
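Pulling the three run modes together, a sketch of how the top of MyWordCount.main can switch between them (the commented toggles are assumptions layered on the driver shown above):

Configuration configuration = new Configuration(true);

// (a) packaged "hadoop jar" run on the cluster: no extra settings needed.

// (b) submit to YARN from IDEA on Windows (requires a freshly packaged jar):
// configuration.set("mapreduce.app-submission.cross-platform", "true");
// then, after Job.getInstance(configuration):
// job.setJar("D:\\code\\mayun_hadoop\\test\\hadoop\\target\\hadoop-hdfs-1.0-SNAPSHOT.jar");

// (c) local in-process test (fastest; output still lands on HDFS):
// configuration.set("mapreduce.app-submission.cross-platform", "true");
// configuration.set("mapreduce.framework.name", "local");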
Passing dynamic parameters into the conf:
// GenericOptionsParser copies -D key=value options (and the other generic
// options) straight into the Configuration; the leftover arguments are returned
GenericOptionsParser parser = new GenericOptionsParser(configuration, args);
String[] othargs = parser.getRemainingArgs();
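For example (a sketch: the -D property and the use of othargs for the paths are illustrative, not from the original post), the run command and the matching driver lines might look like:

hadoop jar hadoop-hdfs-1.0-SNAPSHOT.jar com.xiaoke.mapreduce.wc.MyWordCount -D mapreduce.job.reduces=2 /data/wc/input /data/wc/output

// in the driver, after parsing:
TextInputFormat.setInputPaths(job, new Path(othargs[0]));  // /data/wc/input
TextOutputFormat.setOutputPath(job, new Path(othargs[1])); // /data/wc/output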
Code: https://gitee.com/Xiaokeworksveryhard/big-data.git