• Developing a MapReduce Program


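    1. Install Hadoop 2.8.3 on Linux (running in a virtual machine)

    1.1 Install the JDK

    1.2 Install Hadoop

    1.3 Configure Hadoop by editing core-site.xml and hdfs-site.xml

    1.4 Format (initialize) the NameNode

    1.5 Start HDFS and YARN

    2. Install STS and Maven on the Windows 10 host

    3. Copy the entire Hadoop directory from Linux to Windows 10, set the HADOOP_HOME environment variable, and add HADOOP_HOME/bin to PATH

    4. Copy the hadoop-eclipse-plugin-2.8.3 plugin into the STS plugins directory, put winutils.exe into the hadoop/bin directory on Windows 10, and put hadoop.dll into the Windows/System32 directory

    5. Start STS and set up the hadoop-eclipse-plugin-2.8.3 plugin: in STS, set the Hadoop installation directory, then create an instance for the Hadoop server on Linux and set its DFS server IP and port. All directories on the Hadoop node should then be visible

    6. Create a MapReduce project and add a new file, WordCount.java, under the project's src directory

    7. Add the following code to it: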

    package helloWordCount;

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: splits each input line into tokens and emits (word, 1).
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer (also used as the combiner): sums the counts for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Load the Hadoop settings so the job does not fail with a "directory not found" error.
            conf.addResource("../core-site.xml");
            conf.addResource("../hdfs-site.xml");
            //String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            // Input and output paths are hard-coded here instead of being read from args.
            String[] otherArgs = new String[] {"/input", "/output"};
            if (otherArgs.length < 2) {
                System.err.println("Usage: wordcount <in> [<in>...] <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Every argument except the last is an input path; the last is the output path.
            for (int i = 0; i < otherArgs.length - 1; ++i) {
                FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
            }
            FileOutputFormat.setOutputPath(job,
                    new Path(otherArgs[otherArgs.length - 1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

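    8. When that is done, right-click WordCount.java and choose Run As -> Run on Hadoop

    The first run failed with an exception saying the specified directory could not be found. The cause was that Hadoop's basic settings had not been loaded. Adding the two conf.addResource(...) lines shown in main above fixed it, and everything ran OK!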
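
    To check what the job is expected to produce without touching the cluster, the same data flow can be imitated in plain Java. The following is only a minimal sketch under stated assumptions: LocalWordCount and its methods are hypothetical names that are not part of Hadoop or of the project above, and everything runs in a single JVM with no HDFS, no combiner and no parallelism.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper: imitates the WordCount data flow locally, without Hadoop.
class LocalWordCount {

    // Map phase: like TokenizerMapper, split a line on whitespace and emit (word, 1).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Group by word and sum the counts, which is what IntSumReducer does per key.
    // A TreeMap keeps the words sorted, so the result is easy to read and compare.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input lines (made up for illustration).
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        countWords(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

    For the two sample lines, this prints hadoop 1, hello 2 and world 1, one tab-separated pair per line. Comparing such a local result with the files the real job writes under /output is a quick way to confirm that the mapper and reducer behave as intended.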

  • Original post: https://www.cnblogs.com/myboat/p/11642062.html