

    MapReduce Example: Inverted Index

    Experiment Steps

    1. Start Hadoop
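
    If Hadoop is not already running, it can be started from the command line (assuming Hadoop's sbin directory is on the PATH):

    start-dfs.sh

    start-yarn.sh

    jps    # should now list NameNode, DataNode, ResourceManager, and NodeManager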

     

    2. Create the mapreduce9 directory

    Create the /data/mapreduce9 directory on the local Linux filesystem.
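
    For example (-p also creates any missing parent directories):

    mkdir -p /data/mapreduce9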

     

    3. Upload the files to Linux

    (Generate the text files yourself and place them in a directory of your choice.)

    goods3

    1024600 6 52006 0

    1024593 1 52121 0

    1024592 1 52121 0

    1024590 1 52119 0

    1024589 1 52119 0

    1024588 1 52030 0

    1024587 1 52021 0

    1024586 1 52029 0

    1024585 1 52014 0

    1024584 1 52029 0

    goods_visit3

    1024600 2

    1024593 0

    1024592 0

    1024590 0

    1024589 0

    1024588 0

    1024587 0

    1024586 0

    1024585 0

    1024584 0

    order_items3

    251688 52107 1024600 1 31.6 31.6 15.8

    252165 52209 1024600 1 31.6 31.6 15.8

    251870 52146 1024481 1 15.6 15.6 7.8

    251935 52158 1024481 1 15.6 15.6 7.8

    252415 52264 1024480 1 69.0 69.0 69.0

    250983 51937 1024480 1 69.0 69.0 69.0

    252609 52299 1024480 1 69.0 69.0 69.0

    251689 52107 1024440 1 31.6 31.6 15.8

    239369 49183 1024256 1 759.0 759.0 759.0

    249222 51513 1024140 1 198.0 198.0 198.0

    4. Create a directory in HDFS

    First create the /mymapreduce9/in directory in HDFS, then import the goods3, goods_visit3, and order_items3 files from the local Linux /data/mapreduce9 directory into HDFS's /mymapreduce9/in directory.

    hadoop fs -mkdir -p /mymapreduce9/in

    hadoop fs -put /data/mapreduce9/goods3 /mymapreduce9/in

    hadoop fs -put /data/mapreduce9/goods_visit3 /mymapreduce9/in

    hadoop fs -put /data/mapreduce9/order_items3 /mymapreduce9/in
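
    A quick check that all three files landed in HDFS:

    hadoop fs -ls /mymapreduce9/in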

     

     

    5. Create a Java project

    Create a Java Project named mapreduce.

    Under the mapreduce project, create a package named mapreduce8.

    Under the mapreduce8 package, create a class named MyIndex.

    6. Add the jar dependencies the project needs

    Right-click the project and create a folder named hadoop2lib to hold the jars the project depends on.

    Copy the jars from the hadoop2lib directory under /data/mapreduce2 into the hadoop2lib directory of the mapreduce project in Eclipse.

    (The hadoop2lib jars here were downloaded from the web, not fetched with the commands in the lab handout.)

    Select all the jars under the project's hadoop2lib directory and add them to the Build Path.

     

    7. Write the program code

    MyIndex.java

    package mapreduce8;
    
    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    public class MyIndex {
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Job job = Job.getInstance();
            job.setJobName("InversedIndexTest");
            job.setJarByClass(MyIndex.class);
    
            job.setMapperClass(doMapper.class);
            job.setCombinerClass(doCombiner.class);
            job.setReducerClass(doReducer.class);
    
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
    
            // HDFS input/output paths; adjust the NameNode address (192.168.109.10:9000) to match your cluster.
            Path in1 = new Path("hdfs://192.168.109.10:9000/mymapreduce9/in/goods3");
            Path in2 = new Path("hdfs://192.168.109.10:9000/mymapreduce9/in/goods_visit3");
            Path in3 = new Path("hdfs://192.168.109.10:9000/mymapreduce9/in/order_items3");
            Path out = new Path("hdfs://192.168.109.10:9000/mymapreduce9/out");
    
            FileInputFormat.addInputPath(job, in1);
            FileInputFormat.addInputPath(job, in2);
            FileInputFormat.addInputPath(job, in3);
            FileOutputFormat.setOutputPath(job, out);
    
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    
        // Mapper: for each input record, determine which file the input split came from,
        // build the key "goodsId:fileName", and emit a count of 1.
        public static class doMapper extends Mapper<Object, Text, Text, Text>{
            public static Text myKey = new Text();
            public static Text myValue = new Text();
            //private FileSplit filePath;
    
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String filePath = ((FileSplit) context.getInputSplit()).getPath().toString();
                if (filePath.contains("goods")) {
                    String[] val = value.toString().split("\t");
                    int splitIndex = filePath.indexOf("goods");
                    myKey.set(val[0] + ":" + filePath.substring(splitIndex));
                } else if (filePath.contains("order")) {
                    String[] val = value.toString().split("\t");
                    int splitIndex = filePath.indexOf("order");
                    myKey.set(val[2] + ":" + filePath.substring(splitIndex));
                }
                myValue.set("1");
                context.write(myKey, myValue);
            }
        }
        // Combiner: runs after each map task; sums the 1s for every "goodsId:fileName" key
        // and reshapes the pair into key "goodsId", value "fileName:count". Because it rewrites
        // the key, this combiner is part of the algorithm rather than an optional optimization.
        public static class doCombiner extends Reducer<Text, Text, Text, Text>{
            public static Text myK = new Text();
            public static Text myV = new Text();
    
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0 ;
                for (Text value : values) {
                    sum += Integer.parseInt(value.toString());
                }
                int mysplit = key.toString().indexOf(":");
                myK.set(key.toString().substring(0, mysplit));
                myV.set(key.toString().substring(mysplit + 1) + ":" + sum);
                context.write(myK, myV);
            }
        }
    
        // Reducer: concatenates the per-file counts for each goods id into a single
        // semicolon-separated list, e.g. "goods3:1;goods_visit3:1;".
        public static class doReducer extends Reducer<Text, Text, Text, Text>{
    
            public static Text myK = new Text();
            public static Text myV = new Text();
    
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
    
                StringBuilder myList = new StringBuilder();

                for (Text value : values) {
                    myList.append(value.toString()).append(";");
                }
                myK.set(key);
                myV.set(myList.toString());
                context.write(myK, myV);
            }
        }
    }
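
    To trace one key through the job: goods id 1024600 appears once in goods3, once in goods_visit3, and twice in order_items3. The mapper emits ("1024600:goods3", 1), ("1024600:goods_visit3", 1), and ("1024600:order_items3", 1) twice; the combiner folds these into ("1024600", "goods3:1"), ("1024600", "goods_visit3:1"), and ("1024600", "order_items3:2"); and the reducer concatenates the values into the posting list "goods_visit3:1;goods3:1;order_items3:2;" (the order of the segments depends on the order in which the values arrive).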

    8. Run the code

    In the MyIndex class file, right-click and choose Run As => Run on Hadoop to submit the MapReduce job to Hadoop.
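
    Alternatively, the job can be submitted from the command line; a sketch, assuming the project has been exported as a runnable jar named mapreduce.jar:

    hadoop jar mapreduce.jar mapreduce8.MyIndex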

     

    9. View the results

    After the job finishes, switch to the command line and inspect the /mymapreduce9/out directory in HDFS.

    hadoop fs -ls /mymapreduce9/out  

    hadoop fs -cat /mymapreduce9/out/part-r-00000  

    Figure 1 is my run's output and Figure 2 is the lab's reference result.

    Comparing the two, the results are identical.

     

    The results are shown below. Each line lists a goods id followed by each file it appears in and the number of occurrences in that file:

    goods id   table name:occurrence count

    1024140 order_items3:1;  

    1024256 order_items3:1;  

    1024440 order_items3:1;  

    1024480 order_items3:3;  

    1024481 order_items3:2;  

    1024584 goods3:1;goods_visit3:1;  

    1024585 goods_visit3:1;goods3:1;  

    1024586 goods3:1;goods_visit3:1;  

    1024587 goods_visit3:1;goods3:1;  

    1024588 goods3:1;goods_visit3:1;  

    1024589 goods_visit3:1;goods3:1;  

    1024590 goods3:1;goods_visit3:1;  

    1024592 goods_visit3:1;goods3:1;  

    1024593 goods3:1;goods_visit3:1;  

    1024600 goods_visit3:1;goods3:1;order_items3:2;  

    (Browser screenshot of the output.)

     
