• MapReduce Inverted Index


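    An inverted index is mainly used for full-text search: it builds a word -> URL structure, mapping each word to the documents that contain it. Suppose there are three input files.
    file1.txt:
    MapReduce is simple
    file2.txt:
    MapReduce is powerful is simple
    file3.txt:
    Hello MapReduce bye MapReduce
    After building the inverted index, the result is:
    Hello     file3.txt:1;
    MapReduce     file3.txt:2;file1.txt:1;file2.txt:1;
    bye     file3.txt:1;
    is     file1.txt:1;file2.txt:2;
    powerful     file2.txt:1;
    simple     file2.txt:1;file1.txt:1;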
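    As a cross-check on the listing above, here is a minimal plain-Java sketch (no Hadoop involved) that builds the same word -> file:count mapping in memory for the three sample files. The class name InvertedIndexSketch and its helper methods are made up for illustration and are not part of the job described in this post. Entries on each line come out sorted by file name here, whereas in a real job their order depends on the order in which values reach the reducer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration; not the MapReduce job itself.
final class InvertedIndexSketch {

    /** Builds word -> (file name -> occurrence count) from file name -> content. */
    static Map<String, Map<String, Integer>> build(Map<String, String> files) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : file.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // empty content yields one empty token
                }
                index.computeIfAbsent(word, w -> new TreeMap<>())
                     .merge(file.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    /** Formats the entries of one word as "file:count;file:count;". */
    static String postings(Map<String, Map<String, Integer>> index, String word) {
        StringBuilder sb = new StringBuilder();
        Map<String, Integer> perFile = index.getOrDefault(word, new TreeMap<>());
        for (Map.Entry<String, Integer> e : perFile.entrySet()) {
            sb.append(e.getKey()).append(':').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    /** The three sample files from the text. */
    static Map<String, String> sampleFiles() {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "MapReduce is simple");
        files.put("file2.txt", "MapReduce is powerful is simple");
        files.put("file3.txt", "Hello MapReduce bye MapReduce");
        return files;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> index = build(sampleFiles());
        for (String word : index.keySet()) {
            System.out.println(word + "\t" + postings(index, word));
        }
    }
}
```

    The TreeMap keeps words in the same order as the listing (uppercase before lowercase), which makes the two easy to compare by eye.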
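    The approach is to scan the content of each file and emit the key word:filename with the value 1. The combiner then sums the values that share a key, which gives the number of times a word occurs in one file, and moves the file name out of the key so that the output is word -> filename:count. Finally, the reducer concatenates the filename:count entries for each word. There is a small bug here, though. A combiner normally performs a local reduce over the map output on a single node, but I am relying on it to produce the total for a whole file. If a file is large, its content may well be divided among different nodes, and the per-file counts will then be wrong. So this approach is only suitable for small files.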
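    The large-file caveat can be made concrete with a small plain-Java simulation. It is only a sketch under stated assumptions: it imagines that file2.txt has two lines that land in different input splits, and the class SplitCaveatSketch with its methods is hypothetical, not Hadoop API. It compares a reduce step that merely concatenates entries (as in this post) with an alternative that sums the partial counts per file, which is one possible way to stay correct when a file is split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the split problem; no Hadoop classes are used.
final class SplitCaveatSketch {

    /** Map + combine for ONE split: adds word -> "file:count", counted within that split only. */
    static void mapAndCombine(String fileName, String splitText,
                              Map<String, List<String>> shuffled) {
        Map<String, Integer> local = new TreeMap<>();
        for (String word : splitText.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                local.merge(word, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : local.entrySet()) {
            shuffled.computeIfAbsent(e.getKey(), w -> new ArrayList<>())
                    .add(fileName + ":" + e.getValue());
        }
    }

    /** Reduce that only concatenates the entries it receives. */
    static String concatReduce(List<String> postings) {
        StringBuilder sb = new StringBuilder();
        for (String p : postings) {
            sb.append(p).append(';');
        }
        return sb.toString();
    }

    /** Alternative reduce: sums partial counts per file before formatting. */
    static String summingReduce(List<String> postings) {
        Map<String, Integer> perFile = new TreeMap<>();
        for (String p : postings) {
            int sep = p.lastIndexOf(':');
            perFile.merge(p.substring(0, sep),
                          Integer.parseInt(p.substring(sep + 1)), Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : perFile.entrySet()) {
            sb.append(e.getKey()).append(':').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    /** Assumption for illustration: file2.txt is processed as two separate splits. */
    static Map<String, List<String>> shuffleWithSplitFile() {
        Map<String, List<String>> shuffled = new TreeMap<>();
        mapAndCombine("file2.txt", "MapReduce is powerful", shuffled);
        mapAndCombine("file2.txt", "is simple", shuffled);
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> postings = shuffleWithSplitFile().get("is");
        System.out.println(concatReduce(postings));  // file2.txt:1;file2.txt:1;
        System.out.println(summingReduce(postings)); // file2.txt:2;
    }
}
```

    With the concatenating reduce, the word "is" shows up as two partial entries for the same file; summing in the reduce step gives the expected file2.txt:2 no matter how the file was divided.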
    PS: to get the name of the file being processed, use String filename = ((FileSplit) context.getInputSplit()).getPath().getName(); There doesn't seem to be anything else worth noting.
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable ikey, Text ivalue, Context context)
                throws IOException, InterruptedException {
            // Name of the file that the current input split belongs to.
            String filename = ((FileSplit) context.getInputSplit()).getPath().getName();
            StringTokenizer st = new StringTokenizer(ivalue.toString());
            while (st.hasMoreTokens()) {
                // Emit "word:filename" -> "1" for every whitespace-separated token.
                String key = st.nextToken() + ":" + filename;
                context.write(new Text(key), new Text("1"));
            }
        }

    }
     
     
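    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MyCombiner extends Reducer<Text, Text, Text, Text> {

        // Caveat: this relies on the combiner running exactly once per key,
        // which Hadoop does not guarantee.
        @Override
        public void reduce(Text _key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Sum the "1" values emitted for this "word:filename" key.
            int sum = 0;
            for (Text val : values) {
                sum += Integer.parseInt(val.toString());
            }
            // Split on the last ':' so a word that itself contains ':' stays intact.
            String composite = _key.toString();
            int sep = composite.lastIndexOf(':');
            String word = composite.substring(0, sep);
            String filename = composite.substring(sep + 1);
            // Re-key by the word alone; the value becomes "filename:count".
            context.write(new Text(word), new Text(filename + ":" + sum));
        }

    }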
     
     
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MyReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text _key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Concatenate the "filename:count" entries for this word, each followed by ';'.
            StringBuilder filelist = new StringBuilder();
            for (Text val : values) {
                filelist.append(val.toString()).append(";");
            }
            context.write(_key, new Text(filelist.toString()));
        }

    }
  • Original article: https://www.cnblogs.com/sunrye/p/4543365.html