• Python & MapReduce


    Writing a Hadoop MapReduce program in Python

    For the original article, see:

    http://blog.csdn.net/zhaoyl03/article/details/8657031/

    All I did below was run mapper.py and reducer.py on Windows; they were not tested in a Hadoop environment.

    Environment setup:

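    1. Windows 7, 32-bit
    2. Install GnuWin32 so that Linux commands can be run from cmd
    3. Install Python 2.7, which includes IDLE (Python GUI), so that Python scripts can be executed
    4. Add the Python installation directory to the Windows environment variables, so that after switching to the script's directory in a cmd window, you can run a Python script just by typing its name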

    My Python is installed at: C:\Python27\python.exe

    The test scripts are in: E:\PythonTest

    Entry added to the Windows PATH environment variable: C:\Python27
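A quick way to confirm that step 4 took effect is to check, from a newly opened cmd window (environment changes only reach new processes), whether the Python directory is one of the PATH entries. This is a minimal sketch written for Python 3, while the post itself uses Python 2.7; the helper name `dir_on_path` is hypothetical, and the comparison ignores case because Windows paths are case-insensitive.

```python
import ntpath
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is an entry of a Windows-style PATH string.

    Hypothetical helper for illustration only. Entries are split on ';'
    (the Windows separator), surrounding quotes are dropped, and both sides
    are normalized so that case and a trailing backslash do not matter.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = ntpath.normcase(ntpath.normpath(directory))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and ntpath.normcase(ntpath.normpath(entry)) == target:
            return True
    return False


if __name__ == "__main__":
    # Prints True once C:\Python27 has been added to PATH
    print(dir_on_path(r"C:\Python27"))
```

`ntpath` is used instead of `os.path` so the Windows rules apply even if the check is run on another OS.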

    mapper.py:

    #!/usr/bin/env python

    import sys

    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # increase counters
        for word in words:
            # write the results to STDOUT (standard output);
            # what we output here will be the input for the
            # Reduce step, i.e. the input for reducer.py
            #
            # tab-delimited; the trivial word count is 1
            # (print(...) with a single argument works in both Python 2.7 and 3)
            print('%s\t%s' % (word, 1))
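To see exactly what the mapper emits without piping anything through cmd, the body of its loop can be pulled out into a pure function. This is a sketch for Python 3; `map_line` is a hypothetical name and not part of the original scripts.

```python
def map_line(line):
    """Return the tab-delimited records mapper.py would print for one input line."""
    # Same steps as mapper.py: strip, split on whitespace, emit "word<TAB>1" per word
    return ["%s\t%s" % (word, 1) for word in line.strip().split()]


if __name__ == "__main__":
    for record in map_line("foo foo quux labs foo bar quux"):
        print(record)
```

Every occurrence produces its own record with a count of 1, duplicates included; all summing is left to the reducer.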

    reducer.py:

    #!/usr/bin/env python

    import sys

    current_word = None
    current_count = 0

    # input comes from STDIN
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()

        # parse the input we got from mapper.py and convert count to int;
        # silently skip malformed lines (no tab, or a count that is not a number)
        try:
            word, count = line.split('\t', 1)
            count = int(count)
        except ValueError:
            continue

        # this IF-switch only works because Hadoop sorts map output
        # by key (here: word) before it is passed to the reducer
        if current_word == word:
            current_count += count
        else:
            if current_word:
                # write result to STDOUT
                print('%s\t%s' % (current_word, current_count))
            current_count = count
            current_word = word

    # do not forget to output the last word; the check also keeps
    # empty input from printing a bogus "None<TAB>0" line
    if current_word:
        print('%s\t%s' % (current_word, current_count))
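The comment about Hadoop sorting map output is the key assumption in reducer.py: equal words must arrive next to each other. The sketch below (Python 3, hypothetical function names) simulates the whole job in one process, map, then sort, then reduce, which is roughly what piping a text file through mapper.py, a `sort` command and reducer.py does on Windows. The reduce step swaps in `itertools.groupby` in place of the manual IF-switch; like the original, it only works on sorted input.

```python
from itertools import groupby


def mapper(lines):
    """Yield (word, 1) pairs, mirroring mapper.py."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reducer(pairs):
    """Sum counts over runs of equal words; assumes pairs are sorted by word."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Map, sort by key (the step Hadoop performs between map and reduce), reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    text = ["foo foo quux labs foo bar quux"]
    for word, count in word_count(text):
        print("%s\t%s" % (word, count))
    # Skipping the sort splits a word across several output records
    print(list(reducer(mapper(text))))
```

With the sort in place the result is bar 1, foo 3, labs 1, quux 2. Without it, `foo` appears twice (2, then 1) because its two runs are not adjacent, which is exactly the failure the reducer's comment warns about.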

    Output:

  • Original article: https://www.cnblogs.com/kevin-yuan/p/4485143.html