• [Spark][Python] Wordcount example


    [training@localhost ~]$ hdfs dfs -cat cats.txt

    The cat on the mat
    The aardvark sat on the sofa
    [training@localhost ~]$

    # Load the file as an RDD of lines (sc is the SparkContext predefined in the pyspark shell)
    mydata001 = sc.textFile('cats.txt')

    # Split each line on spaces and flatten the results into one RDD of words
    mydata002 = mydata001.flatMap(lambda line: line.split(" "))

    In [12]: mydata002.take(1)
    Out[12]: [u'The']

    In [13]: mydata002.take(2)
    Out[13]: [u'The', u'cat']

    # Pair every word with an initial count of 1
    mydata003 = mydata002.map(lambda word: (word, 1))

    In [10]: mydata003.take(1)
    Out[10]: [(u'The', 1)]

    In [11]: mydata003.take(2)
    Out[11]: [(u'The', 1), (u'cat', 1)]


    # Sum the counts of identical words
    mydata004 = mydata003.reduceByKey(lambda x, y: x + y)

    In [15]: mydata004.take(1)
    Out[15]: [(u'on', 2)]

    In [16]: mydata004.take(2)
    Out[16]: [(u'on', 2), (u'mat', 1)]

    In [17]: mydata004.take(3)
    Out[17]: [(u'on', 2), (u'mat', 1), (u'sofa', 1)]
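To make it easier to see what each stage computes, here is a minimal sketch of the same three steps in plain Python, without Spark. The two input lines are copied from cats.txt above; the variable names (lines, words, pairs, counts) are only illustrative and do not appear in the original session.

```python
from collections import defaultdict
from itertools import chain

# The two lines of cats.txt, hard-coded so the sketch runs without HDFS or Spark
lines = ["The cat on the mat", "The aardvark sat on the sofa"]

# Like flatMap: split every line, then flatten the per-line lists into one list of words
words = list(chain.from_iterable(line.split(" ") for line in lines))

# Like map: pair each word with an initial count of 1
pairs = [(word, 1) for word in words]

# Like reduceByKey with lambda x, y: x + y: add up the values that share a key
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(words[:2])
print(pairs[:2])
print(sorted(counts.items()))
```

Because nothing lowercases the words, 'The' and 'the' are counted as separate keys. In Spark itself the order of elements after reduceByKey depends on partitioning, so take(3) may return different pairs than the ones shown above; sorting first (for example with sortByKey()) should give a stable order.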

  • Original article: https://www.cnblogs.com/gaojian/p/7608625.html