• CS100.1x-lab0_student


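    This is the first assignment submitted in CS100.1x; it exists so that we can test our setup. The related ipynb file is on my GitHub. There is not much to explain here, so I will go through it briefly and cover the details in later posts. The lab is divided into five main parts.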

    Part 1: Test Spark functionality

    Parallelize, filter, and reduce

    # Check that Spark is working
    largeRange = sc.parallelize(xrange(100000))
    reduceTest = largeRange.reduce(lambda a, b: a + b)
    filterReduceTest = largeRange.filter(lambda x: x % 7 == 0).sum()
    
    print reduceTest
    print filterReduceTest
    
    # If the Spark jobs don't work properly these will raise an AssertionError
    assert reduceTest == 4999950000
    assert filterReduceTest == 714264285
    

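    The first three lines of code do the following, in order: turn a Python collection into an RDD, add up all the values in it, and add up only the values that are divisible by 7.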
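
The two expected values in the assertions can be reproduced without Spark. The following is a minimal sketch in plain Python 3 (an assumption; the lab itself runs Python 2 under PySpark), using `functools.reduce` and a generator expression as local stand-ins for the RDD's `reduce` and `filter(...).sum()`.

```python
from functools import reduce

# Local stand-in for sc.parallelize(xrange(100000)); no Spark involved.
large_range = range(100000)

# Same fold as largeRange.reduce(lambda a, b: a + b)
reduce_test = reduce(lambda a, b: a + b, large_range)

# Same filter-then-sum as largeRange.filter(lambda x: x % 7 == 0).sum()
filter_reduce_test = sum(x for x in large_range if x % 7 == 0)

print(reduce_test)         # 4999950000
print(filter_reduce_test)  # 714264285
```

Both numbers also follow from the arithmetic-series formula: 99999 * 100000 / 2 for the full range, and 7 * (14285 * 14286 / 2) for the multiples of 7, since 7 * 14285 = 99995 is the largest multiple below 100000.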

    Loading a text file

    # Check loading data with sc.textFile
    import os.path
    baseDir = os.path.join('data')
    inputPath = os.path.join('cs100', 'lab1', 'shakespeare.txt')
    fileName = os.path.join(baseDir, inputPath)
    
    rawData = sc.textFile(fileName)
    shakespeareCount = rawData.count()
    
    print shakespeareCount
    
    # If the text file didn't load properly an AssertionError will be raised
    assert shakespeareCount == 122395
    

    The first block of this code builds the file path; the second block reads the text file and counts its lines.
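
The same two steps can be tried locally without Spark. This is only a sketch in Python 3: `count_lines` is a hypothetical helper that plays the role of `sc.textFile(...).count()`, and the three-line temporary file is made up so the example runs on any machine.

```python
import os
import tempfile

# Step 1: build the path the same way the lab does.
base_dir = os.path.join('data')
input_path = os.path.join('cs100', 'lab1', 'shakespeare.txt')
file_name = os.path.join(base_dir, input_path)
print(file_name)  # the separator depends on the operating system

def count_lines(path):
    """Count lines in a text file, a local analogue of sc.textFile(path).count()."""
    with open(path) as handle:
        return sum(1 for _ in handle)

# Step 2: read a file and count its lines.
# Hypothetical three-line file, written only so the sketch can run anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
sample_count = count_lines(tmp.name)
os.remove(tmp.name)
print(sample_count)
```

In Spark, `sc.textFile` produces one RDD element per line of the file, which is why `count()` returns the number of lines.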

    Part 2: Check class testing library

    Compare with hash

    # TEST Compare with hash (2a)
    # Check our testing library/package
    # This should print '1 test passed.' on two lines
    from test_helper import Test
    
    twelve = 12
    Test.assertEquals(twelve, 12, 'twelve should equal 12')
    Test.assertEqualsHashed(twelve, '7b52009b64fd0a2a49e6d8a939753077792b0554',
                            'twelve, once hashed, should equal the hashed value of 12')
    

    This tests the hash comparison; there is not much to say about it.
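
The idea behind a hashed comparison is that a grader can check an answer without revealing it. The sketch below assumes that the course's `test_helper` hashes the string form of the value with SHA-1; the post does not show the helper's source, so `sha1_of` and `equals_hashed` are hypothetical stand-ins written for Python 3.

```python
import hashlib

def sha1_of(value):
    """Hash the string form of a value with SHA-1 (assumed helper behaviour)."""
    return hashlib.sha1(str(value).encode('utf-8')).hexdigest()

def equals_hashed(value, expected_digest):
    """Hypothetical stand-in for Test.assertEqualsHashed: compare digests only."""
    return sha1_of(value) == expected_digest

twelve = 12
digest = sha1_of(twelve)

print(digest)                         # 40 hexadecimal characters
print(equals_hashed(twelve, digest))  # True by construction
print(equals_hashed(13, digest))      # a different value gives a different digest
```

If the assumption holds, the digest printed for `12` should be the same 40-character string that appears in the cell above.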

    Compare lists

    # TEST Compare lists (2b)
    # This should print '1 test passed.'
    unsortedList = [(5, 'b'), (5, 'a'), (4, 'c'), (3, 'a')]
    Test.assertEquals(sorted(unsortedList), [(3, 'a'), (4, 'c'), (5, 'a'), (5, 'b')],
                      'unsortedList does not sort properly')
    

    This checks that the list, once sorted, equals the expected list.
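
The expected order follows from how Python compares tuples, element by element. A short sketch in Python 3 (the variable names are mine, not the lab's) shows the default ordering next to a sort on the first element only:

```python
unsorted_list = [(5, 'b'), (5, 'a'), (4, 'c'), (3, 'a')]

# Tuples compare element by element: first the number, then the letter on ties.
by_default = sorted(unsorted_list)
print(by_default)  # [(3, 'a'), (4, 'c'), (5, 'a'), (5, 'b')]

# Sorting on the number alone is stable, so (5, 'b') stays ahead of (5, 'a').
by_number_only = sorted(unsorted_list, key=lambda pair: pair[0])
print(by_number_only)  # [(3, 'a'), (4, 'c'), (5, 'b'), (5, 'a')]
```

Only the default ordering matches the list the test expects, because the tie between the two tuples starting with 5 is broken by the second element.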

    Part 3: Check plotting

    Our first plot

    # Check matplotlib plotting
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    from math import log
    
    # function for generating plot layout
    def preparePlot(xticks, yticks, figsize=(10.5, 6), hideLabels=False, gridColor='#999999', gridWidth=1.0):
        plt.close()
        fig, ax = plt.subplots(figsize=figsize, facecolor='white', edgecolor='white')
        ax.axes.tick_params(labelcolor='#999999', labelsize='10')
        for axis, ticks in [(ax.get_xaxis(), xticks), (ax.get_yaxis(), yticks)]:
            axis.set_ticks_position('none')
            axis.set_ticks(ticks)
            axis.label.set_color('#999999')
            if hideLabels: axis.set_ticklabels([])
        plt.grid(color=gridColor, linewidth=gridWidth, linestyle='-')
        for position in ['bottom', 'top', 'left', 'right']:
            ax.spines[position].set_visible(False)
        return fig, ax
    
    # generate layout and plot data
    x = range(1, 50)
    y = [log(x1 ** 2) for x1 in x]
    fig, ax = preparePlot(range(5, 60, 10), range(0, 12, 1))
    plt.scatter(x, y, s=14**2, c='#d6ebf2', edgecolors='#8cbfd0', alpha=0.75)
    ax.set_xlabel(r'$range(1, 50)$'), ax.set_ylabel(r'$log_e(x^2)$')
    pass
    

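    Anyone familiar with matplotlib will recognize this: the code generates its own data and then plots it.
    Running the code produces a scatter plot of these points.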
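
To see what is being plotted without opening matplotlib, the data can be inspected on its own. This is a small sketch in Python 3 that reuses the lab's `x` and `y` definitions and checks the identity ln(x^2) = 2 ln(x) for positive x:

```python
from math import log

# Same data as the plotting cell: x = 1..49, y = ln(x^2)
x = list(range(1, 50))
y = [log(x1 ** 2) for x1 in x]

# ln(x^2) equals 2 * ln(x) for positive x, so the curve is a scaled logarithm.
assert all(abs(y1 - 2 * log(x1)) < 1e-12 for x1, y1 in zip(x, y))

print(len(x))            # 49 points
print(min(y))            # 0.0, since ln(1) = 0
print(round(max(y), 2))  # 7.78, i.e. 2 * ln(49)
```

The values stay between 0 and about 7.78, so the tick ranges passed to `preparePlot` (5 to 55 on the x axis, 0 to 11 on the y axis) comfortably cover the data.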

  • Original article: https://www.cnblogs.com/-Sai-/p/6659916.html