• Spark(Python) 从内存中建立 RDD 的例子


    Spark(Python) 从内存中建立 RDD 的例子:

    myData = ["Alice","Carlos","Frank","Barbara"]
    myRdd = sc.parallelize(myData)
    myRdd.take(2)

    ----
    In [52]: myData = ["Alice","Carlos","Frank","Barbara"]

    In [53]: myRdd = sc.parallelize(myData)

    In [54]: myRdd.take(2)
    17/09/24 02:40:10 INFO spark.SparkContext: Starting job: runJob at PythonRDD.scala:393
    17/09/24 02:40:10 INFO scheduler.DAGScheduler: Got job 5 (runJob at PythonRDD.scala:393) with 1 output partitions
    17/09/24 02:40:10 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (runJob at PythonRDD.scala:393)
    17/09/24 02:40:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
    17/09/24 02:40:10 INFO scheduler.DAGScheduler: Missing parents: List()
    17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43), which has no missing parents
    17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 3.2 KB, free 1767.1 KB)
    17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 2.2 KB, free 1769.3 KB)
    17/09/24 02:40:10 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on localhost:33950 (size: 2.2 KB, free: 208.7 MB)
    17/09/24 02:40:10 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1006
    17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43)
    17/09/24 02:40:10 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 1 tasks
    17/09/24 02:40:10 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 5, localhost, partition 0,PROCESS_LOCAL, 2028 bytes)
    17/09/24 02:40:10 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 5)
    17/09/24 02:40:11 INFO python.PythonRunner: Times: total = 41, boot = 20, init = 14, finish = 7
    17/09/24 02:40:11 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 5). 979 bytes result sent to driver
    17/09/24 02:40:11 INFO scheduler.DAGScheduler: ResultStage 5 (runJob at PythonRDD.scala:393) finished in 0.423 s
    17/09/24 02:40:11 INFO scheduler.DAGScheduler: Job 5 finished: runJob at PythonRDD.scala:393, took 0.648315 s
    17/09/24 02:40:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 5) in 423 ms on localhost (1/1)
    17/09/24 02:40:11 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
    Out[54]: ['Alice', 'Carlos']

    In [55]:

  • 相关阅读:
    获取Web.config的内容
    VS2013打开2008的项目
    Win7配置IIS7
    JavaScript通知浏览器,更改通知数目
    高分屏显示模糊修复工具
    Linux下使用 xrandr 命令设置屏幕分辨率
    虚拟机VMware怎么完全卸载干净,如何彻底卸载VMware虚拟机
    虚拟机安装VMware Tools
    网站测速、ping
    有名管道的非阻塞设置
  • 原文地址:https://www.cnblogs.com/gaojian/p/7587750.html
Copyright © 2020-2023  润新知