• Understanding a Spark job's execution through its log file


    Spark job log file analysis:

    Below is the log file of a Spark data-statistics job. We walk through it from start to finish to trace the job's execution (Spark local mode).

    Starting the SparkContext

    Submitting the job RatingsHistogram

    20/05/04 18:02:20 INFO SparkContext: Running Spark version 2.4.4
    20/05/04 18:02:20 INFO SparkContext: Submitted application: RatingsHistogram

    Starting the sparkDriver service

    20/05/04 18:02:20 INFO SecurityManager: Changing view acls to: xxx
    20/05/04 18:02:20 INFO SecurityManager: Changing modify acls to: xxx
    20/05/04 18:02:20 INFO SecurityManager: Changing view acls groups to:
    20/05/04 18:02:20 INFO SecurityManager: Changing modify acls groups to:
    20/05/04 18:02:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(xxx); groups with view permissions: Set(); users  with modify permissions: Set(xxx); groups with modify permissions: Set()
    20/05/04 18:02:21 INFO Utils: Successfully started service 'sparkDriver' on port 4911.

    Registering the block storage management components and starting the MemoryStore

    Spark processes data in memory, so it necessarily ships components to manage and use that memory.

    The idiomatic way to access the current MemoryStore is to ask the block manager that manages memory storage via SparkEnv: SparkEnv.get.blockManager.memoryStore

    DefaultTopologyMapper:

    • A topology mapper (TopologyMapper) that assumes all nodes are on the same rack
    • When choosing executors for block replication, topology information can be used to tell whether a candidate executor is on the same rack as the source or a different one.
    20/05/04 18:02:22 INFO SparkEnv: Registering MapOutputTracker
    20/05/04 18:02:22 INFO SparkEnv: Registering BlockManagerMaster
    20/05/04 18:02:22 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    20/05/04 18:02:22 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    20/05/04 18:02:22 INFO DiskBlockManager: Created local directory at C:\Users\xxx\AppData\Local\Temp\blockmgr-168e9bad-f66f-4834-a7c7-6d4dadcc00c4
    20/05/04 18:02:22 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
    20/05/04 18:02:22 INFO SparkEnv: Registering OutputCommitCoordinator
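
    The DefaultTopologyMapper behavior described above can be sketched in a few lines of Python. This is a toy model, not Spark's real Scala API (org.apache.spark.storage.DefaultTopologyMapper); the method and class names here are illustrative:

```python
from typing import Optional

class DefaultTopologyMapper:
    # Toy model of Spark's DefaultTopologyMapper: it assumes every node
    # lives on the same rack, so it offers no rack information at all.
    def get_topology_for_host(self, hostname: str) -> Optional[str]:
        return None  # no topology info: all hosts look interchangeable

class FileBasedTopologyMapper:
    # A mapper that does hold rack data, for contrast: with it, block
    # replication can prefer peers on a *different* rack than the source.
    def __init__(self, rack_by_host):
        self.rack_by_host = rack_by_host

    def get_topology_for_host(self, hostname: str) -> Optional[str]:
        return self.rack_by_host.get(hostname)

default = DefaultTopologyMapper()
racked = FileBasedTopologyMapper({"exec-1": "rack-A", "exec-2": "rack-B"})
print(default.get_topology_for_host("exec-1"))  # None
print(racked.get_topology_for_host("exec-2"))   # rack-B
```

    With the default mapper every candidate executor is indistinguishable, which is why local mode never mentions racks again after this line.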

    Starting the Spark UI web service

    20/05/04 18:02:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    20/05/04 18:02:22 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://localhost:4040
    20/05/04 18:02:22 INFO Executor: Starting executor ID driver on host localhost

    Starting the block transfer service

    Used to transfer data blocks across the network (this is where shuffle comes in).

    It uses Netty to fetch a set of blocks at a time (Netty is an asynchronous event-driven network application framework for rapid development of maintainable, high-performance protocol servers and clients - https://netty.io/).

    20/05/04 18:02:22 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 4920.
    20/05/04 18:02:22 INFO NettyBlockTransferService: Server created on localhost:4920

    Continuing: registering the BlockManager and its memory space

    RandomBlockReplicationPolicy: places replicas of a block on different machines
    Memory space: 366.3 MB RAM
     
    20/05/04 18:02:22 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    20/05/04 18:02:22 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 4920, None)
    20/05/04 18:02:22 INFO BlockManagerMasterEndpoint: Registering block manager localhost:4920 with 366.3 MB RAM, BlockManagerId(driver, localhost, 4920, None)
    20/05/04 18:02:22 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 4920, None)
    20/05/04 18:02:22 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 4920, None)
    20/05/04 18:02:23 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 236.7 KB, free 366.1 MB)
    20/05/04 18:02:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 366.0 MB)
    20/05/04 18:02:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:4920 (size: 22.9 KB, free: 366.3 MB)
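
    The broadcast_0 / broadcast_0_piece0 pair above reflects how TorrentBroadcast works: the value itself is kept deserialized in memory, while its serialized form is split into fixed-size pieces (Spark's default piece size, spark.broadcast.blockSize, is 4 MB) for BitTorrent-style distribution. A minimal sketch of the piece-splitting step, with an illustrative function name:

```python
def split_into_pieces(blob: bytes, piece_size: int = 4 * 1024 * 1024):
    # Chop a serialized broadcast value into fixed-size pieces, the way
    # TorrentBroadcast stores broadcast_0_piece0, broadcast_0_piece1, ...
    return [blob[i:i + piece_size] for i in range(0, len(blob), piece_size)]

serialized = b"\x00" * (9 * 1024 * 1024)   # pretend 9 MB of serialized data
pieces = split_into_pieces(serialized)
print([len(p) for p in pieces])            # [4194304, 4194304, 1048576]
```

    In this job the broadcast is only ~22.9 KB, so a single piece suffices - hence exactly one broadcast_0_piece0 in the log.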

    Identifying the input file

    But no splits are processed yet

    20/05/04 18:02:23 INFO SparkContext: Created broadcast 0 from textFile at <unknown>:0
    20/05/04 18:02:23 INFO FileInputFormat: Total input paths to process : 1
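
    The reason the file is only identified here, and not yet split or read, is that transformations are lazy: nothing touches the data until an action runs. A pure-Python analogy using generators (not Spark's API - the function name mimics sc.textFile for illustration):

```python
import os
import tempfile
from collections import Counter

def text_file(path):
    # Lazy line reader, like sc.textFile: *building* it performs no I/O.
    def gen():
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n")
    return gen

# a tiny stand-in for ml-100k/u.data (userID, movieID, rating, timestamp)
path = os.path.join(tempfile.mkdtemp(), "u.data")
with open(path, "w") as f:
    f.write("196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116\n")

lines = text_file(path)                        # "transformation": nothing read yet
ratings = (l.split("\t")[2] for l in lines())  # still lazy, still no I/O

result = Counter(ratings)                      # the "action" triggers the actual read
print(result)                                  # Counter({'3': 2, '1': 1})
```

    Spark behaves the same way: FileInputFormat enumerates the input paths at graph-construction time, but the actual input split only appears later in the log, inside the running task.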

    An action triggers the DAGScheduler to plan the job

    20/05/04 18:02:24 INFO SparkContext: Starting job: countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17
    20/05/04 18:02:24 INFO DAGScheduler: Got job 0 (countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17) with 1 output partitions
    20/05/04 18:02:24 INFO DAGScheduler: Final stage: ResultStage 0 (countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17)
    20/05/04 18:02:24 INFO DAGScheduler: Parents of final stage: List()
    20/05/04 18:02:24 INFO DAGScheduler: Missing parents: List()
    20/05/04 18:02:24 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17), which has no missing parents
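
    "Parents of final stage: List()" appears because this lineage (textFile -> map -> countByValue) contains only narrow dependencies, so the whole job fits in a single ResultStage; a wide (shuffle) dependency would have produced a parent ShuffleMapStage first. A toy sketch of that stage-splitting rule (not Spark's actual code; the boolean marks a wide dependency):

```python
def split_into_stages(lineage):
    # Split a linear lineage into stages: each wide (shuffle)
    # dependency closes the current stage and opens a new one.
    stages, current = [], []
    for op, is_wide in lineage:
        if is_wide and current:
            stages.append(current)
            current = []
        current.append(op)
    stages.append(current)
    return stages

# this job: all narrow -> one ResultStage, no parent stages
print(split_into_stages([("textFile", False), ("map", False)]))
# -> [['textFile', 'map']]

# a job with a shuffle would get a parent ShuffleMapStage:
print(split_into_stages([("textFile", False), ("map", False),
                         ("reduceByKey", True), ("map", False)]))
# -> [['textFile', 'map'], ['reduceByKey', 'map']]
```

    "Missing parents: List()" then means there is nothing to compute first, so ResultStage 0 is submitted immediately.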

    The TaskScheduler executes the task

    20/05/04 18:02:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
    20/05/04 18:02:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7917 bytes)
    20/05/04 18:02:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)

    After execution, the task returns its result to the driver

    20/05/04 18:02:24 INFO HadoopRDD: Input split: file:/Learning/lessons/spark_code_resource/ml-100k/u.data:0+1979173
    20/05/04 18:02:25 INFO PythonRunner: Times: total = 906, boot = 515, init = 32, finish = 359
    20/05/04 18:02:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1651 bytes result sent to driver
    20/05/04 18:02:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1094 ms on localhost (executor driver) (1/1)
    20/05/04 18:02:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool

    The job finishes

    After the task finishes, control returns to the DAGScheduler

    Output of the results

    20/05/04 18:02:25 INFO PythonAccumulatorV2: Connected to AccumulatorServer at host: 127.0.0.1 port: 4922
    20/05/04 18:02:25 INFO DAGScheduler: ResultStage 0 (countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17) finished in 1.204 s
    20/05/04 18:02:25 INFO DAGScheduler: Job 0 finished: countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17, took 1.308090 s
     score 1. movie num: 6110
     score 2. movie num: 11370
     score 3. movie num: 27145
     score 4. movie num: 34174
     score 5. movie num: 21201
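
    The five lines above are driver-side output: countByValue() hands the driver a plain dict of rating -> count, which the script then sorts and prints. The equivalent driver-side logic in pure Python (the counts are taken from the log output above; the exact print format in Ratings-counter.py is an assumption):

```python
from collections import Counter, OrderedDict

# what countByValue() returned to the driver, per the log output
result = Counter({"1": 6110, "2": 11370, "3": 27145, "4": 34174, "5": 21201})

# sort by rating key and print, matching the "score N. movie num: ..." lines
sorted_results = OrderedDict(sorted(result.items()))
for key, value in sorted_results.items():
    print("score %s. movie num: %s" % (key, value))
```

    Note that the five counts sum to exactly 100,000 - the full ml-100k ratings file - confirming the single task processed the whole input split.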

    Shutting down services and releasing memory

    20/05/04 18:02:25 INFO SparkContext: Invoking stop() from shutdown hook
    20/05/04 18:02:25 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
    20/05/04 18:02:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    20/05/04 18:02:25 INFO MemoryStore: MemoryStore cleared
    20/05/04 18:02:25 INFO BlockManager: BlockManager stopped
    20/05/04 18:02:25 INFO BlockManagerMaster: BlockManagerMaster stopped
    20/05/04 18:02:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    20/05/04 18:02:25 INFO SparkContext: Successfully stopped SparkContext
    20/05/04 18:02:25 INFO ShutdownHookManager: Shutdown hook called
    20/05/04 18:02:25 INFO ShutdownHookManager: Deleting directory C:\Users\xxx\AppData\Local\Temp\spark-1345353d-b57d-4bd3-905b-79d0b4bf8f8e\pyspark-ad7b222c-410a-40f3-a2f6-f1153b55041f
    20/05/04 18:02:25 INFO ShutdownHookManager: Deleting directory C:\Users\xxx\AppData\Local\Temp\spark-1345353d-b57d-4bd3-905b-79d0b4bf8f8e
    20/05/04 18:02:25 INFO ShutdownHookManager: Deleting directory C:\Users\xxx\AppData\Local\Temp\spark-999989b5-d5d2-4929-82a6-474b464e8b4e
  • Original article: https://www.cnblogs.com/yjyyjy/p/12829099.html