Analyzing a Spark job log file:
Below is the log file of a Spark statistics job. We walk through the job's execution from start to finish (Spark local mode).
Starting the SparkContext
Submitting the job RatingsHistogram
20/05/04 18:02:20 INFO SparkContext: Running Spark version 2.4.4
20/05/04 18:02:20 INFO SparkContext: Submitted application: RatingsHistogram
Starting the Spark driver service
20/05/04 18:02:20 INFO SecurityManager: Changing view acls to: xxx
20/05/04 18:02:20 INFO SecurityManager: Changing modify acls to: xxx
20/05/04 18:02:20 INFO SecurityManager: Changing view acls groups to:
20/05/04 18:02:20 INFO SecurityManager: Changing modify acls groups to:
20/05/04 18:02:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); groups with view permissions: Set(); users with modify permissions: Set(xxx); groups with modify permissions: Set()
20/05/04 18:02:21 INFO Utils: Successfully started service 'sparkDriver' on port 4911.
Registering the block storage management components and starting the MemoryStore
Spark processes data in memory, so it necessarily has components that manage and use that memory.
The idiomatic way to access the current MemoryStore is to ask SparkEnv for the block manager that owns in-memory storage: SparkEnv.get.blockManager.memoryStore
DefaultTopologyMapper:
- A TopologyMapper that assumes all nodes are on the same rack.
- When choosing executors for block replication, topology information can be used to tell whether a candidate executor sits on the same rack as the source executor or on a different one.
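As a rough illustration (a pure-Python toy model, not Spark's actual Scala implementation), a topology mapper that assumes a single rack reports the same location for every host, so rack-aware replica placement has nothing to distinguish between candidates:

```python
# Toy model of DefaultTopologyMapper's behavior: every host maps to the
# same (empty) topology, so no two executors ever look "rack-remote".
class FlatTopologyMapper:
    def get_topology(self, hostname: str):
        # Spark's DefaultTopologyMapper returns no topology information;
        # we model that here as None for every host.
        return None

mapper = FlatTopologyMapper()
hosts = ["exec-1.rackA", "exec-2.rackB", "exec-3.rackC"]  # hypothetical hosts
topologies = {h: mapper.get_topology(h) for h in hosts}

# All hosts are indistinguishable by rack, so a replication policy that
# prefers "other rack" peers degenerates to picking peers at random.
assert len(set(topologies.values())) == 1
```

This is why, under this mapper, the RandomBlockReplicationPolicy seen later in the log is a natural fit.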
20/05/04 18:02:22 INFO SparkEnv: Registering MapOutputTracker
20/05/04 18:02:22 INFO SparkEnv: Registering BlockManagerMaster
20/05/04 18:02:22 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/05/04 18:02:22 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/05/04 18:02:22 INFO DiskBlockManager: Created local directory at C:\Users\xxx\AppData\Local\Temp\blockmgr-168e9bad-f66f-4834-a7c7-6d4dadcc00c4
20/05/04 18:02:22 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/05/04 18:02:22 INFO SparkEnv: Registering OutputCommitCoordinator
Starting the Spark UI monitoring web service
20/05/04 18:02:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/05/04 18:02:22 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://localhost:4040
20/05/04 18:02:22 INFO Executor: Starting executor ID driver on host localhost
Starting the block transfer service
Used to transfer data blocks across the network (this is where shuffle comes in).
It fetches a set of blocks at a time using Netty (Netty is an asynchronous event-driven network application framework for rapid development of maintainable, high-performance protocol servers and clients - https://netty.io/).
20/05/04 18:02:22 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 4920.
20/05/04 18:02:22 INFO NettyBlockTransferService: Server created on localhost:4920
Continuing to register the BlockManager and its memory space
RandomBlockReplicationPolicy: places block replicas on randomly chosen peers (i.e., on different machines)
Memory space: 366.3 MB RAM
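Where does 366.3 MB come from? In Spark 2.x's UnifiedMemoryManager, the unified (storage + execution) pool is (usable heap − 300 MB reserved) × spark.memory.fraction (default 0.6). Assuming the default 1 GB driver heap, for which the JVM typically reports about 910.5 MB of usable heap, a back-of-the-envelope check reproduces the log's figure:

```python
# Back-of-the-envelope reconstruction of the "366.3 MB" MemoryStore capacity.
# Assumptions: default 1 GB driver heap (JVM reports roughly 910.5 MB usable),
# spark.memory.fraction = 0.6, and a fixed 300 MB system reservation.
usable_heap_mb = 910.5   # approximate JVM-reported max heap for -Xmx1g
reserved_mb = 300.0      # reserved for Spark internals
memory_fraction = 0.6    # spark.memory.fraction default

unified_pool_mb = (usable_heap_mb - reserved_mb) * memory_fraction
print(f"{unified_pool_mb:.1f} MB")  # 366.3 MB, matching the log line
```

Raising spark.driver.memory would scale this capacity accordingly.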
20/05/04 18:02:22 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/05/04 18:02:22 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 4920, None)
20/05/04 18:02:22 INFO BlockManagerMasterEndpoint: Registering block manager localhost:4920 with 366.3 MB RAM, BlockManagerId(driver, localhost, 4920, None)
20/05/04 18:02:22 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 4920, None)
20/05/04 18:02:22 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 4920, None)
20/05/04 18:02:23 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 236.7 KB, free 366.1 MB)
20/05/04 18:02:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 366.0 MB)
20/05/04 18:02:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:4920 (size: 22.9 KB, free: 366.3 MB)
Identifying the input file
At this point the input has been identified but not yet split.
20/05/04 18:02:23 INFO SparkContext: Created broadcast 0 from textFile at <unknown>:0
20/05/04 18:02:23 INFO FileInputFormat: Total input paths to process : 1
The action triggers the DAGScheduler to plan the job
20/05/04 18:02:24 INFO SparkContext: Starting job: countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17
20/05/04 18:02:24 INFO DAGScheduler: Got job 0 (countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17) with 1 output partitions
20/05/04 18:02:24 INFO DAGScheduler: Final stage: ResultStage 0 (countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17)
20/05/04 18:02:24 INFO DAGScheduler: Parents of final stage: List()
20/05/04 18:02:24 INFO DAGScheduler: Missing parents: List()
20/05/04 18:02:24 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17), which has no missing parents
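countByValue needs no shuffle (values are counted per partition and the partial counts are merged on the driver), which is why the log shows a single ResultStage with no parent stages. A pure-Python sketch of the same semantics, using hypothetical sample records in ml-100k's tab-separated u.data format (user, movie, rating, timestamp):

```python
from collections import Counter

# Hypothetical sample lines in ml-100k's u.data format:
# userID \t movieID \t rating \t timestamp
lines = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
]

# Roughly equivalent to: lines.map(lambda x: x.split()[2]).countByValue()
ratings = (line.split("\t")[2] for line in lines)
histogram = Counter(ratings)  # merged driver-side; no shuffle stage needed

print(dict(histogram))  # {'3': 2, '1': 1}
```

In real Spark, the map over each partition and the counting run inside the PythonRDD task; only the small per-partition Counters travel back to the driver.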
The TaskScheduler executes the task
20/05/04 18:02:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/05/04 18:02:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7917 bytes)
20/05/04 18:02:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
After execution, the task returns its result to the driver
20/05/04 18:02:24 INFO HadoopRDD: Input split: file:/Learning/lessons/spark_code_resource/ml-100k/u.data:0+1979173
20/05/04 18:02:25 INFO PythonRunner: Times: total = 906, boot = 515, init = 32, finish = 359
20/05/04 18:02:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1651 bytes result sent to driver
20/05/04 18:02:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1094 ms on localhost (executor driver) (1/1)
20/05/04 18:02:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
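The PythonRunner timing line breaks the task's Python time into boot (starting the Python worker process), init, and finish (actually processing the data); the three parts add up to the reported total, which the log confirms:

```python
# The PythonRunner log reports: total = 906, boot = 515, init = 32, finish = 359.
# boot dominates here: most of the task's wall time went to launching the
# Python worker, not to processing the 100k ratings.
boot, init, finish = 515, 32, 359
total = boot + init + finish
print(total)  # 906, matching "total = 906" in the log
```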
Job completion
When the task finishes, control returns to the DAGScheduler
The results are printed
20/05/04 18:02:25 INFO PythonAccumulatorV2: Connected to AccumulatorServer at host: 127.0.0.1 port: 4922
20/05/04 18:02:25 INFO DAGScheduler: ResultStage 0 (countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17) finished in 1.204 s
20/05/04 18:02:25 INFO DAGScheduler: Job 0 finished: countByValue at C:/Learning/lessons/spark_code_resource/Ratings-counter.py:17, took 1.308090 s
score 1. movie num: 6110
score 2. movie num: 11370
score 3. movie num: 27145
score 4. movie num: 34174
score 5. movie num: 21201
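A quick sanity check on the printed histogram: the ml-100k dataset contains exactly 100,000 ratings, and the five per-score counts (the number of ratings at each score) do sum to that:

```python
# Per-score rating counts from the job's printed output.
counts = {1: 6110, 2: 11370, 3: 27145, 4: 34174, 5: 21201}
total_ratings = sum(counts.values())
print(total_ratings)  # 100000 -- the full ml-100k dataset
```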
Shutting down services and releasing memory
20/05/04 18:02:25 INFO SparkContext: Invoking stop() from shutdown hook
20/05/04 18:02:25 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
20/05/04 18:02:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/05/04 18:02:25 INFO MemoryStore: MemoryStore cleared
20/05/04 18:02:25 INFO BlockManager: BlockManager stopped
20/05/04 18:02:25 INFO BlockManagerMaster: BlockManagerMaster stopped
20/05/04 18:02:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/05/04 18:02:25 INFO SparkContext: Successfully stopped SparkContext
20/05/04 18:02:25 INFO ShutdownHookManager: Shutdown hook called
20/05/04 18:02:25 INFO ShutdownHookManager: Deleting directory C:\Users\xxx\AppData\Local\Temp\spark-1345353d-b57d-4bd3-905b-79d0b4bf8f8e\pyspark-ad7b222c-410a-40f3-a2f6-f1153b55041f
20/05/04 18:02:25 INFO ShutdownHookManager: Deleting directory C:\Users\xxx\AppData\Local\Temp\spark-1345353d-b57d-4bd3-905b-79d0b4bf8f8e
20/05/04 18:02:25 INFO ShutdownHookManager: Deleting directory C:\Users\xxx\AppData\Local\Temp\spark-999989b5-d5d2-4929-82a6-474b464e8b4e