Spark Streaming, Part 3: Dynamic Generation and Scheduling of RDDs and Jobs


    The previous post covered how the static DAG template is generated. At every batch interval, Spark Streaming dynamically generates the RDDs, and the jobs built from those RDDs, according to the logical and physical dependency chains (dependencies) recorded in that DAG, then packages them into a JobSet and submits it to the cluster for execution. Below we walk through these three steps: generating the RDDs, generating the jobs, and submitting them for execution.
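
    Before diving into the internals, here is a minimal application, a sketch assuming a Spark Streaming 2.x dependency (the app name, host and port are placeholders), that sets up exactly the situation analyzed below: one input stream and one output operation, hence one job per batch.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object MinimalStreamingApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("minimal-streaming").setMaster("local[2]")
        // The batch duration given here drives the per-batch job generation
        // described below: every 10 seconds a new set of jobs is produced.
        val ssc = new StreamingContext(conf, Seconds(10))

        // One input DStream plus one output operation: each batch yields one job.
        val lines = ssc.socketTextStream("localhost", 9999) // placeholder host/port
        lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

        ssc.start()            // JobScheduler, ReceiverTracker and JobGenerator start here
        ssc.awaitTermination()
      }
    }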

    Let's start with the JobScheduler. For this section, there are two important members of JobScheduler to understand: jobExecutor and JobHandler. jobExecutor is a thread pool named streaming-job-executor, and JobHandler is a class that implements Runnable and wraps a single job. Jobs submitted to the scheduler are executed there.

    private val jobExecutor =
        ThreadUtils.newDaemonFixedThreadPool(numConcurrentJobs, "streaming-job-executor")
     private class JobHandler(job: Job) extends Runnable with Logging {
        import JobScheduler._
    
        def run() {
          val oldProps = ssc.sparkContext.getLocalProperties
          try {
            ssc.sparkContext.setLocalProperties(SerializationUtils.clone(ssc.savedProperties.get()))
            val formattedTime = UIUtils.formatBatchTime(
              job.time.milliseconds, ssc.graph.batchDuration.milliseconds, showYYYYMMSS = false)
            val batchUrl = s"/streaming/batch/?id=${job.time.milliseconds}"
            val batchLinkText = s"[output operation ${job.outputOpId}, batch time ${formattedTime}]"
    
            ssc.sc.setJobDescription(
              s"""Streaming job from <a href="$batchUrl">$batchLinkText</a>""")
            ssc.sc.setLocalProperty(BATCH_TIME_PROPERTY_KEY, job.time.milliseconds.toString)
            ssc.sc.setLocalProperty(OUTPUT_OP_ID_PROPERTY_KEY, job.outputOpId.toString)
            // Checkpoint all RDDs marked for checkpointing to ensure their lineages are
            // truncated periodically. Otherwise, we may run into stack overflows (SPARK-6847).
            ssc.sparkContext.setLocalProperty(RDD.CHECKPOINT_ALL_MARKED_ANCESTORS, "true")
    
            // We need to assign `eventLoop` to a temp variable. Otherwise, because
            // `JobScheduler.stop(false)` may set `eventLoop` to null when this method is running, then
            // it's possible that when `post` is called, `eventLoop` happens to null.
            var _eventLoop = eventLoop
            if (_eventLoop != null) {
              _eventLoop.post(JobStarted(job, clock.getTimeMillis()))
              // Disable checks for existing output directories in jobs launched by the streaming
              // scheduler, since we may need to write output to an existing directory during checkpoint
              // recovery; see SPARK-4835 for more details.
              PairRDDFunctions.disableOutputSpecValidation.withValue(true) {
                job.run()
              }
              _eventLoop = eventLoop
              if (_eventLoop != null) {
                _eventLoop.post(JobCompleted(job, clock.getTimeMillis()))
              }
            } else {
              // JobScheduler has been stopped.
            }
          } finally {
            ssc.sparkContext.setLocalProperties(oldProps)
          }
        }
      }
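
    One detail worth noting: the pool size numConcurrentJobs is read from the configuration key spark.streaming.concurrentJobs and defaults to 1, so although jobs are submitted asynchronously, only one batch's jobs run at a time unless that key is raised. For illustration, a minimal way to change it (raising it lets batches overlap and possibly finish out of order, so use with care):

    import org.apache.spark.SparkConf

    // Let the streaming-job-executor pool run the jobs of two batches concurrently.
    val conf = new SparkConf()
      .setAppName("concurrent-batches")
      .set("spark.streaming.concurrentJobs", "2")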

    Two other objects to know about are the jobGenerator and the receiverTracker. The jobGenerator is responsible for dynamically generating jobs; the receiverTracker is responsible for launching the receivers and tracking the data they receive, i.e. the blocks that each batch will later process. Both are started when the JobScheduler starts.

     def start(): Unit = synchronized {
        if (eventLoop != null) return // scheduler has already been started
    
        logDebug("Starting JobScheduler")
        eventLoop = new EventLoop[JobSchedulerEvent]("JobScheduler") {
          override protected def onReceive(event: JobSchedulerEvent): Unit = processEvent(event)
    
          override protected def onError(e: Throwable): Unit = reportError("Error in job scheduler", e)
        }
        eventLoop.start()
    
        // attach rate controllers of input streams to receive batch completion updates
        for {
          inputDStream <- ssc.graph.getInputStreams
          rateController <- inputDStream.rateController
        } ssc.addStreamingListener(rateController)
    
        listenerBus.start()
        receiverTracker = new ReceiverTracker(ssc)
        inputInfoTracker = new InputInfoTracker(ssc)
    
        val executorAllocClient: ExecutorAllocationClient = ssc.sparkContext.schedulerBackend match {
          case b: ExecutorAllocationClient => b.asInstanceOf[ExecutorAllocationClient]
          case _ => null
        }
    
        executorAllocationManager = ExecutorAllocationManager.createIfEnabled(
          executorAllocClient,
          receiverTracker,
          ssc.conf,
          ssc.graph.batchDuration.milliseconds,
          clock)
        executorAllocationManager.foreach(ssc.addStreamingListener)
        receiverTracker.start()  // launch receivers and start tracking received data
        jobGenerator.start()     // start the timer-driven per-batch job generation
        executorAllocationManager.foreach(_.start())
        logInfo("Started JobScheduler")
      }

    JobGenerator startup

    We will set the receiverTracker aside for now and talk about the jobGenerator first. Let's look at its start function. It first starts an event loop thread that waits for events, and then, depending on whether the previous Spark Streaming application left a checkpoint, decides whether to call restart() or startFirstTime().

    /** Start generation of jobs */
      def start(): Unit = synchronized {
        ......
        eventLoop.start()
    
        if (ssc.isCheckpointPresent) {
          restart()
        } else {
          startFirstTime()
        }
      }

    Since our goal is only to understand how the flow works, we will look only at the first-time startup case.

     /** Starts the generator for the first time */
      private def startFirstTime() {
        val startTime = new Time(timer.getStartTime())
        graph.start(startTime - graph.batchDuration)
        timer.start(startTime.milliseconds)
        logInfo("Started JobGenerator at " + startTime)
      }

    First, graph.start(startTime - graph.batchDuration) passes a time to the DStreamGraph, telling it when batch processing begins (one batch duration before the first batch time, i.e. the zero time) and initializing the related parameters. Then the timer is started.
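
    To make the time handling concrete, here is a small worked example. It assumes, as RecurringTimer.getStartTime does in this era of the code, that the first tick is aligned to the next multiple of the batch duration; the timestamps themselves are made up.

    // Hypothetical numbers, for illustration only.
    val batchMs = 10000L                            // batchDuration = 10 s
    val nowMs   = 1500000003000L                    // "current" time when start() runs
    val startMs = (nowMs / batchMs + 1) * batchMs   // first timer tick: next 10 s boundary
    val zeroMs  = startMs - batchMs                 // the zero time handed to graph.start()
    // The first GenerateJobs event fires at startMs and covers the data received
    // in the interval (zeroMs, startMs].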

    RecurringTimer

    Its code is as follows:

    private val timer = new RecurringTimer(clock, ssc.graph.batchDuration.milliseconds,
        longTime => eventLoop.post(GenerateJobs(new Time(longTime))), "JobGenerator")

    In essence, once per batch duration it posts a GenerateJobs event to the eventLoop thread; in other words, it periodically triggers the generation of the current batch's DAG instance from the DAG template. Note that GenerateJobs here is the JobGenerator's own event: its event loop handles it by calling the generateJobs method shown next.
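
    The mechanism itself is easy to picture. The following self-contained toy, which is not the real RecurringTimer (the real one uses a clock with precise waits and supports stopping and restarting), shows the idea: a scheduled thread fires once per period and hands a timestamp to a callback, just as the real timer posts GenerateJobs(new Time(longTime)) to the eventLoop.

    import java.util.concurrent.{Executors, TimeUnit}

    // Toy stand-in for RecurringTimer: invoke `onTick` every `periodMs` milliseconds.
    object RecurringTimerSketch {
      def start(periodMs: Long)(onTick: Long => Unit): Unit = {
        val pool = Executors.newSingleThreadScheduledExecutor()
        pool.scheduleAtFixedRate(
          new Runnable { def run(): Unit = onTick(System.currentTimeMillis()) },
          periodMs, periodMs, TimeUnit.MILLISECONDS)
      }
    }

    // Usage: every 10 seconds, pretend to post a GenerateJobs event.
    // RecurringTimerSketch.start(10000L) { t => println(s"post GenerateJobs($t)") }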

    jobGenerator's generateJobs

    /** Generate jobs and perform checkpointing for the given `time`.  */
    private def generateJobs(time: Time) {
      SparkEnv.set(ssc.env)
      Try {
        jobScheduler.receiverTracker.allocateBlocksToBatch(time)                 
        graph.generateJobs(time)                                                
      } match {
        case Success(jobs) =>
          val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time) 
          jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))    
        case Failure(e) =>
          jobScheduler.reportError("Error generating jobs for time " + time, e)
      }
      eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))       
    }

    generateJobs does five things:

    1. First, it tells the jobScheduler's receiverTracker to allocate the data received so far (the blocks) to the current batch.

    2. Second, it calls DStreamGraph's generateJobs to actually build the jobs for this batch.

    3. Third, it retrieves the metadata for this batch's input from the inputInfoTracker. Metadata here means information describing the data as it arrived, before any transformation was applied.

    4. Fourth, it submits the generated jobs (each wrapping the newly created RDD instances) together with that metadata to the jobScheduler as a JobSet; the jobs are executed asynchronously on the jobExecutor thread pool (see the sketch after this list).

    5. Finally, it posts a DoCheckpoint task to the eventLoop to be performed asynchronously.
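
    Step four is where the jobExecutor and JobHandler from the beginning of this post come back into play. Roughly, as an abbreviated paraphrase of JobScheduler.submitJobSet in this era of the code (not the verbatim source), the submission looks like this:

    def submitJobSet(jobSet: JobSet): Unit = {
      if (jobSet.jobs.isEmpty) {
        logInfo("No jobs added for time " + jobSet.time)
      } else {
        listenerBus.post(StreamingListenerBatchSubmitted(jobSet.toBatchInfo))
        jobSets.put(jobSet.time, jobSet)
        // Each Job is wrapped in the JobHandler shown earlier and handed to the
        // streaming-job-executor pool, so the batch runs asynchronously from here on.
        jobSet.jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
        logInfo("Added jobs for time " + jobSet.time)
      }
    }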

    InputInfoTracker

    One might notice that the first and third steps above sound similar. In fact they are different.

    Tracing jobScheduler.receiverTracker.allocateBlocksToBatch(time), the data structure it ultimately allocates to the batch is:

    private[streaming] case class ReceivedBlockInfo(
        streamId: Int,
        numRecords: Option[Long],
        metadataOption: Option[Any],
        blockStoreResult: ReceivedBlockStoreResult
      )

    while jobScheduler.inputInfoTracker.getInfo(time) ultimately returns this data structure:

    case class StreamInputInfo(
        inputStreamId: Int, numRecords: Long, metadata: Map[String, Any] = Map.empty) {
      require(numRecords >= 0, "numRecords must not be negative")
    
      def metadataDescription: Option[String] =
        metadata.get(StreamInputInfo.METADATA_KEY_DESCRIPTION).map(_.toString)
    }

    Comparing the two, both carry a stream id (streamId / inputStreamId), a record count (numRecords), and metadata; the difference is blockStoreResult, which only ReceivedBlockInfo has. Moreover, the map that stores the metadata retrieved in step three carries the comment "Map to track all the InputInfo related to specific batch time and input stream", i.e. it is a tracker of the input information and record counts for each batch time and input stream.

    // Map to track all the InputInfo related to specific batch time and input stream.
    private val batchTimeToInputInfos =
      new mutable.HashMap[Time, mutable.HashMap[Int, StreamInputInfo]]

    The class comment on InputInfoTracker reads: "This class manages all the input streams as well as their input data statistics. The information will be exposed through StreamingListener for monitoring." In addition, StreamInputInfo is annotated with @DeveloperApi.

    My understanding is that step one obtains the data itself, by allocating the received blocks to the batch, while step three attaches descriptive attributes to that data so it can be queried, displayed, and monitored by developers.
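
    To see that exposure in practice, a listener along the following lines (a sketch assuming a Spark Streaming 2.x dependency; the class name is made up) receives the per-batch StreamInputInfo records through the batch info:

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    // Logs, for every completed batch, what each input stream contributed.
    // batchInfo.streamIdToInputInfo is the map maintained by InputInfoTracker.
    class InputInfoLogger extends StreamingListener {
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
        batch.batchInfo.streamIdToInputInfo.foreach { case (streamId, info) =>
          println(s"batch ${batch.batchInfo.batchTime}: stream $streamId " +
            s"received ${info.numRecords} records " +
            s"(${info.metadataDescription.getOrElse("no description")})")
        }
      }
    }

    // Registered with: ssc.addStreamingListener(new InputInfoLogger)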

    So far we have walked through the dynamic scheduling flow end to end: the timer posts GenerateJobs, generateJobs builds a JobSet from the allocated blocks and the DAG template, and submitJobSet hands each job to the jobExecutor pool.

    DAG Instance Generation

    The above only covers the dynamic scheduling side. How exactly is the DAG instance itself generated?
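
    The short answer, paraphrased here in abbreviated form from the DStreamGraph and DStream code of this Spark era (not the verbatim source), is a chain of three calls:

    // DStreamGraph: ask every output stream to produce a job for this batch time.
    def generateJobs(time: Time): Seq[Job] = this.synchronized {
      outputStreams.flatMap(outputStream => outputStream.generateJob(time))
    }

    // DStream: materialize this batch's RDD and wrap it in a Job. Output DStreams
    // such as ForEachDStream override this to run the user's output function instead.
    private[streaming] def generateJob(time: Time): Option[Job] = {
      getOrCompute(time) match {
        case Some(rdd) =>
          val jobFunc = () => context.sparkContext.runJob(rdd, (iterator: Iterator[T]) => {})
          Some(new Job(time, jobFunc))
        case None => None
      }
    }

    getOrCompute(time) first checks the DStream's generatedRDDs map for an RDD already created for this time; if there is none, it calls compute(time), which in turn calls getOrCompute on the parent DStreams recorded in dependencies. Walking the static template this way produces a concrete RDD lineage for the batch, and that lineage is the DAG instance.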

    Summary

    Up to this point, we have obtained the DAG instance from the DAG template, together with the data for the batch. Next, the DAGScheduler will divide each job into stages and tasks.
