• Spark Master source code: the schedule method


    Full source of the schedule method:

     /**
       * Schedule the currently available resources among waiting apps. This method will be called
       * every time a new app joins or resource availability changes.
       */
      private def schedule(): Unit = {
        // First check that the master is ALIVE; if not, return immediately
        if (state != RecoveryState.ALIVE) {
          return
        }
        // Drivers take strict precedence over executors
        // Random.shuffle randomly reorders the elements of the given collection
        val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
        val numWorkersAlive = shuffledAliveWorkers.size
        var curPos = 0
        // Schedule drivers first. A driver is only registered with the master when the app is
        // submitted in cluster deploy mode; in client mode the driver starts locally and is
        // never registered with (or scheduled by) the master.
        // Registered drivers wait in the waitingDrivers queue.
        for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
          // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
          // start from the last worker that was assigned a driver, and continue onwards until we
          // have explored all alive workers.
          var launched = false
          var isClusterIdle = true
          var numWorkersVisited = 0
          // Keep looking while the driver has not launched and alive workers remain unvisited
          while (numWorkersVisited < numWorkersAlive && !launched) {
            val worker = shuffledAliveWorkers(curPos)
            isClusterIdle = worker.drivers.isEmpty && worker.executors.isEmpty
            numWorkersVisited += 1
            // Check whether this worker has the resources (memory, CPU, etc.) the driver needs
            if (canLaunchDriver(worker, driver.desc)) {
              val allocated = worker.acquireResources(driver.desc.resourceReqs)
              driver.withResources(allocated)
              // Launch the driver on this worker
              launchDriver(worker, driver)
              // Remove the launched driver from the waiting queue
              waitingDrivers -= driver
              launched = true
            }
            curPos = (curPos + 1) % numWorkersAlive
          }
          if (!launched && isClusterIdle) {
            logWarning(s"Driver ${driver.id} requires more resource than any of Workers could have.")
          }
        }
        startExecutorsOnWorkers()
      }
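The round-robin driver placement above can be sketched in isolation. Below is a minimal, self-contained Scala sketch using simplified `Worker`/`Driver` case classes and an `assignDrivers` helper; these names are illustrative assumptions, not Spark's real `WorkerInfo`/`DriverInfo` API:

```scala
// Simplified stand-ins for Spark's WorkerInfo and DriverInfo (illustrative only)
case class Worker(id: String, var freeCores: Int, var freeMemMb: Int)
case class Driver(id: String, cores: Int, memMb: Int)

// Assign each waiting driver to the next worker (round-robin) that has enough
// free cores and memory, mirroring the while loop in schedule()
def assignDrivers(workers: Seq[Worker], drivers: Seq[Driver]): Map[String, String] = {
  var curPos = 0
  val placements = scala.collection.mutable.Map[String, String]()
  for (driver <- drivers) {
    var launched = false
    var visited = 0
    while (visited < workers.length && !launched) {
      val worker = workers(curPos)
      visited += 1
      if (worker.freeCores >= driver.cores && worker.freeMemMb >= driver.memMb) {
        // "Launch": deduct the driver's resources and record the placement
        worker.freeCores -= driver.cores
        worker.freeMemMb -= driver.memMb
        placements(driver.id) = worker.id
        launched = true
      }
      // Advance the cursor whether or not we launched, exactly as schedule() does
      curPos = (curPos + 1) % workers.length
    }
  }
  placements.toMap
}

val cluster = Seq(Worker("w1", 4, 4096), Worker("w2", 4, 4096))
val pending = Seq(Driver("d1", 2, 1024), Driver("d2", 2, 1024))
println(assignDrivers(cluster, pending)) // d1 -> w1, d2 -> w2 (round-robin)
```

Because `curPos` advances even after a successful launch, consecutive drivers land on different workers instead of piling onto the first one with capacity.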

    Source of the launchDriver method:

      private def launchDriver(worker: WorkerInfo, driver: DriverInfo): Unit = {
        logInfo("Launching driver " + driver.id + " on worker " + worker.id)
        // Add the driver to the worker's in-memory bookkeeping; this also adds the
        // driver's memory and CPU requirements to the worker's resources in use
        worker.addDriver(driver)
        driver.worker = Some(worker)
        // Send a LaunchDriver message to the worker's endpoint so the worker starts the driver
        worker.endpoint.send(LaunchDriver(driver.id, driver.desc, driver.resources))
        // Mark the driver's state as RUNNING
        driver.state = DriverState.RUNNING
      }

    Source of the startExecutorsOnWorkers method:

      /**
       * Schedule and launch executors on workers
       */
      private def startExecutorsOnWorkers(): Unit = {
        // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
        // in the queue, then the second app, etc.
        // Iterate over the ApplicationInfos in waitingApps, scheduling only apps that
        // still have cores left to allocate
        for (app <- waitingApps) {
          val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
          // If the cores left is less than the coresPerExecutor, the cores left will not be allocated
          if (app.coresLeft >= coresPerExecutor) {
            // Filter out workers that don't have enough resources to launch an executor:
            // keep the ALIVE workers that can run an executor for this application,
            // sorted by free CPU cores in descending order
            val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
              .filter(canLaunchExecutor(_, app.desc))
              .sortBy(_.coresFree).reverse
            if (waitingApps.length == 1 && usableWorkers.isEmpty) {
              logWarning(s"App ${app.id} requires more resource than any of Workers could have.")
            }
            // Decide how many cores to allocate on each worker (an array indexed by worker position)
            val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
            // Now that we've decided how many cores to allocate on each worker, let's allocate them:
            // launch executors on every worker that was assigned at least one core
            for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
              allocateWorkerResourceToExecutors(
                app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
            }
          }
        }
      }
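scheduleExecutorsOnWorkers itself is not shown in this post, but the effect of the spreadOutApps flag can be illustrated with a simplified sketch. `assignCores` below is a hypothetical helper that tracks only free core counts, not Spark's real method: with spreadOut = true, cores are handed out one executor-sized chunk at a time round-robin across workers; with spreadOut = false, each worker is filled before moving to the next.

```scala
// Illustrative sketch (not Spark's actual implementation) of how spreadOutApps
// changes core allocation. freeCores(i) is worker i's free core count.
def assignCores(freeCores: Array[Int], coresToAssign: Int,
                coresPerExecutor: Int, spreadOut: Boolean): Array[Int] = {
  val assigned = Array.fill(freeCores.length)(0)
  var remaining = coresToAssign
  // A worker can still take an executor if it has an executor-sized chunk free
  def canTake(i: Int): Boolean = freeCores(i) - assigned(i) >= coresPerExecutor
  var pos = 0
  while (remaining >= coresPerExecutor && freeCores.indices.exists(canTake)) {
    if (canTake(pos)) {
      assigned(pos) += coresPerExecutor
      remaining -= coresPerExecutor
      // spreadOut: move to the next worker after each executor;
      // otherwise stay and keep filling this worker until it is full
      if (spreadOut) pos = (pos + 1) % freeCores.length
    } else {
      pos = (pos + 1) % freeCores.length
    }
  }
  assigned
}

// Three workers with 4 free cores each, 6 cores to assign, 2 cores per executor:
println(assignCores(Array(4, 4, 4), 6, 2, spreadOut = true).toSeq)  // (2, 2, 2)
println(assignCores(Array(4, 4, 4), 6, 2, spreadOut = false).toSeq) // (4, 2, 0)
```

Spreading executors out tends to improve data locality across the cluster, while consolidating them onto few workers can reduce network traffic between executors of the same app.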

    Reference: 中华石杉's "Spark from Beginner to Mastery", lecture 48

    To make sense of the source above, it helps to know how these components relate:

    A Spark standalone cluster has one active master node (plus optional standby masters for high availability) and multiple worker nodes. The master manages the workers, and each worker manages its Executors.

    A single worker node can host multiple Executors; each Executor owns a number of CPU cores and a fixed amount of memory.
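As a back-of-the-envelope illustration of this relationship (`executorsPerWorker` is a hypothetical helper, not part of Spark): the number of executors a worker can host is bounded by whichever resource, cores or memory, runs out first.

```scala
// How many executors of a given size fit on one worker? Bounded by whichever
// resource (cores or memory) is exhausted first. Purely illustrative arithmetic.
def executorsPerWorker(workerCores: Int, workerMemMb: Int,
                       executorCores: Int, executorMemMb: Int): Int =
  math.min(workerCores / executorCores, workerMemMb / executorMemMb)

// A 16-core, 64 GB worker running 4-core, 8 GB executors:
println(executorsPerWorker(16, 65536, 4, 8192)) // 4 (cores are the bottleneck)
```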

    Further reading: the relationship between workers, Executors, and CPU cores.

  • Original article: https://www.cnblogs.com/parent-absent-son/p/11743337.html