Full source of the schedule method:
/**
 * Schedule the currently available resources among waiting apps. This method will be called
 * every time a new app joins or resource availability changes.
 */
private def schedule(): Unit = {
  // Proceed only if this Master is in the ALIVE state; otherwise return immediately.
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Drivers take strict precedence over executors
  // Random.shuffle returns the alive workers in a random order.
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  // Schedule drivers first. A driver is registered with the Master only when the application
  // is submitted in standalone cluster deploy mode. In client mode the driver runs inside the
  // submitting process, so it is never registered with or scheduled by the Master. On YARN,
  // this Master is not involved at all.
  // When a driver registers, it is placed in the waitingDrivers queue.
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var isClusterIdle = true
    var numWorkersVisited = 0
    // Keep looping while the driver has not been launched and some alive workers are still unvisited.
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      isClusterIdle = worker.drivers.isEmpty && worker.executors.isEmpty
      numWorkersVisited += 1
      // Check that the worker has enough free resources (memory, CPU cores, etc.) for the driver.
      if (canLaunchDriver(worker, driver.desc)) {
        val allocated = worker.acquireResources(driver.desc.resourceReqs)
        driver.withResources(allocated)
        // Launch the driver on this worker.
        launchDriver(worker, driver)
        // Remove the launched driver from the waiting queue.
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
    if (!launched && isClusterIdle) {
      logWarning(s"Driver ${driver.id} requires more resource than any of Workers could have.")
    }
  }
  startExecutorsOnWorkers()
}
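The round-robin driver placement in schedule() can be sketched in isolation. The following is a minimal, simplified model and not Spark's actual code: `Worker`, `Driver`, and `place` are hypothetical names introduced only for illustration, and the resource check covers just cores and memory.

```scala
import scala.util.Random

object DriverPlacementSketch {
  // Hypothetical, simplified stand-ins for Spark's WorkerInfo and DriverInfo.
  final case class Worker(id: String, var coresFree: Int, var memoryFreeMb: Int)
  final case class Driver(id: String, cores: Int, memoryMb: Int)

  /** Returns a map of driver id -> worker id for every driver that could be placed. */
  def place(
      workers: Seq[Worker],
      drivers: Seq[Driver],
      rng: Random = new Random()): Map[String, String] = {
    val shuffled = rng.shuffle(workers) // random starting order, as in schedule()
    val numAlive = shuffled.size
    var curPos = 0                      // shared across drivers: this is what makes it round-robin
    var placements = Map.empty[String, String]

    for (driver <- drivers) {
      var launched = false
      var visited = 0
      // Visit each worker at most once per driver, starting where the previous driver stopped.
      while (visited < numAlive && !launched) {
        val worker = shuffled(curPos)
        visited += 1
        if (worker.coresFree >= driver.cores && worker.memoryFreeMb >= driver.memoryMb) {
          worker.coresFree -= driver.cores
          worker.memoryFreeMb -= driver.memoryMb
          placements += (driver.id -> worker.id)
          launched = true
        }
        curPos = (curPos + 1) % numAlive
      }
    }
    placements
  }
}
```

Because `curPos` is not reset between drivers, consecutive drivers tend to land on different workers as long as resources allow. A driver that fits on no worker is simply left unplaced, which corresponds to it staying in waitingDrivers until the next scheduling pass.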
Source of the launchDriver method:
private def launchDriver(worker: WorkerInfo, driver: DriverInfo): Unit = {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  // Add the driver to the worker's in-memory bookkeeping. This also adds the driver's
  // memory and cores to the worker's used totals.
  worker.addDriver(driver)
  // Record which worker the driver runs on.
  driver.worker = Some(worker)
  // Send a LaunchDriver message to the worker's RPC endpoint so that the worker starts the driver.
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc, driver.resources))
  // Mark the driver as RUNNING.
  driver.state = DriverState.RUNNING
}
Source of startExecutorsOnWorkers, which launches executors on the workers:
/**
 * Schedule and launch executors on workers
 */
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  // Iterate over the ApplicationInfo entries in waitingApps, skipping any application
  // that has fewer cores left to schedule than one executor needs.
  for (app <- waitingApps) {
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
    // If the cores left is less than the coresPerExecutor,the cores left will not be allocated
    if (app.coresLeft >= coresPerExecutor) {
      // Filter out workers that don't have enough resources to launch an executor
      // Keep only ALIVE workers, then only those that can host an executor for this
      // application, and sort them by free cores in descending order.
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(canLaunchExecutor(_, app.desc))
        .sortBy(_.coresFree).reverse
      if (waitingApps.length == 1 && usableWorkers.isEmpty) {
        logWarning(s"App ${app.id} requires more resource than any of Workers could have.")
      }
      // assignedCores is an array holding the number of cores to give the app on each usable worker.
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
      // Now that we've decided how many cores to allocate on each worker, let's allocate them
      // Turn the assigned cores on each worker into executors. With spreadOutApps enabled,
      // the executors are spread across as many workers as possible.
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
      }
    }
  }
}
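The source of scheduleExecutorsOnWorkers is not shown above, so the effect of spreadOutApps on the per-worker core assignment may be easier to see in a reduced form. The sketch below is an assumption-laden simplification: `assignCores` and its parameters are hypothetical names, and it looks only at cores, ignoring memory, custom resources, and any limit on the number of executors.

```scala
object CoreAssignmentSketch {
  /**
   * Decide how many cores to take from each usable worker.
   *
   * @param coresFree        free cores on each usable worker
   * @param coresLeft        cores the application still wants
   * @param coresPerExecutor cores handed out per executor
   * @param spreadOut        true: one executor per worker per round; false: fill a worker first
   */
  def assignCores(
      coresFree: Array[Int],
      coresLeft: Int,
      coresPerExecutor: Int,
      spreadOut: Boolean): Array[Int] = {
    val assigned = Array.fill(coresFree.length)(0)
    var toAssign = math.min(coresLeft, coresFree.sum)

    // A worker can take one more executor if both the app and the worker have enough cores left.
    def canLaunch(pos: Int): Boolean =
      toAssign >= coresPerExecutor && coresFree(pos) - assigned(pos) >= coresPerExecutor

    var free = coresFree.indices.filter(canLaunch)
    while (free.nonEmpty) {
      free.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          toAssign -= coresPerExecutor
          assigned(pos) += coresPerExecutor
          // When spreading out, move on to the next worker after a single executor.
          if (spreadOut) keepScheduling = false
        }
      }
      free = free.filter(canLaunch)
    }
    assigned
  }
}
```

For three workers with 4 free cores each and an application that wants 6 cores at 2 cores per executor, spreading out yields 2 cores on every worker, while the consolidating mode fills the first worker before touching the second.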
Reference: 中华石杉, Spark 从入门到精通 (Spark: From Beginner to Mastery), lecture 48.
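To understand the source above, it helps to know how the components relate to each other:
A Spark standalone cluster has one active Master node (optionally with standby Masters for high availability) and multiple Worker nodes. The Master manages the Workers, and each Worker manages its Executors.
A Worker node can host multiple Executors, and each Executor is given a number of CPU cores and a fixed amount of memory.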
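As a rough illustration of this relationship, the number of executors a single worker can host is bounded by whichever resource runs out first. The sketch below assumes only cores and memory matter; `WorkerResources`, `ExecutorSpec`, and `maxExecutors` are hypothetical names, and application-level limits such as spark.cores.max are ignored.

```scala
object ExecutorCapacitySketch {
  // Hypothetical, simplified descriptions of a worker's resources and an executor's needs.
  final case class WorkerResources(cores: Int, memoryMb: Int)
  final case class ExecutorSpec(cores: Int, memoryMb: Int)

  /** Upper bound on executors of the given shape that fit on one worker. */
  def maxExecutors(worker: WorkerResources, executor: ExecutorSpec): Int =
    math.min(worker.cores / executor.cores, worker.memoryMb / executor.memoryMb)
}
```

For example, a worker with 16 cores and 64 GB of memory fits at most four executors of 4 cores and 8 GB each (cores are the bottleneck), but only two executors of 2 cores and 32 GB each (memory is the bottleneck).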
Further reading: the relationship between Workers, Executors, and CPU cores.