• spark shuffle 相关细节整理


    1.Shuffle Write 和Shuffle Read具体发生在哪里

    2.哪里用到了Partitioner

    3.何为mapSideCombine

    4.何时进行排序

    之前已经看过spark shuffle源码了,现在总结一下一些之前没有理解的小知识点,作为一个总结。

    用户自定义的Partitioner存到了哪里?

       假设用户在调用reduceByKey时,传递了一个自定义的Partitioner,那么,这个Partitioner会被保存到ShuffleRDD的ShuffleDependency中。在进行Shuffle Write时,会使用这个Partitioner来对finalRDD.iterator(partition)的计算结果shuffle到不同的Bucket中。

    何为mapSideCombine

      reduceByKey默认是开启了mapSideCombine的,在进行shuffle write时会进行本地聚合,在shuffle read时,也会合并一下。举一个例子更好:

    shuffle write阶段:

     partition0:[(hello,1),(hello,1)]

     partition1:[(hello,1),(word,1),(word,1)]

    mapSideCombine后:

     partition0:[(hello,2)]

     partition1:[(hello,1),(word,2)]

    hash shuffle后:

    [(hello,2),(hello,1)]

    [(word,2)]

    hash read阶段:

    [(hello,3)]

    [(word,2)]

    何时排序

    排序操作发生在shuffle read 阶段。在shuffle read 进行完mapSideCombine之后,就开始进行排序了。

    reduceByKey做了什么?

    假设我们对rdd1调用了reduceByKey,那么最终的RDD依赖关系如下:rdd1->ShuffleRDD。rdd1.reduceByKey中,会做如下非常重要的事情:创建ShuffleRDD,在创建ShuffleRDD的过程中最最最重要的就是会创建ShuffleDependency,这个ShuffleDependency中有Aggregator,Partitioner,Ordering,parentRDD,mapSideCombine等重要的信息。为什么说ShuffleDependency非常重要,因为他是沟通Shuffle Writer和Shuffle Reader的一个重要桥梁。

    Shuffle Write

    Shuffle Write 发生在ShuffleMapTask.runTask中。首先反序列出rdd1和那个ShuffleDependency:(rdd1,dep),然后调用rdd1.iterator(partition)获取计算结果,再对计算结果进行ShuffleWriter,代码如下:

    override def runTask(context: TaskContext): MapStatus = {
        // Deserialize the RDD using the broadcast variable.
        val deserializeStartTime = System.currentTimeMillis()
        val ser = SparkEnv.get.closureSerializer.newInstance()
        //统计反序列化rdd和shuffleDependency的时间
        val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
          ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
        _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime
    
        metrics = Some(context.taskMetrics)
        var writer: ShuffleWriter[Any, Any] = null
        try {
          val manager = SparkEnv.get.shuffleManager
          writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
          writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
          return writer.stop(success = true).get
        } catch {
          case e: Exception =>
            try {
              if (writer != null) {
                writer.stop(success = false)
              }
            } catch {
              case e: Exception =>
                log.debug("Could not stop writer", e)
            }
            throw e
        }
      }

    我们以HashSuffleWriter为例,在其write(),他就会用到mapSideCombine和Partitioner。如下:

    /** Write a bunch of records to this task's output */
      override def write(records: Iterator[Product2[K, V]]): Unit = {
        val iter = if (dep.aggregator.isDefined) {
          if (dep.mapSideCombine) {
            dep.aggregator.get.combineValuesByKey(records, context)
          } else {
            records
          }
        } else {
          require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
          records
        }
    
        for (elem <- iter) {
          val bucketId = dep.partitioner.getPartition(elem._1)
          shuffle.writers(bucketId).write(elem._1, elem._2)
        }
      }

    Shuffle Read

      shuffle Read发生在ShuffleRDD的compute中:

      override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] = {
        val dep = dependencies.head.asInstanceOf[ShuffleDependency[K, V, C]]
        SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1, context)
          .read()
          .asInstanceOf[Iterator[(K, C)]]
      }

    下面是HashShuffleReader的read():

      /** Read the combined key-values for this reduce task */
      override def read(): Iterator[Product2[K, C]] = {
        val ser = Serializer.getSerializer(dep.serializer)
        val iter = BlockStoreShuffleFetcher.fetch(handle.shuffleId, startPartition, context, ser)
    
        val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
          if (dep.mapSideCombine) {
            new InterruptibleIterator(context, dep.aggregator.get.combineCombinersByKey(iter, context))
          } else {
            new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
          }
        } else {
          require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
    
          // Convert the Product2s to pairs since this is what downstream RDDs currently expect
          iter.asInstanceOf[Iterator[Product2[K, C]]].map(pair => (pair._1, pair._2))
        }
    
        // Sort the output if there is a sort ordering defined.
        dep.keyOrdering match {
          case Some(keyOrd: Ordering[K]) =>
            // Create an ExternalSorter to sort the data. Note that if spark.shuffle.spill is disabled,
            // the ExternalSorter won't spill to disk.
            val sorter = new ExternalSorter[K, C, C](ordering = Some(keyOrd), serializer = Some(ser))
            sorter.insertAll(aggregatedIter)
            context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
            context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)
            sorter.iterator
          case None =>
            aggregatedIter
        }
      }
  • 相关阅读:
    Bit,Byte,Word,Dword,Qword
    One good turn deserves another
    IHttpModule & IHttpHandler
    畅想:哈夫曼树的应用
    The Controls collection cannot be modified because the control contains code blocks
    Talk O/RM (DAL) too ...
    实现对象集合枚举接口
    [ZT]实现创造生命的古老梦想——合成生物学的发展走向
    笔记本基础知识篇之DVI接口详解
    Analysis Services: write back
  • 原文地址:https://www.cnblogs.com/francisYoung/p/5418272.html
Copyright © 2020-2023  润新知