Please credit the original source when reposting: http://www.cnblogs.com/dongxiao-yang/p/7767621.html
The spark-streaming code version studied in this article is 2.3.0-SNAPSHOT.
To keep up with the Kafka client changes introduced in 0.10 and later, spark-streaming ships a spark-streaming-kafka-0-10 connector that is still marked Experimental. Since the old 0.8 connector cannot support Kerberos authentication, it is worth studying the source code and overall architecture of spark-streaming-kafka-0-10.
First, look at the declaration of the method that initializes a Kafka stream:
def createDirectStream[K, V](
    ssc: StreamingContext,
    locationStrategy: LocationStrategy,
    consumerStrategy: ConsumerStrategy[K, V],
    perPartitionConfig: PerPartitionConfig
  ): InputDStream[ConsumerRecord[K, V]] = {
  new DirectKafkaInputDStream[K, V](ssc, locationStrategy, consumerStrategy, perPartitionConfig)
}
DirectKafkaInputDStream is initialized with a StreamingContext, a LocationStrategy, a ConsumerStrategy and a PerPartitionConfig. According to the source documentation, locationStrategy is normally PreferConsistent and perPartitionConfig normally uses the default implementation, so neither is examined further here. The parameter that actually makes a difference is consumerStrategy, whose role is shown in the source analysis below.
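As a quick reference, the three location strategies the module ships can be constructed as in the sketch below; the topic name and host name are placeholders:

import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010.LocationStrategies

// spread Kafka partitions evenly across available executors (the usual choice)
val preferConsistent = LocationStrategies.PreferConsistent

// prefer executors running on the same hosts as the Kafka brokers
val preferBrokers = LocationStrategies.PreferBrokers

// pin specific partitions to specific executor hosts
val preferFixed = LocationStrategies.PreferFixed(
  Map(new TopicPartition("events", 0) -> "executor-host-1"))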
1. Driver consumer
JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
    jssc,
    LocationStrategies.PreferConsistent(),
    ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
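For completeness, here is a minimal Scala sketch of the same driver-side setup; the broker addresses, group id and topic name are placeholders, and ssc is assumed to be an existing StreamingContext:

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010._

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092,broker2:9092",  // placeholder brokers
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-streaming-group",                  // placeholder group id
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean))

val topics = Seq("events")                             // placeholder topic

val stream: InputDStream[ConsumerRecord[String, String]] =
  KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](topics, kafkaParams))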
Taking the initialization code above as an example, DirectKafkaInputDStream first initializes itself through its start method; the relevant code is as follows:
override def start(): Unit = {
  val c = consumer   // initialize the driver-side consumer
  paranoidPoll(c)    // adjust the offset positions
  if (currentOffsets.isEmpty) {
    currentOffsets = c.assignment().asScala.map { tp =>
      tp -> c.position(tp)
    }.toMap
  }
  // don't actually want to consume any messages, so pause all partitions
  c.pause(currentOffsets.keySet.asJava)
}
This code initializes a consumer on the driver side, whose type is determined by the consumerStrategy mentioned above. The Subscribe strategy is implemented as follows; it essentially starts a client on the driver that subscribes to the topics in subscribe mode. If initial starting offsets are supplied, the consumer's offset cursor is seeked to the corresponding positions.
private case class Subscribe[K, V](
    topics: ju.Collection[jl.String],
    kafkaParams: ju.Map[String, Object],
    offsets: ju.Map[TopicPartition, jl.Long]
  ) extends ConsumerStrategy[K, V] with Logging {

  def executorKafkaParams: ju.Map[String, Object] = kafkaParams

  def onStart(currentOffsets: ju.Map[TopicPartition, jl.Long]): Consumer[K, V] = {
    val consumer = new KafkaConsumer[K, V](kafkaParams)
    consumer.subscribe(topics)
    val toSeek = if (currentOffsets.isEmpty) {
      offsets
    } else {
      currentOffsets
    }
    if (!toSeek.isEmpty) {
      // work around KAFKA-3370 when reset is none
      // poll will throw if no position, i.e. auto offset reset none and no explicit position
      // but cant seek to a position before poll, because poll is what gets subscription partitions
      // So, poll, suppress the first exception, then seek
      val aor = kafkaParams.get(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG)
      val shouldSuppress =
        aor != null && aor.asInstanceOf[String].toUpperCase(Locale.ROOT) == "NONE"
      try {
        consumer.poll(0)
      } catch {
        case x: NoOffsetForPartitionException if shouldSuppress =>
          logWarning("Catching NoOffsetForPartitionException since " +
            ConsumerConfig.AUTO_OFFSET_RESET_CONFIG + " is none. See KAFKA-3370")
      }
      toSeek.asScala.foreach { case (topicPartition, offset) =>
        consumer.seek(topicPartition, offset)
      }
      // we've called poll, we must pause or next poll may consume messages and set position
      consumer.pause(consumer.assignment())
    }

    consumer
  }
}
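As a usage note, explicit starting offsets can be handed to the strategy, which is what feeds the seek logic above. A sketch, reusing the placeholder kafkaParams from the earlier setup and with made-up offset values:

import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010.ConsumerStrategies

// resume partitions 0 and 1 of "events" from offsets stored elsewhere (placeholder values)
val fromOffsets = Map(
  new TopicPartition("events", 0) -> 1000L,
  new TopicPartition("events", 1) -> 2000L)

val strategyWithOffsets = ConsumerStrategies.Subscribe[String, String](
  Seq("events"), kafkaParams, fromOffsets)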
The other core method of DirectKafkaInputDStream is compute, whose main job is to keep generating, for each batch time, the RDD handed to the new job. The implementation below mainly computes the data range of the KafkaRDD assigned to each job, based on the configured rate limits and the current Kafka topic partitions, and also takes care of committing offsets.
override def compute(validTime: Time): Option[KafkaRDD[K, V]] = {
  // compute, per partition, the ending offset of the next batch's Kafka messages,
  // based on rate-limit settings such as maxRate and backpressure
  val untilOffsets = clamp(latestOffsets())
  // build the offset ranges: (topic, partition, starting offset, ending offset)
  val offsetRanges = untilOffsets.map { case (tp, uo) =>
    val fo = currentOffsets(tp)
    OffsetRange(tp.topic, tp.partition, fo, uo)
  }
  val useConsumerCache = context.conf.getBoolean("spark.streaming.kafka.consumer.cache.enabled", true)
  // create the KafkaRDD from the computed offsetRanges and the fixed-up kafkaParams
  val rdd = new KafkaRDD[K, V](context.sparkContext, executorKafkaParams, offsetRanges.toArray,
    getPreferredHosts, useConsumerCache)

  // Report the record number and metadata of this batch interval to InputInfoTracker.
  val description = offsetRanges.filter { offsetRange =>
    // Don't display empty ranges.
    offsetRange.fromOffset != offsetRange.untilOffset
  }.map { offsetRange =>
    s"topic: ${offsetRange.topic} partition: ${offsetRange.partition} " +
      s"offsets: ${offsetRange.fromOffset} to ${offsetRange.untilOffset}"
  }.mkString(" ")

  // Copy offsetRanges to immutable.List to prevent from being modified by the user
  val metadata = Map(
    "offsets" -> offsetRanges.toList,
    StreamInputInfo.METADATA_KEY_DESCRIPTION -> description)
  val inputInfo = StreamInputInfo(id, rdd.count, metadata)
  ssc.scheduler.inputInfoTracker.reportInfo(validTime, inputInfo)

  currentOffsets = untilOffsets
  commitAll()
  Some(rdd)
}
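The clamp(latestOffsets()) step above is governed by ordinary Spark configuration. A sketch of the rate-limiting knobs typically involved; the values are illustrative:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-010-direct-demo")                      // placeholder app name
  // upper bound on records read per partition per second for each batch
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // let the backpressure estimator lower the effective rate dynamically
  .set("spark.streaming.backpressure.enabled", "true")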
Note the implementation of latestOffsets() below: it uses the new consumer API call c.seekToEnd(currentOffsets.keySet.asJava) to move the consumer's offset cursor to the end of every assigned partition. If "enable.auto.commit" is set to "true" in the initial kafkaParams, the driver-side client will automatically commit the last seeked offset positions to Kafka.
protected def latestOffsets(): Map[TopicPartition, Long] = {
  val c = consumer
  paranoidPoll(c)
  val parts = c.assignment().asScala

  // make sure new partitions are reflected in currentOffsets
  val newPartitions = parts.diff(currentOffsets.keySet)
  // position for new partitions determined by auto.offset.reset if no commit
  currentOffsets = currentOffsets ++ newPartitions.map(tp => tp -> c.position(tp)).toMap
  // don't want to consume messages, so pause
  c.pause(newPartitions.asJava)
  // find latest available offsets
  c.seekToEnd(currentOffsets.keySet.asJava)
  parts.map(tp => tp -> c.position(tp)).toMap
}
2. Executor consumer
The executor consumer is initialized inside KafkaRDD. The fixKafkaParams method is applied to the application's original kafkaParams to adjust and override some of the settings, including group.id, enable.auto.commit and auto.offset.reset.
private[kafka010] def fixKafkaParams(kafkaParams: ju.HashMap[String, Object]): Unit = {
  logWarning(s"overriding ${ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG} to false for executor")
  kafkaParams.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false: java.lang.Boolean)

  logWarning(s"overriding ${ConsumerConfig.AUTO_OFFSET_RESET_CONFIG} to none for executor")
  kafkaParams.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none")

  // driver and executor should be in different consumer groups
  val originalGroupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG)
  if (null == originalGroupId) {
    logError(s"${ConsumerConfig.GROUP_ID_CONFIG} is null, you should probably set it")
  }
  val groupId = "spark-executor-" + originalGroupId
  logWarning(s"overriding executor ${ConsumerConfig.GROUP_ID_CONFIG} to ${groupId}")
  kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, groupId)

  // possible workaround for KAFKA-3135
  val rbb = kafkaParams.get(ConsumerConfig.RECEIVE_BUFFER_CONFIG)
  if (null == rbb || rbb.asInstanceOf[java.lang.Integer] < 65536) {
    logWarning(s"overriding ${ConsumerConfig.RECEIVE_BUFFER_CONFIG} to 65536 see KAFKA-3135")
    kafkaParams.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 65536: java.lang.Integer)
  }
}
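To make the effect concrete, the sketch below mirrors those overrides on a plain params map outside Spark; fixKafkaParams itself is private[kafka010], so this is purely illustrative and assumes the placeholder kafkaParams map from the earlier driver-side sketch:

import java.{util => ju}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.ConsumerConfig

// copy the driver-side params, then apply the same overrides the executor side receives
val executorParams = new ju.HashMap[String, Object](kafkaParams.asJava)
executorParams.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false: java.lang.Boolean)
executorParams.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none")
executorParams.put(ConsumerConfig.GROUP_ID_CONFIG,
  "spark-executor-" + kafkaParams("group.id"))   // e.g. "spark-executor-my-streaming-group"
executorParams.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 65536: java.lang.Integer)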
The consumer-related methods inside KafkaRDD are shown below. First, getPartitions maps each Kafka topic partition to one RDD partition; compute then creates a KafkaRDDIterator, and each KafkaRDDIterator obtains a CachedKafkaConsumer reference through the CachedKafkaConsumer interface and keeps returning ConsumerRecord values from its next() method.
override def getPartitions: Array[Partition] = {
  offsetRanges.zipWithIndex.map { case (o, i) =>
    new KafkaRDDPartition(i, o.topic, o.partition, o.fromOffset, o.untilOffset)
  }.toArray
}

override def compute(thePart: Partition, context: TaskContext): Iterator[ConsumerRecord[K, V]] = {
  val part = thePart.asInstanceOf[KafkaRDDPartition]
  assert(part.fromOffset <= part.untilOffset, errBeginAfterEnd(part))
  if (part.fromOffset == part.untilOffset) {
    logInfo(s"Beginning offset ${part.fromOffset} is the same as ending offset " +
      s"skipping ${part.topic} ${part.partition}")
    Iterator.empty
  } else {
    new KafkaRDDIterator(part, context)
  }
}

private class KafkaRDDIterator(
    part: KafkaRDDPartition,
    context: TaskContext) extends Iterator[ConsumerRecord[K, V]] {

  logInfo(s"Computing topic ${part.topic}, partition ${part.partition} " +
    s"offsets ${part.fromOffset} -> ${part.untilOffset}")

  val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]

  context.addTaskCompletionListener{ context => closeIfNeeded() }

  val consumer = if (useConsumerCache) {
    CachedKafkaConsumer.init(cacheInitialCapacity, cacheMaxCapacity, cacheLoadFactor)
    if (context.attemptNumber >= 1) {
      // just in case the prior attempt failures were cache related
      CachedKafkaConsumer.remove(groupId, part.topic, part.partition)
    }
    CachedKafkaConsumer.get[K, V](groupId, part.topic, part.partition, kafkaParams)
  } else {
    CachedKafkaConsumer.getUncached[K, V](groupId, part.topic, part.partition, kafkaParams)
  }

  var requestOffset = part.fromOffset

  def closeIfNeeded(): Unit = {
    if (!useConsumerCache && consumer != null) {
      consumer.close
    }
  }

  override def hasNext(): Boolean = requestOffset < part.untilOffset

  override def next(): ConsumerRecord[K, V] = {
    assert(hasNext(), "Can't call getNext() once untilOffset has been reached")
    val r = consumer.get(requestOffset, pollTimeout)
    requestOffset += 1
    r
  }
}
Depending on whether the consumer cache pool is used (controlled by spark.streaming.kafka.consumer.cache.enabled), CachedKafkaConsumer offers two static methods for obtaining a consumer client: get() and getUncached().
get() stores and retrieves already-initialized CachedKafkaConsumer objects in CachedKafkaConsumer's static LinkedHashMap field cache, which effectively gives every executor its own consumer connection pool.
getUncached() initializes a fresh consumer connection each time new data is pulled and closes that consumer instance once the RDD task finishes.
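The per-executor pool is configurable as well. A sketch of the related settings with illustrative values; the capacity keys correspond to the cacheInitialCapacity/cacheMaxCapacity parameters seen in KafkaRDDIterator, but treat the exact key names as an assumption to verify against your Spark version:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // switch between CachedKafkaConsumer.get (pooled) and getUncached (one-shot)
  .set("spark.streaming.kafka.consumer.cache.enabled", "true")
  // initial and maximum number of cached consumers per executor
  .set("spark.streaming.kafka.consumer.cache.initialCapacity", "16")
  .set("spark.streaming.kafka.consumer.cache.maxCapacity", "64")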
The CachedKafkaConsumer code that initializes the Kafka consumer client is shown below. Note that the executor-side client that actually pulls the data is created by subscribing to a single partition via assign.
protected val consumer = {
  val c = new KafkaConsumer[K, V](kafkaParams)
  val tps = new ju.ArrayList[TopicPartition]()
  tps.add(topicPartition)
  c.assign(tps)
  c
}
3. Offset commit
Besides the approach mentioned above of enabling auto.commit on the driver-side consumer, spark-streaming also provides a commitAsync(offsetRanges: Array[OffsetRange], callback: OffsetCommitCallback) method on DirectKafkaInputDStream for triggering offset commits manually. This method puts the offset ranges to be committed into a commitQueue; at the end of every compute call, commitAll then commits those offsets to Kafka through the driver-side consumer.
def commitAsync(offsetRanges: Array[OffsetRange], callback: OffsetCommitCallback): Unit = {
  commitCallback.set(callback)
  commitQueue.addAll(ju.Arrays.asList(offsetRanges: _*))
}

protected def commitAll(): Unit = {
  val m = new ju.HashMap[TopicPartition, OffsetAndMetadata]()
  var osr = commitQueue.poll()
  while (null != osr) {
    val tp = osr.topicPartition
    val x = m.get(tp)
    val offset = if (null == x) { osr.untilOffset } else { Math.max(x.offset, osr.untilOffset) }
    m.put(tp, new OffsetAndMetadata(offset))
    osr = commitQueue.poll()
  }
  if (!m.isEmpty) {
    consumer.commitAsync(m, commitCallback.get)
  }
}
stream.foreachRDD(rdd -> {
  OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();

  // some time later, after outputs have completed
  ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsetRanges);
});
Note: if you commit offsets manually in the way shown in the official documentation above, the stream field needs to be marked static or transient so that it is excluded from serialization; otherwise job submission may fail with a "Task not serializable" error caused by DirectKafkaInputDStream not being serializable.
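For comparison, a Scala sketch of the same pattern that also passes an OffsetCommitCallback so commit failures become visible; it assumes the stream value from the Scala sketch in section 1, and the callback body is only illustrative:

import java.{util => ju}
import org.apache.kafka.clients.consumer.{OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the rdd and write results out first ...
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges,
    new OffsetCommitCallback {
      override def onComplete(
          offsets: ju.Map[TopicPartition, OffsetAndMetadata],
          exception: Exception): Unit = {
        if (exception != null) {
          // handle or log the failure as appropriate for your application
          println(s"async offset commit failed: $exception")
        }
      }
    })
}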
Conclusion
The new spark-streaming-kafka-0-10 connector uses a completely different architecture from the old version: a single job runs two groups of consumers, the driver consumer and the executor consumers. The driver-side consumer is responsible for computing offset ranges, handing them to the freshly created KafkaRDD, and committing offsets; inside the KafkaRDD, a CachedKafkaConsumer client is created for each assigned partition of each topic, and it pulls data by subscribing to that single partition via assign.