• Kafka producer's asynchronous send can block the calling thread in some cases, so use it with caution


    I recently ran into a case where the Kafka producer's asynchronous send blocked the calling thread. While tracking the problem down, I found that this can be considered a spot where Kafka's documentation is misleading.

    Problem description

    In many scenarios we send Kafka messages asynchronously, using the following method of KafkaProducer:

    public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback)

    According to the documentation this is an asynchronous send, so in principle it should never block the calling thread. In practice, however, it can block in certain situations, for example when the broker is not running properly or the topic has not been created. Often we do not need any guarantee about the result of a send, but if the call blocks it drags down the rest of the business logic.
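
    To make the behavior concrete, below is a minimal sketch that reproduces the blocking. It assumes no broker is listening on the configured address, and the topic name demo-topic is purely illustrative; with default settings the "asynchronous" call stalls for roughly a minute before throwing a TimeoutException:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object SendBlockDemo extends App {
      val props = new Properties()
      // assumption: no broker is listening here, so metadata can never be fetched
      props.put("bootstrap.servers", "localhost:19092")
      props.put("key.serializer", classOf[StringSerializer].getName)
      props.put("value.serializer", classOf[StringSerializer].getName)

      val producer = new KafkaProducer[String, String](props)
      val start = System.currentTimeMillis()
      try {
        // the "asynchronous" call itself blocks while the producer waits for metadata
        producer.send(new ProducerRecord[String, String]("demo-topic", "k", "v"))
      } catch {
        case e: Exception => println(s"send failed: ${e.getMessage}")
      } finally {
        // with defaults this prints roughly 60000 ms, the max.block.ms default
        println(s"send() returned after ${System.currentTimeMillis() - start} ms")
        producer.close()
      }
    }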

    Where the problem occurs

    Nothing looks wrong from the declaration of KafkaProducer's send method, so let's look at its concrete implementation:

    public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
        // intercept the record, which can be potentially modified; this method does not throw exceptions
        ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
        return doSend(interceptedRecord, callback);
    }
    
    /**
      * Implementation of asynchronously send a record to a topic.
      */
    private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
        TopicPartition tp = null;
        try {
            throwIfProducerClosed();
            // first make sure the metadata for the topic is available
            ClusterAndWaitTime clusterAndWaitTime;
            try {
                clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);  // this is where the problem occurs
            } catch (KafkaException e) {
                if (metadata.isClosed())
                    throw new KafkaException("Producer closed while send in progress", e);
                throw e;
            }
            ...
        } catch (ApiException e) {
            ...
        }
    }
    
    private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long maxWaitMs) throws InterruptedException {
        // add topic to metadata topic list if it is not there already and reset expiry
        Cluster cluster = metadata.fetch();
    
        if (cluster.invalidTopics().contains(topic))
            throw new InvalidTopicException(topic);
    
        metadata.add(topic);
    
        Integer partitionsCount = cluster.partitionCountForTopic(topic);
        // Return cached metadata if we have it, and if the record's partition is either undefined
        // or within the known partition range
        if (partitionsCount != null && (partition == null || partition < partitionsCount))
            return new ClusterAndWaitTime(cluster, 0);
    
        long begin = time.milliseconds();
        long remainingWaitMs = maxWaitMs;
        long elapsed;
        
        // keep requesting the topic's metadata until it is available; if the total wait exceeds maxWaitMs, throw a TimeoutException
        do {
            if (partition != null) {
                log.trace("Requesting metadata update for partition {} of topic {}.", partition, topic);
            } else {
                log.trace("Requesting metadata update for topic {}.", topic);
            }
            metadata.add(topic);
            int version = metadata.requestUpdate();
            sender.wakeup();
            try {
                metadata.awaitUpdate(version, remainingWaitMs);
            } catch (TimeoutException ex) {
                // Rethrow with original maxWaitMs to prevent logging exception with remainingWaitMs
                throw new TimeoutException(
                        String.format("Topic %s not present in metadata after %d ms.",
                                topic, maxWaitMs));
            }
            cluster = metadata.fetch();
            elapsed = time.milliseconds() - begin;
            if (elapsed >= maxWaitMs) {  // check whether the elapsed time has exceeded maxWaitMs
                throw new TimeoutException(partitionsCount == null ?
                        String.format("Topic %s not present in metadata after %d ms.",
                                topic, maxWaitMs) :
                        String.format("Partition %d of topic %s with partition count %d is not present in metadata after %d ms.",
                                partition, topic, partitionsCount, maxWaitMs));
            }
            metadata.maybeThrowException();
            remainingWaitMs = maxWaitMs - elapsed;
            partitionsCount = cluster.partitionCountForTopic(topic);
        } while (partitionsCount == null || (partition != null && partition >= partitionsCount));
    
        return new ClusterAndWaitTime(cluster, elapsed);
    }

    From this implementation we can see that the blocking comes from the following call:

    private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long maxWaitMs) throws InterruptedException

    During send, the KafkaProducer must first obtain the metadata, and this is a loop that keeps retrying until the metadata is fetched successfully or an exception is thrown.

    Kafka's intent here is actually reasonable: the precondition for sending a message is being able to obtain the broker and topic information. The problem is that send exposes a Future-style API while its implementation can block internally. If you have not planned for this, then as soon as the broker or topic misbehaves, calling threads pile up blocked, the system slows down, and it can eventually crash.
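
    The maxBlockTimeMs passed into waitOnMetadata comes from the producer configuration max.block.ms, which defaults to 60000 ms. Lowering it does not remove the blocking, but it does bound the worst-case stall. A minimal sketch of the relevant setting (the broker address is a placeholder):

    import java.util.Properties

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")  // placeholder address
    // bounds how long send() may block waiting for metadata (default: 60000 ms)
    props.put("max.block.ms", "3000")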

    Solution

    The fix is straightforward: create a few dedicated threads for sending messages. Even if something unexpected happens, only those threads block, rather than blocking threads across the whole system and making it unavailable. A concrete implementation:

    import java.util.concurrent.{Callable, ExecutorService, Executors}

    import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

    class ProducerF[K, V](kafkaProducer: KafkaProducer[K, V]) {

      // a small dedicated pool: if send() blocks on metadata, only this thread is stuck
      val executor: ExecutorService = Executors.newScheduledThreadPool(1)

      def sendAsync(producerRecord: ProducerRecord[K, V], callback: Callback) = {
        executor.submit(new Callable[RecordMetadata]() {
          // send(...).get() runs on the pool thread, so any blocking happens off the caller
          def call: RecordMetadata = kafkaProducer.send(producerRecord, callback).get()
        })
      }
    }
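
    A hypothetical usage sketch (broker address and topic name are placeholders): sendAsync returns immediately even when the broker is down, and the returned Future can be ignored for fire-and-forget sends or inspected when the result matters:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

    object ProducerFDemo extends App {
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")  // placeholder address
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

      val producerF = new ProducerF(new KafkaProducer[String, String](props))

      // returns immediately; any blocking happens on the pool thread
      producerF.sendAsync(new ProducerRecord[String, String]("demo-topic", "k", "v"),
        new Callback {
          override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
            if (exception != null) println(s"callback: send failed: ${exception.getMessage}")
        })
    }

    One caveat of this approach: executor.submit enqueues tasks without bound, so if the broker stays down, pending sends accumulate in the queue; in production a bounded queue with a sensible rejection policy is worth considering.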
    

      

  • Original article: https://www.cnblogs.com/felixzh/p/11849296.html