• The overall logic of KafkaProducer


    Overview

    KafkaProducer is the client that applications use to send messages to the Kafka servers. The official documentation describes the producer as follows:

    1. Every Kafka node can answer metadata requests. The metadata includes which broker is the leader of each partition, and the leaders accept data directly from producers. The client can therefore send data straight to the broker that hosts a partition's leader, with no intermediate routing.

    2. The client can control which partition a message goes to by implementing the partitioner interface. Messages can be distributed at random, or routed to a specific partition according to a user-defined rule.

    3. Batching is one way to improve efficiency: KafkaProducer accumulates data in memory and then ships it out in a single request. Both the amount of data per batch and how long it is allowed to accumulate are configurable (a configuration sketch follows this list).
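
    The batch size and accumulation time mentioned in point 3 map directly to producer configuration keys. Below is a minimal sketch of those settings; the class name and the concrete values are illustrative assumptions, not recommendations from the original post.

    import java.util.Properties;
    
    import org.apache.kafka.clients.producer.ProducerConfig;
    
    public class BatchingConfigSketch {
    
        // Build producer properties that control batching behavior.
        public static Properties batchingProperties(String bootstrapServers) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
            // Maximum bytes accumulated per partition before a batch is sent.
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
            // How long to wait for more records before sending a partially full batch.
            props.put(ProducerConfig.LINGER_MS_CONFIG, 5);
            // Total memory the producer may use to buffer records waiting to be sent.
            props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432L);
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            return props;
        }
    }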

    Example

    KafkaProducer lives in the org.apache.kafka.clients.producer package. Following the official documentation it is fairly easy to use; below is a simple example.

    package com.zjl.play;
    
    import org.apache.kafka.clients.producer.*;
    import org.apache.log4j.BasicConfigurator;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.Future;
    
    import static com.zjl.play.ProducerConstant.*;
    
    
    public class Producer {
    
        private static final Logger logger = LoggerFactory.getLogger(com.zjl.play.Producer.class);
        private KafkaProducer<String,String> producer;
        private Properties kafkaProperties = new Properties();
        private List<Future<RecordMetadata>> kafkaFutures;
        private String topic;
    
        public void process(List<String> events) {
            if (events == null) {
                logger.error("process list is null");
                return;
            }
            int processEvents = events.size();
    
            if (processEvents == 0) {
                logger.info("the number of process event is zero");
            }
    
            try {
                ProducerRecord<String, String> record;
                kafkaFutures.clear();
                for (String event : events) {
                    long startTime = System.currentTimeMillis();
                    Integer partitionId = null;
                    String eventKey = null;
                    record = new ProducerRecord<String, String>(topic, partitionId, eventKey, event);
                    kafkaFutures.add(producer.send(record, new ProducerCallback(startTime)));
                }
            } catch (Exception e) {
                logger.error("get exception: " + e.toString());
            }
    
            try {
                if (processEvents > 0) {
                    for (Future<RecordMetadata> future : kafkaFutures) {
                        future.get();
                    }
                }
            } catch (Exception e) {
                logger.error(e.toString());
            }
    
        }
    
        public void start() {
            kafkaFutures = new LinkedList<Future<RecordMetadata>>();
            producer = new KafkaProducer<String, String>(kafkaProperties);
        }
    
        public void stop() {
            producer.close();
        }
    
        public void loadKafkaProperties(String bootStrapServers) {
            kafkaProperties.put(ProducerConfig.ACKS_CONFIG, DEFAULT_ACKS);
            //Defaults overridden based on config
            kafkaProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, DEFAULT_KEY_SERIALIZER);
            kafkaProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, DEFAULT_VALUE_SERIAIZER);
    //        kafkaProperties.putAll(context.getSubProperties(KAFKA_PRODUCER_PREFIX));
            kafkaProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServers);
            topic = DEFAULT_TOPIC;
        }
    
        public static void main(String args[]) {
            BasicConfigurator.configure();
            System.setProperty("log4j.configuration", "conf/log4j.properties");
            Producer producer = new Producer();
            producer.loadKafkaProperties("sha2hb06:9092");
            producer.start();
            List<String> testList = new ArrayList<String>();
            testList.add("123");
            testList.add("456");
            testList.add("789");
            producer.process(testList);
            producer.stop();
        }
    }
    
    class ProducerCallback implements Callback {
        private static final Logger logger = LoggerFactory.getLogger(ProducerCallback.class);
        private long startTime;
    
        public ProducerCallback(long startTime) {
            this.startTime = startTime;
        }
    
        public void onCompletion(RecordMetadata metadata, Exception exception) {
            if (exception != null) {
                // metadata is null when the send fails, so log the error and return early to avoid an NPE below
                logger.error("Error sending message to Kafka: {}", exception.getMessage());
                return;
            }
    
            if (logger.isDebugEnabled()) {
                long eventElapsedTime = System.currentTimeMillis() - startTime;
                logger.debug("Acked message partition:{} offset:{}",  metadata.partition(), metadata.offset());
                logger.debug("Elapsed time for send: {}", eventElapsedTime);
            }
        }
    }
    
    

    From the code above, the whole process consists of two main parts:

    1. Create a KafkaProducer instance.
    2. Call send() to deliver messages asynchronously (see the snippet below).
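
    send() is asynchronous and returns a Future<RecordMetadata>. The hypothetical snippet below (class name, topic and broker address are assumptions, not from the original post) contrasts the common usage styles: fire-and-forget, blocking on the returned future for synchronous behavior, and passing a Callback as the example above does.

    import java.util.Properties;
    import java.util.concurrent.Future;
    
    import org.apache.kafka.clients.producer.Callback;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    
    public class SendStyles {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
    
            ProducerRecord<String, String> record =
                    new ProducerRecord<String, String>("test-topic", "key", "value");
    
            // 1. Fire-and-forget: hand the record to the accumulator and move on.
            producer.send(record);
    
            // 2. Synchronous: block on the future until the broker acknowledges the record.
            Future<RecordMetadata> future = producer.send(record);
            RecordMetadata metadata = future.get();
            System.out.println("acked at offset " + metadata.offset());
    
            // 3. Asynchronous with a callback invoked when the send completes or fails.
            producer.send(record, new Callback() {
                public void onCompletion(RecordMetadata md, Exception e) {
                    if (e != null) {
                        e.printStackTrace();
                    }
                }
            });
    
            producer.close();
        }
    }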

    Source code walkthrough

    Overall structure

    Reading the KafkaProducer source, it has two main responsibilities: fetching the cluster metadata, and sending messages to the corresponding brokers. The network transfer of the messages is implemented with NIO.

    Figure 1

    In Figure 1, Metadata holds the cluster's topic information; RecordAccumulator acts like a queue that buffers the records to be sent; the Sender takes records out of the RecordAccumulator and hands them to the NetworkClient, which sends them.

    Figure 2

    Figure 2 shows the code layering of the producer, with each layer wrapping the one below it.

    First we take a brief look at the code structure of each layer, and then dig deeper step by step.

    • org.apache.kafka.clients.producer
      This package holds the producer client implementation and the client-facing interfaces; users implement these interfaces to plug in different behaviors.

      • KafkaProducer: the producer client.
      • Partitioner: the partitioning interface; implement it to define your own partitioning strategy (a sketch follows this list).
      • ProducerInterceptor: an interception interface; implement it to filter or modify records before they are sent.
      • ProducerRecord: wraps the data sent to Kafka; besides the message itself it carries related attributes such as the topic and partition.
      • RecordMetadata: wraps the information returned by the Kafka server for an acknowledged record.
      • Callback: the callback interface invoked when a send completes.
    • org.apache.kafka.clients.producer.internals

      • BufferPool: a pool of ByteBuffers used to allocate memory for batches.
      • DefaultPartitioner: the default partitioning strategy; if a partition is specified it is used directly, otherwise if a key is present its hash decides the partition, and if neither is given partitions are chosen round-robin.
      • ErrorLoggingCallback: a Callback implementation that logs send errors.
      • FutureRecordMetadata: the future result of a record send.
      • ProduceRequestResult: encapsulates the result returned after records are sent to a partition; its done() method, once called, signals the waiting thread that the records have been processed.
      • ProducerInterceptors: a container holding a list of user-defined ProducerInterceptor instances; every record is passed through each interceptor in the list before it is serialized.
      • RecordAccumulator: maintains a queue of the records that are waiting to be sent.
      • RecordBatch: holds one batch of records that will be sent together.
      • Sender: continuously pulls records from the accumulator and sends them.
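
    As mentioned for the Partitioner entry above, a custom partitioning strategy is just an implementation of that interface. Below is a minimal sketch against the 0.10.x-era Partitioner interface; the class name and routing rule are illustrative assumptions, and it would be enabled via ProducerConfig.PARTITIONER_CLASS_CONFIG.

    import java.util.Map;
    
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;
    import org.apache.kafka.common.utils.Utils;
    
    // Hypothetical partitioner: records without a key go to partition 0,
    // keyed records are hashed the same way DefaultPartitioner hashes them.
    public class KeyHashPartitioner implements Partitioner {
    
        @Override
        public void configure(Map<String, ?> configs) {
            // nothing to configure in this sketch
        }
    
        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            if (keyBytes == null) {
                return 0;
            }
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    
        @Override
        public void close() {
            // nothing to clean up
        }
    }

    Registering it is a one-liner: kafkaProperties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, KeyHashPartitioner.class.getName());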

    Kafka producer

    The KafkaProducer class acts like a builder: it initializes the interceptors, accumulator, metadata, NetworkClient, Sender and other objects, and starts a daemon thread that keeps running the Sender's run() method.

        private KafkaProducer(ProducerConfig config, Serializer<K> keySerializer, Serializer<V> valueSerializer) {
                ....
                
                this.interceptors = interceptorList.isEmpty() ? null : new ProducerInterceptors<>(interceptorList);
                
                ....
                this.metadata = new Metadata(retryBackoffMs, config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG), true, clusterResourceListeners);
    
                ....
            this.accumulator = new RecordAccumulator(config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
                ....
                        time);
                ....
                NetworkClient client = new NetworkClient(
                        ....
                        this.requestTimeoutMs, time);
                        
                this.sender = new Sender(client,
                        ....
                        this.requestTimeoutMs);
                String ioThreadName = "kafka-producer-network-thread" + (clientId.length() > 0 ? " | " + clientId : "");
                this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
                this.ioThread.start();
                ....
            } catch (Throwable t) {
                ....
            }
        }
    
    

    KafkaProducer's send() method essentially just appends the record to the accumulator queue.

        @Override
        public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
            // intercept the record, which can be potentially modified; this method does not throw exceptions
            ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
            return doSend(interceptedRecord, callback);
        }
    
        /**
         * Implementation of asynchronously send a record to a topic.
         */
        private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
            TopicPartition tp = null;
            try {
                // first make sure the metadata for the topic is available
                
                ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
            // .... append the record to the accumulator
                RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
            // .... return a future object that will hold the result of the send
                return result.future;
                // handling exceptions and record the errors;
                // for API exceptions return them in the future,
                // for other exceptions throw directly
            } catch (ApiException e) {
            // .... handle the various kinds of exceptions
            }
        }
    

    The actual sending is carried out by the Sender's run() method.

        void run(long now) {
        // fetch the current cluster information
            Cluster cluster = metadata.fetch();
            // get the list of partitions with data ready to send
        // a partition is considered ready when:
        //   1. its record batch is full
        //   2. its records have been waiting for at least linger.ms
        //   3. the accumulator has run out of buffer memory
        //   4. the accumulator is being closed
            RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
        // if there are any partitions whose leaders are not known yet, force a metadata update
            if (!result.unknownLeaderTopics.isEmpty()) {
                // The set of topics with unknown leader contains topics with leader election pending as well as
                // topics which may have expired. Add the topic again to metadata to ensure it is included
                // and request metadata update, since there are messages to send to the topic.
                for (String topic : result.unknownLeaderTopics)
                    this.metadata.add(topic);
                this.metadata.requestUpdate();
            }
        // a node counts as ready when:
        //   1. its connection information is known and trusted
        //   2. we are currently able to send data to it
            // remove any nodes we aren't ready to send to
            Iterator<Node> iter = result.readyNodes.iterator();
            long notReadyTimeout = Long.MAX_VALUE;
            while (iter.hasNext()) {
                Node node = iter.next();
                if (!this.client.ready(node, now)) {
                    iter.remove();
                    notReadyTimeout = Math.min(notReadyTimeout, this.client.connectionDelay(node, now));
                }
            }
    
        // drain the records that are ready to be sent
            // create produce requests
            Map<Integer, List<RecordBatch>> batches = this.accumulator.drain(cluster,
                                                                             result.readyNodes,
                                                                             this.maxRequestSize,
                                                                             now);
        // guarantee per-partition message ordering
            if (guaranteeMessageOrder) {
                // Mute all the partitions drained
                for (List<RecordBatch> batchList : batches.values()) {
                    for (RecordBatch batch : batchList)
                        this.accumulator.mutePartition(batch.topicPartition);
                }
            }
            
        // abort record batches that have expired in the accumulator
            List<RecordBatch> expiredBatches = this.accumulator.abortExpiredBatches(this.requestTimeout, now);
            // update sensors
            for (RecordBatch expiredBatch : expiredBatches)
                this.sensors.recordErrors(expiredBatch.topicPartition.topic(), expiredBatch.recordCount);
    
            sensors.updateProduceRequestMetrics(batches);
            
        // build the produce requests
            List<ClientRequest> requests = createProduceRequests(batches, now);
            // If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
            // loop and try sending more data. Otherwise, the timeout is determined by nodes that have partitions with data
            // that isn't yet sendable (e.g. lingering, backing off). Note that this specifically does not include nodes
            // with sendable data that aren't ready to send since they would cause busy looping.
            long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
            if (result.readyNodes.size() > 0) {
                log.trace("Nodes with data ready to send: {}", result.readyNodes);
                log.trace("Created {} produce requests: {}", requests.size(), requests);
                pollTimeout = 0;
            }
        // queue these requests on the corresponding network channels
            for (ClientRequest request : requests)
                client.send(request, now);
    
            // if some partitions are already ready to be sent, the select time would be 0;
            // otherwise if some partition already has some data accumulated but not ready yet,
            // the select time will be the time difference between now and its linger expiry time;
            // otherwise the select time will be the time difference between now and the metadata expiry time;
        // the actual network send happens here
            this.client.poll(pollTimeout, now);
        }
    

    Summary

    The content above describes the overall flow of message sending at the producer layer. We have seen that the producer places messages into a queue, and a dedicated thread keeps taking messages off that queue and sending them to the servers. At this layer there is no trace of NIO yet: all send requests go through calls into the org.apache.kafka.clients package (NetworkClient and friends), which keeps the network layer nicely encapsulated.
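
    To make that queue-plus-drain-thread structure concrete, here is a stripped-down analogy in plain Java. It is not Kafka's implementation (RecordAccumulator batches per partition and Sender/NetworkClient use NIO); it only shows the shape of the pattern, with names invented for illustration.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;
    
    // Toy analogy of the producer: callers enqueue messages, a background
    // daemon thread keeps draining the queue and "sending" the messages.
    public class ToyProducer {
    
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
        private final Thread ioThread;
        private volatile boolean running = true;
    
        public ToyProducer() {
            ioThread = new Thread(new Runnable() {
                public void run() {
                    while (running || !queue.isEmpty()) {
                        try {
                            String msg = queue.poll(100, TimeUnit.MILLISECONDS);
                            if (msg != null) {
                                // In KafkaProducer this is where Sender/NetworkClient perform the network I/O.
                                System.out.println("sending: " + msg);
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
            }, "toy-producer-io-thread");
            ioThread.setDaemon(true);
            ioThread.start();
        }
    
        // Analogous to KafkaProducer.send(): append to the queue and return immediately.
        public void send(String message) {
            queue.offer(message);
        }
    
        // Analogous to KafkaProducer.close(): stop the io thread after the queue drains.
        public void close() throws InterruptedException {
            running = false;
            ioThread.join();
        }
    }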

  • Original article: https://www.cnblogs.com/SpeakSoftlyLove/p/6756505.html