• kafka 生产者发送消息


    KafkaProducer 创建一个 KafkaThread 来运行 Sender.run 方法。

    1. 发送消息的入口在 KafkaProducer#doSend 中,但其实是把消息加入到 batches 中:

    kafka 生产者是按 batch 发送消息,RecordAccumulator 类有个变量 ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches,
    KafkaProducer#doSend 方法会把当前的这条消息放入到 ProducerBatch 中。然后调用 Sender#wakeup 方法,尝试唤醒阻塞的 io 线程。


    2. 从 batches 取出数据发送,入口在 Sender.run,主要的逻辑抽象为 3 步:

    2.1 RecordAccumulator#drain 取出数据

    // 每个分区只取一个 ProducerBatch
    public Map<Integer, List<ProducerBatch>> drain(Cluster cluster,
                                                   Set<Node> nodes,
                                                   int maxSize,
                                                   long now) {
        if (nodes.isEmpty())
            return Collections.emptyMap();
    
        Map<Integer, List<ProducerBatch>> batches = new HashMap<>();
        for (Node node : nodes) {
            int size = 0;
            // 取出该节点负责的分区
            List<PartitionInfo> parts = cluster.partitionsForNode(node.id());
            List<ProducerBatch> ready = new ArrayList<>();
            /* to make starvation less likely this loop doesn't start at 0 */
            int start = drainIndex = drainIndex % parts.size();
            // 遍历每个分区
            do {
                PartitionInfo part = parts.get(drainIndex);
                TopicPartition tp = new TopicPartition(part.topic(), part.partition());
                // Only proceed if the partition has no in-flight batches.
                if (!muted.contains(tp)) {
                    Deque<ProducerBatch> deque = getDeque(tp);
                    if (deque != null) {
                        synchronized (deque) {
                            ProducerBatch first = deque.peekFirst();
                            if (first != null) {
                                boolean backoff = first.attempts() > 0 && first.waitedTimeMs(now) < retryBackoffMs;
                                // Only drain the batch if it is not during backoff period.
                                if (!backoff) {
                                    if (size + first.estimatedSizeInBytes() > maxSize && !ready.isEmpty()) {
                                        // there is a rare case that a single batch size is larger than the request size due
                                        // to compression; in this case we will still eventually send this batch in a single
                                        // request
                                        break;
                                    } else {
                                        ProducerIdAndEpoch producerIdAndEpoch = null;
                                        boolean isTransactional = false;
                                        if (transactionManager != null) {
                                            if (!transactionManager.isSendToPartitionAllowed(tp))
                                                break;
    
                                            producerIdAndEpoch = transactionManager.producerIdAndEpoch();
                                            if (!producerIdAndEpoch.isValid())
                                                // we cannot send the batch until we have refreshed the producer id
                                                break;
    
                                            isTransactional = transactionManager.isTransactional();
    
                                            if (!first.hasSequence() && transactionManager.hasUnresolvedSequence(first.topicPartition))
                                                // Don't drain any new batches while the state of previous sequence numbers
                                                // is unknown. The previous batches would be unknown if they were aborted
                                                // on the client after being sent to the broker at least once.
                                                break;
    
                                            int firstInFlightSequence = transactionManager.firstInFlightSequence(first.topicPartition);
                                            if (firstInFlightSequence != RecordBatch.NO_SEQUENCE && first.hasSequence()
                                                    && first.baseSequence() != firstInFlightSequence)
                                                // If the queued batch already has an assigned sequence, then it is being
                                                // retried. In this case, we wait until the next immediate batch is ready
                                                // and drain that. We only move on when the next in line batch is complete (either successfully
                                                // or due to a fatal broker error). This effectively reduces our
                                                // in flight request count to 1.
                                                break;
                                        }
    
                                        ProducerBatch batch = deque.pollFirst();
                                        if (producerIdAndEpoch != null && !batch.hasSequence()) {
                                            // If the batch already has an assigned sequence, then we should not change the producer id and
                                            // sequence number, since this may introduce duplicates. In particular,
                                            // the previous attempt may actually have been accepted, and if we change
                                            // the producer id and sequence here, this attempt will also be accepted,
                                            // causing a duplicate.
                                            //
                                            // Additionally, we update the next sequence number bound for the partition,
                                            // and also have the transaction manager track the batch so as to ensure
                                            // that sequence ordering is maintained even if we receive out of order
                                            // responses.
                                            batch.setProducerState(producerIdAndEpoch, transactionManager.sequenceNumber(batch.topicPartition), isTransactional);
                                            transactionManager.incrementSequenceNumber(batch.topicPartition, batch.recordCount);
                                            log.debug("Assigned producerId {} and producerEpoch {} to batch with base sequence " +
                                                            "{} being sent to partition {}", producerIdAndEpoch.producerId,
                                                    producerIdAndEpoch.epoch, batch.baseSequence(), tp);
    
                                            transactionManager.addInFlightBatch(batch);
                                        }
                                        batch.close();
                                        size += batch.records().sizeInBytes();
                                        ready.add(batch);
                                        batch.drained(now);
                                    }
                                }
                            }
                        }
                    }
                }
                this.drainIndex = (this.drainIndex + 1) % parts.size();
            } while (start != drainIndex);
            batches.put(node.id(), ready);
        }
        return batches;
    }

    2.2 NetworkClient.send

    这里的 send 不是真正的网络发送,先把 ProduceReuquest 序列化成 Send 对象,然后加入到 inFlightRequests 的头部,调用 selector 的 send,实则是 KafkaChannel.setSend()

    Send send = request.toSend(nodeId, header);
    
    this.inFlightRequests.add(inFlightRequest);
    
    selector.send(inFlightRequest.send);

     一个 NetworkSend 对象对应一个 ProduceRequest,包含一个或多个 ProducerBatch,也就是说一次网络会发送多个 batch,这也是 kafka 吞吐量大的原因之一。

    2.3 NetworkClient.poll
    真正的网络发送

    Selector#pollSelectionKeys 处理网络读写事件,发送消息即写事件,同时把响应存放在 Selector#completedReceives 中
    producer 发送消息,如果 acks = -1 和 1,即 producer 请求需要响应,
    在 NetworkClient#handleCompletedSends 中,把不需要响应的请求,从 inFlightRequests 中删除
    在 NetworkClient#handleCompletedReceives 处理响应
    producer 设置了 ack 的值是固定的,producer 要么都需要响应,要么都不需要响应。
    新的请求加在头部,收到的响应对应最旧的请求,即尾部的请求。

    3. 主要的类
    KafkaProducer: 直接暴露给用户的 api 类;Sender: 主要管理 ProducerBatch
    NetworkClient: ProducerBatch 是对象,通过网络发送需要序列化,该类管理连接,更接近 io 层
    Selector 对 java nio Selector 的封装
    KafkaChannel

    4. ByteBuffer

    // ByteBuffer 的使用
    // ByteBuffer 初始是写模式
    public static void main(String[] args) throws UnsupportedEncodingException {
        // capacity = 512, limit = 512, position = 0
        ByteBuffer buffer = ByteBuffer.allocate(512);
        buffer.put((byte)'h');
        buffer.put((byte)'e');
        buffer.put((byte)'l');
        buffer.put((byte)'l');
        buffer.put((byte)'o');
    
        // limit = position, position = 0
        buffer.flip();
    
        // 获取字节数
        int len = buffer.remaining();
        byte[] dst = new byte[len];
        buffer.get(dst);
        System.out.println(new String(dst));
        // 结论:ByteBuffer 只是对 byte[] 的封装
    }
    
    //SocketChannel
    //输出
    //SocketChannel#write(java.nio.ByteBuffer)
    //读取输入
    //SocketChannel#read(java.nio.ByteBuffer)
  • 相关阅读:
    C# 获取本地电脑所有的盘符
    C#winform程序安装在默认路径提示权限不足的问题
    BadImageFormatException异常处理
    未能找到元数据文件解决办法
    [转]DllImport属性详解
    九度oj1163题
    一致性哈希
    salt-stack部署
    查询系统支持的最大线程和进程数量
    Linux 单机测试 RS232 串口
  • 原文地址:https://www.cnblogs.com/allenwas3/p/11615210.html
Copyright © 2020-2023  润新知