KafkaProducer creates a KafkaThread to run the Sender.run method.
1. The entry point for sending a message is KafkaProducer#doSend, but all it really does is add the message to batches:
The Kafka producer sends messages in batches: the RecordAccumulator class holds a field ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches,
and KafkaProducer#doSend appends the current record to a ProducerBatch and then calls Sender#wakeup to try to wake the blocked I/O thread.
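A minimal, self-contained sketch of this accumulate-then-wake pattern (toy types, not the Kafka source; the real wakeup unblocks a java.nio Selector rather than a monitor):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AccumulateThenWake {
    // models RecordAccumulator's ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches
    private final Map<String, Deque<byte[]>> batches = new ConcurrentHashMap<>();
    private final Object ioThreadSignal = new Object();

    // models KafkaProducer#doSend: append to the per-partition deque, then wake the I/O thread
    public void send(String topicPartition, byte[] record) {
        Deque<byte[]> deque = batches.computeIfAbsent(topicPartition, k -> new ArrayDeque<>());
        synchronized (deque) {
            deque.addLast(record); // the real code appends into the last open ProducerBatch
        }
        synchronized (ioThreadSignal) {
            ioThreadSignal.notify(); // models Sender#wakeup, which wakes the blocked selector
        }
    }
}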
2. Draining data from batches and sending it starts in Sender.run; the main logic breaks down into 3 steps:
2.1 RecordAccumulator#drain takes the data out
// Only one ProducerBatch is taken per partition
public Map<Integer, List<ProducerBatch>> drain(Cluster cluster, Set<Node> nodes, int maxSize, long now) {
    if (nodes.isEmpty())
        return Collections.emptyMap();

    Map<Integer, List<ProducerBatch>> batches = new HashMap<>();
    for (Node node : nodes) {
        int size = 0;
        // get the partitions this node is responsible for
        List<PartitionInfo> parts = cluster.partitionsForNode(node.id());
        List<ProducerBatch> ready = new ArrayList<>();
        /* to make starvation less likely this loop doesn't start at 0 */
        int start = drainIndex = drainIndex % parts.size();
        // iterate over the partitions
        do {
            PartitionInfo part = parts.get(drainIndex);
            TopicPartition tp = new TopicPartition(part.topic(), part.partition());
            // Only proceed if the partition has no in-flight batches.
            if (!muted.contains(tp)) {
                Deque<ProducerBatch> deque = getDeque(tp);
                if (deque != null) {
                    synchronized (deque) {
                        ProducerBatch first = deque.peekFirst();
                        if (first != null) {
                            boolean backoff = first.attempts() > 0 && first.waitedTimeMs(now) < retryBackoffMs;
                            // Only drain the batch if it is not during backoff period.
                            if (!backoff) {
                                if (size + first.estimatedSizeInBytes() > maxSize && !ready.isEmpty()) {
                                    // there is a rare case that a single batch size is larger than the request size due
                                    // to compression; in this case we will still eventually send this batch in a single
                                    // request
                                    break;
                                } else {
                                    ProducerIdAndEpoch producerIdAndEpoch = null;
                                    boolean isTransactional = false;
                                    if (transactionManager != null) {
                                        if (!transactionManager.isSendToPartitionAllowed(tp))
                                            break;

                                        producerIdAndEpoch = transactionManager.producerIdAndEpoch();
                                        if (!producerIdAndEpoch.isValid())
                                            // we cannot send the batch until we have refreshed the producer id
                                            break;

                                        isTransactional = transactionManager.isTransactional();

                                        if (!first.hasSequence() && transactionManager.hasUnresolvedSequence(first.topicPartition))
                                            // Don't drain any new batches while the state of previous sequence numbers
                                            // is unknown. The previous batches would be unknown if they were aborted
                                            // on the client after being sent to the broker at least once.
                                            break;

                                        int firstInFlightSequence = transactionManager.firstInFlightSequence(first.topicPartition);
                                        if (firstInFlightSequence != RecordBatch.NO_SEQUENCE && first.hasSequence()
                                                && first.baseSequence() != firstInFlightSequence)
                                            // If the queued batch already has an assigned sequence, then it is being
                                            // retried. In this case, we wait until the next immediate batch is ready
                                            // and drain that. We only move on when the next in line batch is complete (either successfully
                                            // or due to a fatal broker error). This effectively reduces our
                                            // in flight request count to 1.
                                            break;
                                    }

                                    ProducerBatch batch = deque.pollFirst();
                                    if (producerIdAndEpoch != null && !batch.hasSequence()) {
                                        // If the batch already has an assigned sequence, then we should not change the producer id and
                                        // sequence number, since this may introduce duplicates. In particular,
                                        // the previous attempt may actually have been accepted, and if we change
                                        // the producer id and sequence here, this attempt will also be accepted,
                                        // causing a duplicate.
                                        //
                                        // Additionally, we update the next sequence number bound for the partition,
                                        // and also have the transaction manager track the batch so as to ensure
                                        // that sequence ordering is maintained even if we receive out of order
                                        // responses.
                                        batch.setProducerState(producerIdAndEpoch, transactionManager.sequenceNumber(batch.topicPartition), isTransactional);
                                        transactionManager.incrementSequenceNumber(batch.topicPartition, batch.recordCount);
                                        log.debug("Assigned producerId {} and producerEpoch {} to batch with base sequence " +
                                                "{} being sent to partition {}", producerIdAndEpoch.producerId,
                                                producerIdAndEpoch.epoch, batch.baseSequence(), tp);

                                        transactionManager.addInFlightBatch(batch);
                                    }
                                    batch.close();
                                    size += batch.records().sizeInBytes();
                                    ready.add(batch);
                                    batch.drained(now);
                                }
                            }
                        }
                    }
                }
            }
            this.drainIndex = (this.drainIndex + 1) % parts.size();
        } while (start != drainIndex);
        batches.put(node.id(), ready);
    }
    return batches;
}
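The "int start = drainIndex = drainIndex % parts.size()" line is the anti-starvation device: when a request fills up mid-scan and the loop breaks, the next drain call resumes where this one stopped instead of restarting at partition 0. A self-contained toy illustration (not the Kafka source; partition names are made up):

import java.util.List;

public class RoundRobinDrainStart {
    private static int drainIndex = 0;

    // Drains up to maxBatches partitions per call, scanning round-robin from
    // wherever the previous call stopped (mirrors the start/drainIndex logic above).
    static void drainOnce(List<String> parts, int maxBatches) {
        int start = drainIndex = drainIndex % parts.size();
        int taken = 0;
        StringBuilder order = new StringBuilder();
        do {
            if (taken == maxBatches)
                break; // "request full": the next call resumes here, so p0 cannot starve p2
            order.append(parts.get(drainIndex)).append(' ');
            taken++;
            drainIndex = (drainIndex + 1) % parts.size();
        } while (start != drainIndex);
        System.out.println(order);
    }

    public static void main(String[] args) {
        List<String> parts = List.of("p0", "p1", "p2");
        drainOnce(parts, 2); // p0 p1
        drainOnce(parts, 2); // p2 p0  <- resumes at p2 instead of restarting at p0
    }
}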
2.2 NetworkClient.send
This send does not hit the network yet: the ProduceRequest is first serialized into a Send object, which is added at the head of inFlightRequests; the call to the selector's send is really KafkaChannel.setSend().
Send send = request.toSend(nodeId, header);
this.inFlightRequests.add(inFlightRequest);
selector.send(inFlightRequest.send);
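selector.send hands the Send to that connection's KafkaChannel, which holds at most one in-progress Send at a time. A toy model of that invariant (assumed behavior, simplified from the real class):

import java.nio.ByteBuffer;

// Toy model (not the Kafka source) of KafkaChannel's core invariant:
// each connection holds at most ONE in-progress Send.
public class ToyKafkaChannel {
    private ByteBuffer send; // stands in for the pending Send object

    public void setSend(ByteBuffer newSend) {
        // The real KafkaChannel.setSend rejects a second send similarly,
        // then registers OP_WRITE interest so the next poll() flushes it.
        if (this.send != null)
            throw new IllegalStateException("prior send still in progress");
        this.send = newSend;
    }
}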
One NetworkSend object corresponds to one ProduceRequest and contains one or more ProducerBatches; in other words, a single network send ships multiple batches, which is one of the reasons for Kafka's high throughput.
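Connecting this back to drain(): each node's drained batch list becomes exactly one request. A toy illustration of the data shape (made-up topic and node names):

import java.util.List;
import java.util.Map;

public class OneRequestManyBatches {
    public static void main(String[] args) {
        // drain() returned: node id -> all batches headed for that broker
        Map<Integer, List<String>> drained = Map.of(
                0, List.of("batch(orders-0)", "batch(orders-3)"),
                1, List.of("batch(orders-1)"));
        // one ProduceRequest (hence one NetworkSend) per broker node
        drained.forEach((nodeId, batches) ->
                System.out.println("ProduceRequest -> node " + nodeId + ", batches = " + batches));
    }
}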
2.3 NetworkClient.poll
This is where the actual network send happens.
Selector#pollSelectionKeys handles the network read/write events; sending a message is a write event, and incoming responses are stored in Selector#completedReceives.
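A bare-bones NIO client loop, to show the shape such a poll implementation takes (a hypothetical sketch, not Kafka's Selector; the address and payload are made up):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class MiniPollLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);
        ch.connect(new InetSocketAddress("localhost", 9092)); // hypothetical address
        ch.register(selector, SelectionKey.OP_CONNECT);

        ByteBuffer pendingSend = ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8));
        ByteBuffer receiveBuf = ByteBuffer.allocate(4096);

        while (true) {
            selector.select();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isConnectable() && ch.finishConnect())
                    key.interestOps(SelectionKey.OP_WRITE);    // connected: ask for write events
                if (key.isValid() && key.isWritable()) {
                    ch.write(pendingSend);                     // a "send" is just a write event
                    if (!pendingSend.hasRemaining())
                        key.interestOps(SelectionKey.OP_READ); // flushed: now wait for the response
                }
                if (key.isValid() && key.isReadable() && ch.read(receiveBuf) > 0) {
                    receiveBuf.flip(); // Kafka would stash this in completedReceives
                    System.out.println("received " + receiveBuf.remaining() + " bytes");
                    receiveBuf.clear();
                }
            }
        }
    }
}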
Whether a produce request expects a response depends on acks: with acks = -1 or 1 it does, with acks = 0 it does not.
In NetworkClient#handleCompletedSends, requests that expect no response are removed from inFlightRequests.
NetworkClient#handleCompletedReceives processes the responses.
A producer's acks value is fixed, so either all of its requests expect responses or none of them do.
New requests are added at the head of inFlightRequests, so a received response corresponds to the oldest request, i.e. the one at the tail.
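The head/tail bookkeeping is easy to see with a plain ArrayDeque (toy request names; this works because TCP delivers responses in order on a single connection):

import java.util.ArrayDeque;
import java.util.Deque;

public class InFlightOrdering {
    public static void main(String[] args) {
        Deque<String> inFlight = new ArrayDeque<>();
        inFlight.addFirst("request-1"); // sent first
        inFlight.addFirst("request-2"); // sent second
        // The first response to arrive matches the oldest outstanding
        // request, which sits at the tail:
        System.out.println(inFlight.pollLast()); // prints request-1
    }
}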
3. Key classes
KafkaProducer: the API class exposed directly to users; Sender: mainly manages ProducerBatches
NetworkClient: a ProducerBatch is a Java object and must be serialized to go over the network; this class manages connections and sits closer to the I/O layer
Selector: a wrapper around the Java NIO Selector
KafkaChannel: represents a single broker connection; the send path above ends in KafkaChannel.setSend()
4. ByteBuffer
// How ByteBuffer is used
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;

public class ByteBufferDemo {
    // a ByteBuffer starts out in write mode
    public static void main(String[] args) throws UnsupportedEncodingException {
        // capacity = 512, limit = 512, position = 0
        ByteBuffer buffer = ByteBuffer.allocate(512);
        buffer.put((byte) 'h');
        buffer.put((byte) 'e');
        buffer.put((byte) 'l');
        buffer.put((byte) 'l');
        buffer.put((byte) 'o');
        // flip: limit = position, position = 0
        buffer.flip();
        // number of readable bytes
        int len = buffer.remaining();
        byte[] dst = new byte[len];
        buffer.get(dst);
        System.out.println(new String(dst));
        // takeaway: a ByteBuffer is just a wrapper around a byte[]
    }
}

// SocketChannel
// write output:  SocketChannel#write(java.nio.ByteBuffer)
// read input:    SocketChannel#read(java.nio.ByteBuffer)
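To round this out, a small, self-contained example of the SocketChannel read/write calls named above (the endpoint is hypothetical; any echo-style server would do):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class SocketChannelDemo {
    public static void main(String[] args) throws IOException {
        try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("localhost", 9092))) {
            ByteBuffer out = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
            // write() may drain only part of the buffer (notably in
            // non-blocking mode), so looping from position to limit is the
            // standard pattern
            while (out.hasRemaining())
                ch.write(out);

            ByteBuffer in = ByteBuffer.allocate(512);
            int n = ch.read(in);  // fills the buffer; -1 means the peer closed
            in.flip();            // switch to read mode before consuming
            System.out.println(n + " bytes: " + StandardCharsets.UTF_8.decode(in));
        }
    }
}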