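I. Pull vs. push
Characteristics of push:
① The broker actively pushes messages; the push logic lives on the broker and consumes broker resources.
② The delivery rate is decided by the broker, which adapts poorly to consumers with different processing speeds: a slow consumer can suffer network congestion, while a fast one may sit idle.
Characteristics of pull:
① The consumer client actively pulls messages; the pull logic lives on the consumer and consumes consumer resources. When no data is available, the consumer keeps polling in a loop while it waits.
② The consumption rate is decided by each consumer client, which adapts well to consumers with different processing speeds.
KafkaConsumer uses the pull model: the consumer client polls the broker for messages. This relieves pressure on the broker and accommodates consumers with different processing speeds.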
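To make the pull model concrete, here is a minimal sketch that uses only the JDK. An in-memory BlockingQueue stands in for the broker; the class name PullLoopSketch and its poll helper are hypothetical and are not part of the Kafka API. The consumer chooses its own batch size and waits with a timeout instead of spinning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PullLoopSketch {

    /** Pulls up to maxRecords messages, waiting at most timeoutMs for the first one. */
    public static List<String> poll(BlockingQueue<String> broker, int maxRecords, long timeoutMs) {
        List<String> batch = new ArrayList<>();
        try {
            // Block for the first message instead of busy-waiting.
            String first = broker.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (first == null) {
                return batch; // nothing arrived within the timeout
            }
            batch.add(first);
            // Take whatever else is already available, up to the batch limit.
            broker.drainTo(batch, maxRecords - 1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> broker = new LinkedBlockingQueue<>();
        for (int i = 0; i < 5; i++) {
            broker.add("msg-" + i);
        }
        // The consumer decides how much to fetch per round: here 2 messages at a time.
        List<String> batch;
        while (!(batch = poll(broker, 2, 100)).isEmpty()) {
            System.out.println("pulled " + batch);
        }
    }
}
```

A slower consumer would simply call poll less often or with a smaller maxRecords; nothing on the "broker" side has to change.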
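II. KafkaConsumer
This section covers the new-version consumer.
1. Consumer group
Consumers identify themselves with a consumer group name. Each message in a topic partition is pulled by only one consumer instance within each consumer group that subscribes to the topic.
① A group may contain several consumers or just one.
② Within a single group, a message is pulled by only one consumer instance of that group.
③ A topic's messages are delivered with the group as the unit: every subscribing group receives the messages independently of the other groups.
Benefit of consumer groups: they give consumers high scalability and high fault tolerance.
① High fault tolerance: when a consumer instance fails, the group rebalances and hands the partitions that consumer owned over to the remaining consumers.
② High scalability: when the message rate is too high, consumer instances can be added to the group dynamically to raise parallelism; when the rate is low, instances can be removed dynamically.
Any change in the number of consumer instances triggers a rebalance of the group, which reassigns partitions among the consumers.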
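2. Rebalance
Conditions that trigger a rebalance:
① Group membership changes, in particular when the coordinator considers a consumer unavailable.
② The set of topics the group subscribes to changes.
③ The number of partitions of a subscribed topic changes.
The coordinator considering a consumer unavailable does not necessarily mean the consumer has crashed. If a consumer cannot finish processing its messages within the configured time, the coordinator also treats it as unavailable; in that case the consumer's performance needs tuning.
Partition assignment strategies for a rebalance (the consumer uses the range strategy by default):
① range: range-based. The partitions of a single topic are sorted in order and split into contiguous segments, which are assigned to the consumers one after another.
② round-robin: rotation-based. All partitions of the subscribed topics are laid out in order and dealt out to the consumers one at a time.
③ sticky: a history-based "sticky" strategy that avoids the main weakness of the two strategies above, which ignore the previous assignment.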
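The range and round-robin ideas can be sketched for a single topic with plain JDK collections. This is a simplified illustration and not Kafka's actual assignor code: the class AssignmentSketch and its method names are hypothetical, consumers are assumed to be already sorted, and only one topic is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {

    /** Range: contiguous blocks; the first (partitions % consumers) consumers get one extra partition. */
    public static Map<String, List<Integer>> range(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int base = partitions / consumers.size();
        int extra = partitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    /** Round-robin: partitions are dealt out one at a time, like cards. */
    public static Map<String, List<Integer>> roundRobin(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> consumers = Arrays.asList("c0", "c1", "c2");
        System.out.println(range(consumers, 7));      // {c0=[0, 1, 2], c1=[3, 4], c2=[5, 6]}
        System.out.println(roundRobin(consumers, 7)); // {c0=[0, 3, 6], c1=[1, 4], c2=[2, 5]}
    }
}
```

With more consumers than partitions (for example four consumers and two partitions) the extra consumers receive empty lists, which is why adding instances beyond the partition count does not increase parallelism.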
Every rebalance increments the consumer generation by 1. As a result, offset commits sent by a member that still carries the pre-rebalance generation are rejected.
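The generation check can be pictured as a small fencing rule. The following is a toy model under stated assumptions, not the coordinator's real implementation: GenerationSketch and its methods are hypothetical names, and a single offset stands in for the group's whole commit state.

```java
public class GenerationSketch {
    private int generation = 0;
    private long committedOffset = 0;

    /** Called when a rebalance completes; returns the new generation handed to members. */
    public int rebalance() {
        return ++generation;
    }

    /** Accepts the commit only if the member's generation is the current one. */
    public boolean commit(int memberGeneration, long offset) {
        if (memberGeneration != generation) {
            return false; // stale member from before the rebalance: reject
        }
        committedOffset = offset;
        return true;
    }

    public long committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        GenerationSketch coordinator = new GenerationSketch();
        int oldGeneration = coordinator.rebalance();
        System.out.println(coordinator.commit(oldGeneration, 5)); // true
        int newGeneration = coordinator.rebalance();
        System.out.println(coordinator.commit(oldGeneration, 9)); // false
        System.out.println(coordinator.commit(newGeneration, 9)); // true
    }
}
```

In the real client a rejected commit surfaces as an exception from the commit call (CommitFailedException for commitSync), so the application should be prepared to reprocess from the last committed offset.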
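Five requests are involved in a rebalance:
① JoinGroup request: a consumer asks to join the group.
② SyncGroup request: the group leader sends the assignment plan to the coordinator, which distributes it to all members of the group.
③ Heartbeat request: liveness check between the consumer and the coordinator.
④ LeaveGroup request: a consumer proactively notifies the coordinator that it is about to leave the group.
⑤ DescribeGroup request: shows all information about a group. It is mainly used by administrators; the coordinator does not use it during a rebalance.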
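3. Offset management
The consumer client must keep its consumption progress for every partition it reads, that is, the position of the consumed messages in the partition: the consumer offset.
The consumer not only keeps this progress itself, it also periodically commits its offsets to the broker. Because offsets start at 0, the message with offset N is the (N+1)-th message in the partition.
Offsets matter a great deal to the consumer: they are the foundation of message delivery guarantees. The three common delivery semantics are:
① At most once: messages may be lost but are never processed twice. Committing the offset before processing the message achieves this.
② At least once: messages are never lost but may be processed more than once. Committing the offset after processing the message achieves this.
③ Exactly once: every message is processed, and processed only once.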
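The difference between committing before and after processing shows up when the consumer crashes between the two steps. The sketch below simulates a single crash with plain JDK code; DeliverySemanticsSketch and its parameters are hypothetical and no Kafka API is involved. A restart always resumes from the last committed offset.

```java
import java.util.ArrayList;
import java.util.List;

public class DeliverySemanticsSketch {

    /**
     * Consumes offsets 0..total-1 and crashes once at crashAt, between the
     * commit step and the processing step. Returns the offsets actually processed.
     */
    public static List<Integer> run(int total, int crashAt, boolean commitBeforeProcessing) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;      // next offset to read after a restart
        boolean crashed = false;
        int offset = committed;
        while (offset < total) {
            if (commitBeforeProcessing) {
                committed = offset + 1;               // commit first
                if (!crashed && offset == crashAt) {  // crash before processing
                    crashed = true;
                    offset = committed;               // restart from the committed offset
                    continue;
                }
                processed.add(offset);
            } else {
                processed.add(offset);                // process first
                if (!crashed && offset == crashAt) {  // crash before committing
                    crashed = true;
                    offset = committed;               // restart from the committed offset
                    continue;
                }
                committed = offset + 1;
            }
            offset++;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 2, true));  // [0, 1, 3]       offset 2 is lost
        System.out.println(run(4, 2, false)); // [0, 1, 2, 2, 3] offset 2 is processed twice
    }
}
```

Exactly-once behavior needs more than commit ordering, for example making the processing idempotent or storing the offset atomically together with the processing result.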
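How it works:
① The Kafka cluster automatically creates an internal topic named "__consumer_offsets" (note the two leading underscores), with 50 partitions by default. This topic records the consumers' offsets.
② One broker in the Kafka cluster is chosen as the coordinator of the consumer group. It handles group membership, the partition assignment plan, and offset commits.
③ When a consumer first starts it has no offset information of its own, so it asks the coordinator (a broker) for its initial offset. The coordinator looks up the records in "__consumer_offsets" and returns the consumer's offset.
④ The consumer commits its offsets to the coordinator, which records them in "__consumer_offsets".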
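How a group ends up with a particular coordinator is not spelled out above. To my understanding, the broker hashes the group id onto one partition of the offsets topic and the broker leading that partition acts as coordinator; treat this as an assumption rather than something the text confirms. The class and method names below are hypothetical, and 50 is the default partition count mentioned above.

```java
public class CoordinatorPartitionSketch {

    /** Maps a group id onto a partition of the offsets topic (assumed hashing scheme). */
    public static int offsetsPartitionFor(String groupId, int offsetsTopicPartitions) {
        int hash = groupId.hashCode();
        // Guard against Integer.MIN_VALUE, whose absolute value does not fit in an int.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // "test" is the group id used in the examples of this article.
        System.out.println(offsetsPartitionFor("test", 50)); // 48
    }
}
```

All members of one group therefore talk to the same broker for joins, heartbeats and offset commits, while different groups are spread across the brokers.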
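Offsets can be committed automatically or manually.
Automatic commit: set enable.auto.commit=true; auto.commit.interval.ms controls the interval between automatic commits.
Manual commit: set enable.auto.commit=false and commit explicitly with the commitSync or commitAsync method of KafkaConsumer.
Automatic commit has a low development cost and is simple to use, but it offers no precise control, a failed commit is hard to handle, and messages may be lost; at best it achieves "at least once" semantics. Use it when there are no requirements on delivery semantics and message loss is tolerable.
Manual commit allows precise control over when offsets are committed at the price of extra development effort. It readily achieves "at least once" semantics and, combined with external state, can achieve "exactly once". Use it when processing logic is heavy, message loss is unacceptable, and "at least once" is the minimum requirement.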
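4. Deserialization
Mirroring the producer's serializers, Kafka ships several deserializers by default.
ByteArrayDeserializer: essentially does nothing, since the data is already a byte array.
ByteBufferDeserializer: deserializes to ByteBuffer
BytesDeserializer: deserializes to Kafka's own Bytes class
DoubleDeserializer: deserializes to Double
IntegerDeserializer: deserializes to Integer
LongDeserializer: deserializes to Long
StringDeserializer: deserializes to String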
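What a deserializer does is turn the raw bytes of a record back into a typed value. The sketch below imitates the String and Integer cases with JDK classes only; it assumes UTF-8 text and a 4-byte big-endian integer, which I believe matches the built-in classes, but DeserializeSketch itself is a hypothetical stand-in and does not implement Kafka's Deserializer interface.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DeserializeSketch {

    /** Bytes to String, assuming UTF-8 encoding. */
    public static String toStringValue(byte[] data) {
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }

    /** Bytes to Integer, assuming exactly 4 bytes in big-endian order. */
    public static Integer toIntValue(byte[] data) {
        if (data == null) {
            return null;
        }
        if (data.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + data.length);
        }
        return ByteBuffer.wrap(data).getInt(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        System.out.println(toStringValue("kafka".getBytes(StandardCharsets.UTF_8))); // kafka
        System.out.println(toIntValue(new byte[] {0, 0, 1, 0}));                     // 256
    }
}
```

The consumer's key.deserializer and value.deserializer must match the serializers the producer used, otherwise the bytes are interpreted as the wrong type.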
5. Java implementation and parameter notes
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerTest {

    public static void main(String[] args) {
        Properties props = new Properties();
        //props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "mcip:9091,mcip:9092,mcip:9093");
        props.put("bootstrap.servers", "mcip:9091,mcip:9092,mcip:9093");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("group.id", "test");                   // group id, required when using subscribe()
        props.put("enable.auto.commit", "true");         // commit offsets automatically
        props.put("auto.commit.interval.ms", "1000");    // interval between automatic commits
        props.put("auto.offset.reset", "earliest");      // with no committed offset, start from the earliest message
        props.put("session.timeout.ms", "10000");        // if the coordinator hears nothing from the consumer for 10 s, it treats the consumer as unavailable
        props.put("max.poll.interval.ms", "10000");      // maximum time the consumer may spend processing between two poll() calls
        props.put("fetch.max.bytes", "10485760");        // maximum bytes returned by a single fetch
        props.put("max.poll.records", "500");            // maximum records returned by a single poll, default 500
        props.put("connections.max.idle.ms", "540000");  // idle socket connections are closed after this time, default 9 minutes
        try (Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribe to a specific topic
            consumer.subscribe(Arrays.asList("topic-test"));
            // Alternatively subscribe by regular expression. A second subscribe() call
            // replaces the first, so use one or the other:
            // consumer.subscribe(Pattern.compile("topic-.*"));
            while (true) {
                ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofSeconds(1));
                if (consumerRecords.isEmpty()) {
                    // Manual commit; requires enable.auto.commit=false above
                    //consumer.commitAsync();
                    break; // demo only: stop after the first empty poll
                }
                for (ConsumerRecord<String, String> record : consumerRecords) {
                    System.out.println("value = " + record.value() + ",partition = " + record.partition() + ",offset = " + record.offset());
                }
            }
        }
    }
}
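One detail of pattern subscription is easy to get wrong: the argument is a regular expression, not a shell glob, so "topic-*" means "topic" followed by zero or more hyphens. The small JDK-only check below shows the difference; TopicPatternSketch and the topic names are made up for illustration, and it assumes the whole topic name has to match the pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TopicPatternSketch {

    /** Returns the topics whose full name matches the regular expression. */
    public static List<String> matching(String regex, List<String> topics) {
        Pattern pattern = Pattern.compile(regex);
        return topics.stream()
                .filter(topic -> pattern.matcher(topic).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> topics = Arrays.asList("topic", "topic-test", "topic-demo", "other");
        System.out.println(matching("topic-*", topics));  // [topic]
        System.out.println(matching("topic-.*", topics)); // [topic-test, topic-demo]
    }
}
```

To subscribe to every topic that starts with "topic-", the pattern therefore needs the ".*" suffix.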
6. Multi-threaded consumption
Unlike KafkaProducer, KafkaConsumer is not thread-safe, so multi-threading has to be handled differently.
① One KafkaConsumer instance per thread. Pros: simple and fast, easy offset management, easy to preserve consumption order. Cons: high socket connection overhead; the number of consumers is limited by the number of topic partitions, so it scales poorly; higher broker load and a greater chance of rebalances.
// ConsumerRunnable.java
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerRunnable implements Runnable {
    // Each runnable owns its consumer; the instance is never shared between threads.
    private final KafkaConsumer<String, String> consumer;

    public ConsumerRunnable(String brokerList, String groupId, String topic) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        this.consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));
    }

    @Override
    public void run() {
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
                if (records.isEmpty()) {
                    try {
                        Thread.sleep(1000); // back off while the topic is idle
                    } catch (InterruptedException e) {
                        break; // stop consuming when the thread is interrupted
                    }
                    continue;
                }
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("thread = " + Thread.currentThread().getName() + ",value = " + record.value() + ",partition = " + record.partition() + ",offset = " + record.offset());
                }
            }
        } finally {
            consumer.close(); // release sockets and leave the group
        }
    }
}

// ConsumerGroup.java
import java.util.ArrayList;
import java.util.List;

public class ConsumerGroup {
    private final List<ConsumerRunnable> consumers;

    public ConsumerGroup(int consumerNum, String groupId, String topic, String brokerList) {
        consumers = new ArrayList<>(consumerNum);
        for (int i = 0; i < consumerNum; i++) {
            consumers.add(new ConsumerRunnable(brokerList, groupId, topic));
        }
    }

    public void execute() {
        for (ConsumerRunnable consumer : consumers) {
            new Thread(consumer).start();
        }
    }

    public static void main(String[] args) {
        ConsumerGroup group = new ConsumerGroup(3, "test", "topic-test", "mcip:9091");
        group.execute();
    }
}
② A single KafkaConsumer instance plus multiple worker threads. Pros: fetching and processing are decoupled, the number of consumers and the number of workers can scale independently, and the load is balanced across workers. Cons: it is hard to preserve message order within a partition; the longer processing chain makes offset management difficult, and a failing worker thread can cause data loss.
// ConsumerWorker.java
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerWorker implements Runnable {
    private final ConsumerRecords<String, String> records;
    private final Map<TopicPartition, OffsetAndMetadata> offsets;

    public ConsumerWorker(ConsumerRecords<String, String> records, Map<TopicPartition, OffsetAndMetadata> offsets) {
        this.records = records;
        this.offsets = offsets;
    }

    @Override
    public void run() {
        for (TopicPartition partition : records.partitions()) {
            List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
            for (ConsumerRecord<String, String> record : partitionRecords) {
                System.out.println("thread = " + Thread.currentThread().getName() + ",value = " + record.value() + ",partition = " + record.partition() + ",offset = " + record.offset());
            }
            // The offset to commit is the offset of the next message to read.
            long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
            synchronized (offsets) {
                OffsetAndMetadata curr = offsets.get(partition);
                if (curr == null || curr.offset() <= lastOffset + 1) {
                    offsets.put(partition, new OffsetAndMetadata(lastOffset + 1));
                }
            }
        }
    }
}

// ConsumerThreadHandler.java
import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.WakeupException;

public class ConsumerThreadHandler {
    private final KafkaConsumer<String, String> consumer;
    private volatile ExecutorService executors;
    private final Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();

    public ConsumerThreadHandler(String brokerList, String groupId, String topic) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        // Offsets are tracked by the workers and committed manually, so auto commit is off.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        this.consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic), new ConsumerRebalanceListener() {
            // Before the rebalance: commit what the workers have finished so far.
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> collection) {
                commitOffsets();
            }

            // After the rebalance: forget offsets of partitions we may no longer own.
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> collection) {
                synchronized (offsets) {
                    offsets.clear();
                }
            }
        });
    }

    public void consume(int threadNumber) {
        executors = Executors.newFixedThreadPool(threadNumber);
        try {
            while (true) {
                // poll() already waits up to 1 s when no data is available.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (!records.isEmpty()) {
                    executors.submit(new ConsumerWorker(records, offsets));
                }
                commitOffsets();
            }
        } catch (WakeupException e) {
            // Expected: close() called wakeup() to break out of poll().
        } finally {
            commitOffsets();
            consumer.close();
        }
    }

    // Must only be called on the polling thread, because KafkaConsumer is not thread-safe.
    private void commitOffsets() {
        Map<TopicPartition, OffsetAndMetadata> snapshot;
        synchronized (offsets) {
            if (offsets.isEmpty()) {
                return;
            }
            snapshot = new HashMap<>(offsets);
            offsets.clear();
        }
        consumer.commitSync(snapshot);
    }

    public void close() {
        // wakeup() is the one KafkaConsumer method that is safe to call from another thread.
        consumer.wakeup();
        if (executors != null) {
            executors.shutdown();
        }
    }
}

// Main.java
public class Main {
    public static void main(String[] args) {
        ConsumerThreadHandler handler = new ConsumerThreadHandler("mcip:9091", "test", "topic-test");
        int cpuCount = Runtime.getRuntime().availableProcessors();
        new Thread(() -> handler.consume(cpuCount)).start();
        try {
            Thread.sleep(30000); // let the demo run for 30 s
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        handler.close();
    }
}
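The bookkeeping that the worker threads share in the second approach can be isolated from Kafka and sketched with a ConcurrentHashMap. This is an illustrative alternative to the synchronized map, not the code used above: OffsetTrackerSketch is a hypothetical class, and a plain String such as "topic-test-0" stands in for TopicPartition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OffsetTrackerSketch {
    // key: partition name; value: next offset to commit (last processed offset + 1)
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();

    /** Called by workers; keeps the highest "next offset" seen per partition. */
    public void markProcessed(String partition, long lastOffset) {
        pending.merge(partition, lastOffset + 1, Long::max);
    }

    /** Called by the polling thread; returns what should be committed and clears it. */
    public Map<String, Long> drain() {
        Map<String, Long> snapshot = new HashMap<>();
        for (String key : pending.keySet()) {
            Long value = pending.remove(key);
            if (value != null) {
                snapshot.put(key, value);
            }
        }
        return snapshot;
    }

    public static void main(String[] args) {
        OffsetTrackerSketch tracker = new OffsetTrackerSketch();
        tracker.markProcessed("topic-test-0", 9);
        tracker.markProcessed("topic-test-0", 4); // an older batch finishing late does not move the offset back
        tracker.markProcessed("topic-test-1", 0);
        System.out.println(tracker.drain());
    }
}
```

Keeping only the maximum has a known weakness: if a later batch finishes before an earlier one and the earlier worker then fails, the committed offset already skips the failed messages. That is the offset-management difficulty and data-loss risk listed among the drawbacks of this approach.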
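7. Standalone consumer
① The process maintains partition state itself and strictly fixes which partitions the consumer reads;
② The process is already highly available and restarts automatically to recover from errors, so it does not need Kafka's help with failure detection and state recovery.
In these situations a consumer group is of no use, and a standalone consumer takes its place.
A standalone consumer has no groupId; it assigns partitions to itself directly with the assign method.
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class AssignTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "mcip:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        // Without a group.id there is nowhere to commit offsets, and newer clients
        // reject enable.auto.commit=true in that case, so auto commit is turned off.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "15000");
        try (Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<PartitionInfo> partitionInfos = consumer.partitionsFor("topic-test");
            List<TopicPartition> partitions = new ArrayList<>();
            if (partitionInfos != null) {
                for (PartitionInfo partitionInfo : partitionInfos) {
                    if (partitionInfo.partition() == 1) { // pick the partition to read
                        partitions.add(new TopicPartition(partitionInfo.topic(), partitionInfo.partition()));
                    }
                }
            }
            consumer.assign(partitions);
            while (true) {
                ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : consumerRecords) {
                    System.out.println("value = " + record.value() + ",partition = " + record.partition() + ",offset = " + record.offset());
                }
            }
        }
    }
}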
8. Old-version consumer
Old version: kafka.consumer.Consumer; new version: org.apache.kafka.clients.consumer.KafkaConsumer
Old version: kafka-core.jar; new version: kafka-clients.jar
9. Consumer in Spring
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.MessageListener;

public class SpringConsumer {
    private static final String HOST = "mcip";
    private static ConcurrentMessageListenerContainer<String, String> listenerContainer;

    static {
        // DefaultKafkaConsumerFactory expects a Map<String, Object> of consumer configs.
        Map<String, Object> pro = new HashMap<>();
        pro.put("bootstrap.servers", HOST + ":9091" + "," + HOST + ":9092" + "," + HOST + ":9093");
        pro.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        pro.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        pro.put("group.id", "test");
        pro.put("enable.auto.commit", "true");
        pro.put("auto.commit.interval.ms", "1000");
        pro.put("auto.offset.reset", "earliest");
        ConsumerFactory<String, String> consumerFactory = new DefaultKafkaConsumerFactory<>(pro);
        ContainerProperties containerProperties = new ContainerProperties("topic-test");
        containerProperties.setMessageListener(new Listener<String, String>());
        listenerContainer = new ConcurrentMessageListenerContainer<>(consumerFactory, containerProperties);
    }

    static class Listener<K, V> implements MessageListener<K, V> {
        @Override
        public void onMessage(ConsumerRecord<K, V> consumerRecord) {
            System.out.println(consumerRecord.key());
            System.out.println(consumerRecord.value());
            listenerContainer.stop(); // demo only: stop the container after the first record
        }
    }

    public static void main(String[] args) {
        listenerContainer.start();
    }
}