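The performance of a Kafka cluster is bounded by the JVM parameters, the server hardware, and the Kafka configuration itself. The machines that will host Kafka should therefore be benchmarked, and the results used to find the configuration that best fits the business requirements.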
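1. Kafka broker JVM parameters
The broker's JVM heap is controlled by the KAFKA_HEAP_OPTS variable in the kafka-server-start.sh script. If it is not set, the default is 1G.
You can add a KAFKA_HEAP_OPTS setting at the top of the script. Note that if you want to use the G1 garbage collector, the heap should be at least 4G and the JDK should be at least JDK 7u51.
Also note that the CMS flags in the example below only take effect when the CMS collector is enabled (-XX:+UseConcMarkSweepGC), and that -XX:PermSize and -XX:MaxPermSize are ignored on JDK 8 and later.
Example: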
export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G -Xmn2G -XX:PermSize=64m -XX:MaxPermSize=128m -XX:SurvivorRatio=6 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
2. Kafka cluster performance testing tools (based on kafka_2.11-0.11.0.0)
Kafka ships with two test tools: kafka-producer-perf-test.sh for producers and kafka-consumer-perf-test.sh for consumers.
2.1 kafka-producer-perf-test.sh
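Parameters:
--help Show help
--topic Topic name
--record-size Size of each message in bytes
--throughput Throttles the send rate to approximately this many messages per second
--producer-props Producer configuration properties such as bootstrap.servers and client.id; these take precedence over --producer.config
--producer.config Producer configuration file (producer.properties)
--print-metrics Print the metrics at the end of the test (default: false)
--num-records Total number of messages to send
--payload-file File containing the message payloads to send; exactly one of --record-size and --payload-file must be specified
--payload-delimiter Delimiter that separates messages in the payload file
--transactional-id Transactional id to use, for testing the performance of concurrent transactions (default: performance-producer-default-transactional-id)
--transaction-duration-ms Maximum age of a transaction; the transaction is committed once this time has elapsed, and transactions are only enabled when the value is positive (default: 0)
Example: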
./bin/kafka-producer-perf-test.sh --topic test-pati3-rep2 --throughput 500000 --num-records 1500000 --record-size 1000 --producer.config config/producer.properties --producer-props bootstrap.servers=10.1.8.16:9092,10.1.8.15:9092,10.1.8.14:9092 acks=1
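Test dimensions: vary the JVM settings, the number of partitions, the replication factor, --throughput (throttle rate), --record-size (message size), acks (replica acknowledgement mode), and compression (compression.type, passed via --producer-props).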
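Sweeping several of these dimensions by hand gets tedious, so a small script can generate the command lines. The following is a minimal sketch, not part of Kafka: build_commands is a hypothetical helper, the broker addresses and topic are copied from the example above, and the value lists are placeholders. It only builds the command strings and does not run them.

```python
import itertools
import shlex

# Values copied from the example above; replace them with your own cluster and topic.
BOOTSTRAP = "10.1.8.16:9092,10.1.8.15:9092,10.1.8.14:9092"
TOPIC = "test-pati3-rep2"


def build_commands(record_sizes, acks_values, compression_types,
                   num_records=1500000, throughput=500000):
    """Return one kafka-producer-perf-test.sh command line per parameter combination."""
    commands = []
    for size, acks, codec in itertools.product(record_sizes, acks_values, compression_types):
        args = [
            "./bin/kafka-producer-perf-test.sh",
            "--topic", TOPIC,
            "--throughput", str(throughput),
            "--num-records", str(num_records),
            "--record-size", str(size),
            "--producer-props",
            f"bootstrap.servers={BOOTSTRAP}",
            f"acks={acks}",
            f"compression.type={codec}",
        ]
        # shlex.quote keeps each argument safe to paste into a shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    for cmd in build_commands([1000, 102400], ["1", "all"], ["none", "snappy"]):
        print(cmd)
```

Partition count and replication factor are topic properties, so testing those dimensions means creating a separate topic for each combination and passing its name as TOPIC.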
[cluster@PCS101 bin]$ ./kafka-producer-perf-test.sh --topic REC-CBBO-MSG-TOPIC --throughput 50000 --num-records 150000 --record-size 102400 --producer-props bootstrap.servers=134.32.123.101:9092,134.32.123.102:9092,134.32.123.103:9092 acks=all --print-metrics
12786 records sent, 2556.7 records/sec (249.68 MB/sec), 122.1 ms avg latency, 231.0 max latency.
14827 records sent, 2965.4 records/sec (289.59 MB/sec), 109.4 ms avg latency, 291.0 max latency.
14587 records sent, 2917.4 records/sec (284.90 MB/sec), 111.6 ms avg latency, 374.0 max latency.
14292 records sent, 2858.4 records/sec (279.14 MB/sec), 114.8 ms avg latency, 389.0 max latency.
14557 records sent, 2910.8 records/sec (284.26 MB/sec), 112.3 ms avg latency, 354.0 max latency.
14524 records sent, 2904.2 records/sec (283.62 MB/sec), 113.1 ms avg latency, 362.0 max latency.
14686 records sent, 2937.2 records/sec (286.84 MB/sec), 111.4 ms avg latency, 348.0 max latency.
14637 records sent, 2927.4 records/sec (285.88 MB/sec), 111.8 ms avg latency, 378.0 max latency.
15186 records sent, 3037.2 records/sec (296.60 MB/sec), 107.9 ms avg latency, 343.0 max latency.
14584 records sent, 2916.2 records/sec (284.79 MB/sec), 112.4 ms avg latency, 356.0 max latency.
150000 records sent, 2888.170055 records/sec (282.05 MB/sec), 112.78 ms avg latency, 389.00 ms max latency, 11 ms 50th, 321 ms 95th, 340 ms 99th, 375 ms 99.9th.
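The last line is the overall summary: total records sent, average throughput (records per second), average latency, maximum latency, and the latency percentiles. We take the interval with the fewest records sent as the producer bottleneck; in this run that is the first line (12786 records sent, 2556.7 records/sec, 249.68 MB/sec).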
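Picking out the slowest interval can be automated. The sketch below assumes the periodic output lines have exactly the format shown in the run above; slowest_interval and mb_per_sec are hypothetical helper names. The regular expression deliberately does not match the final summary line, which reads "ms max latency" rather than "max latency".

```python
import re

# Matches the periodic report lines, e.g.
# "12786 records sent, 2556.7 records/sec (249.68 MB/sec), 122.1 ms avg latency, 231.0 max latency."
INTERVAL_RE = re.compile(
    r"(?P<records>\d+) records sent, (?P<rps>[\d.]+) records/sec "
    r"\((?P<mbps>[\d.]+) MB/sec\), (?P<avg>[\d.]+) ms avg latency, "
    r"(?P<max>[\d.]+) max latency\."
)


def slowest_interval(output):
    """Return the interval report with the lowest records/sec, or None if none is found."""
    intervals = []
    for line in output.splitlines():
        match = INTERVAL_RE.search(line)
        if match:
            intervals.append({
                "records": int(match.group("records")),
                "records_per_sec": float(match.group("rps")),
                "mb_per_sec": float(match.group("mbps")),
                "avg_latency_ms": float(match.group("avg")),
                "max_latency_ms": float(match.group("max")),
            })
    return min(intervals, key=lambda i: i["records_per_sec"]) if intervals else None


def mb_per_sec(records_per_sec, record_size_bytes):
    """Convert a record rate into MB/sec, with 1 MB = 1024 * 1024 bytes."""
    return records_per_sec * record_size_bytes / (1024 * 1024)


# A few lines copied from the run above, including the summary line.
SAMPLE = """\
12786 records sent, 2556.7 records/sec (249.68 MB/sec), 122.1 ms avg latency, 231.0 max latency.
14827 records sent, 2965.4 records/sec (289.59 MB/sec), 109.4 ms avg latency, 291.0 max latency.
15186 records sent, 3037.2 records/sec (296.60 MB/sec), 107.9 ms avg latency, 343.0 max latency.
150000 records sent, 2888.170055 records/sec (282.05 MB/sec), 112.78 ms avg latency, 389.00 ms max latency, 11 ms 50th, 321 ms 95th, 340 ms 99th, 375 ms 99.9th.
"""

if __name__ == "__main__":
    print(slowest_interval(SAMPLE))
    print(round(mb_per_sec(2556.7, 102400), 2))
```

The mb_per_sec helper also makes it easy to check that the MB/sec column is simply the record rate multiplied by --record-size.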
If --print-metrics is specified, the metrics are printed at the end of the run:
Metric Name Value
kafka-metrics-count:count:{client-id=producer-1} : 84.000
producer-metrics:batch-size-avg:{client-id=producer-1} : 102472.000
producer-metrics:batch-size-max:{client-id=producer-1} : 102472.000
producer-metrics:batch-split-rate:{client-id=producer-1} : 0.000
producer-metrics:buffer-available-bytes:{client-id=producer-1} : 33554432.000
producer-metrics:buffer-exhausted-rate:{client-id=producer-1} : 0.000
producer-metrics:buffer-total-bytes:{client-id=producer-1} : 33554432.000
producer-metrics:bufferpool-wait-ratio:{client-id=producer-1} : 0.857
producer-metrics:compression-rate-avg:{client-id=producer-1} : 1.000
producer-metrics:connection-close-rate:{client-id=producer-1} : 0.000
producer-metrics:connection-count:{client-id=producer-1} : 5.000
producer-metrics:connection-creation-rate:{client-id=producer-1} : 0.091
producer-metrics:incoming-byte-rate:{client-id=producer-1} : 87902.611
producer-metrics:io-ratio:{client-id=producer-1} : 0.138
producer-metrics:io-time-ns-avg:{client-id=producer-1} : 69622.263
producer-metrics:io-wait-ratio:{client-id=producer-1} : 0.329
producer-metrics:io-wait-time-ns-avg:{client-id=producer-1} : 166147.404
producer-metrics:metadata-age:{client-id=producer-1} : 55.104
producer-metrics:network-io-rate:{client-id=producer-1} : 1557.405
producer-metrics:outgoing-byte-rate:{client-id=producer-1} : 278762290.882
producer-metrics:produce-throttle-time-avg:{client-id=producer-1} : 0.000
producer-metrics:produce-throttle-time-max:{client-id=producer-1} : 0.000
producer-metrics:record-error-rate:{client-id=producer-1} : 0.000
producer-metrics:record-queue-time-avg:{client-id=producer-1} : 110.963
producer-metrics:record-queue-time-max:{client-id=producer-1} : 391.000
producer-metrics:record-retry-rate:{client-id=producer-1} : 0.000
producer-metrics:record-send-rate:{client-id=producer-1} : 2724.499
producer-metrics:record-size-avg:{client-id=producer-1} : 102487.000
producer-metrics:record-size-max:{client-id=producer-1} : 102487.000
producer-metrics:records-per-request-avg:{client-id=producer-1} : 3.493
producer-metrics:request-latency-avg:{client-id=producer-1} : 7.011
producer-metrics:request-latency-max:{client-id=producer-1} : 56.000
producer-metrics:request-rate:{client-id=producer-1} : 778.702
producer-metrics:request-size-avg:{client-id=producer-1} : 357989.537
producer-metrics:request-size-max:{client-id=producer-1} : 614940.000
producer-metrics:requests-in-flight:{client-id=producer-1} : 0.000
producer-metrics:response-rate:{client-id=producer-1} : 778.731
producer-metrics:select-rate:{client-id=producer-1} : 1979.326
producer-metrics:waiting-threads:{client-id=producer-1} : 0.000
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node--1} : 19.601
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node--2} : 3.956
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-0} : 31220.396
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-1} : 29885.883
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-2} : 26920.163
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node--1} : 1.324
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node--2} : 0.436
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node-0} : 98518580.943
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node-1} : 82114190.903
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node-2} : 98518948.091
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node--1} : 0.000
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node--2} : 0.000
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node-0} : 6.891
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node-1} : 5.135
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node-2} : 11.202
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node--1} : -Infinity
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node--2} : -Infinity
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node-0} : 56.000
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node-1} : 46.000
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node-2} : 55.000
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node--1} : 0.036
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node--2} : 0.018
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node-0} : 279.365
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node-1} : 340.136
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node-2} : 160.233
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node--1} : 36.500
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node--2} : 24.000
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node-0} : 352658.869
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node-1} : 241415.634
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node-2} : 614858.709
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node--1} : 49.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node--2} : 24.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node-0} : 614940.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node-1} : 512460.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node-2} : 614940.000
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node--1} : 0.036
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node--2} : 0.018
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node-0} : 279.486
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node-1} : 340.284
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node-2} : 160.233
producer-topic-metrics:byte-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC} : 279184829.991
producer-topic-metrics:compression-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC} : 1.000
producer-topic-metrics:record-error-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC} : 0.000
producer-topic-metrics:record-retry-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC} : 0.000
producer-topic-metrics:record-send-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC} : 2724.548
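To compare metrics across runs it helps to load them into a data structure. The sketch below assumes each metric is written as group:name:{tags} : value, one entry per line, as in the --print-metrics output; parse_metrics and get_metric are hypothetical helpers, not part of Kafka. Values such as -Infinity are kept as floats.

```python
import re

# One metric entry: "<group>:<name>:{<tag>=<value>, ...} : <number>"
METRIC_RE = re.compile(
    r"^(?P<group>[\w-]+):(?P<name>[\w-]+):\{(?P<tags>[^}]*)\}\s*:\s*(?P<value>\S+)$"
)


def parse_metrics(text):
    """Map (group, name, sorted tag pairs) to the metric value as a float."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_RE.match(line.strip())
        if not match:
            continue  # skips the "Metric Name Value" header and blank lines
        tags = dict(pair.split("=", 1) for pair in match.group("tags").split(", ") if pair)
        key = (match.group("group"), match.group("name"), tuple(sorted(tags.items())))
        metrics[key] = float(match.group("value"))  # float() also accepts "-Infinity"
    return metrics


def get_metric(metrics, group, name, tags):
    """Look up one metric by group, name, and a dict of tags."""
    return metrics[(group, name, tuple(sorted(tags.items())))]


# A few entries copied from the run above.
SAMPLE = """\
Metric Name Value
producer-metrics:bufferpool-wait-ratio:{client-id=producer-1} : 0.857
producer-metrics:record-send-rate:{client-id=producer-1} : 2724.499
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node-2} : 11.202
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node--1} : -Infinity
"""

if __name__ == "__main__":
    parsed = parse_metrics(SAMPLE)
    print(get_metric(parsed, "producer-metrics", "record-send-rate",
                     {"client-id": "producer-1"}))
```

Keying on the sorted tag pairs keeps per-node and per-topic metrics with the same name distinct.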
2.2 kafka-consumer-perf-test.sh
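Parameters:
--help Show help
--batch-size Number of messages to write in a single batch (default: 200)
--broker-list Required when using the new consumer; not required with the old consumer
--compression-codec Compression codec: NoCompressionCodec is 0 (the default, no compression), GZIPCompressionCodec is 1, SnappyCompressionCodec is 2, LZ4CompressionCodec is 3
--consumer.config Consumer configuration file (consumer.properties)
--date-format Format string used for the time fields (default: yyyy-MM-dd HH:mm:ss:SSS)
--fetch-size Amount of data, in bytes, to fetch in a single request (default: 1048576, i.e. 1 MB)
--from-latest If the consumer has no established offset yet, start from the latest message in the log rather than the earliest
--group Consumer group id (default: perf-consumer- followed by a random number, for example perf-consumer-29512)
--hide-header Skip printing the header of the statistics
--message-size Size of each message (default: 100 bytes)
--messages Required; total number of messages to consume
--new-consumer Use the new consumer; this is the default
--num-fetch-threads Number of fetcher threads (default: 1)
--print-metrics Print the metrics; only applies to the new consumer
--reporting-interval Interval at which progress is reported, in milliseconds (default: 5000)
--show-detailed-stats Report statistics for every reporting interval, as configured by --reporting-interval
--socket-buffer-size Size of the TCP receive buffer (default: 2097152, i.e. 2 MB)
--threads Number of processing threads (default: 10)
--topic Required; topic name
--zookeeper ZooKeeper connection string; required when using the old consumer
Test dimensions: vary the parameter values above. The run below passes --zookeeper, so it uses the old consumer.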
[cluster@PCS101 bin]$ ./kafka-consumer-perf-test.sh --topic REC-CBBO-MSG-TOPIC --messages 500000 --message-size 102400 --batch-size 50000 --fetch-size 1048576 --num-fetch-threads 17 --threads 10 --zookeeper 134.32.123.101:2181,134.32.123.102:2181,134.32.123.103:2181 --print-metrics
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-10-01 09:50:43:707, 2018-10-01 09:51:40:553, 84487.7018, 1486.2559, 874167, 15377.8102
Consumer bottleneck: 1486.2559 MB/sec, 15377.8102 messages/sec.
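The two rate columns can be recomputed from the other fields, which is a useful sanity check when comparing runs. The sketch below assumes the header and result row have the comma-separated layout shown above and that the timestamps use the tool's default date format; parse_consumer_result and recompute_rates are hypothetical helper names.

```python
from datetime import datetime

# Python equivalent of the tool's default date format yyyy-MM-dd HH:mm:ss:SSS
TIME_FORMAT = "%Y-%m-%d %H:%M:%S:%f"

HEADER = "start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec"
ROW = "2018-10-01 09:50:43:707, 2018-10-01 09:51:40:553, 84487.7018, 1486.2559, 874167, 15377.8102"


def parse_consumer_result(header, row):
    """Pair each header field with the matching value from the result row."""
    keys = [k.strip() for k in header.split(",")]
    values = [v.strip() for v in row.split(",")]
    return dict(zip(keys, values))


def recompute_rates(result):
    """Return (elapsed seconds, MB/sec, messages/sec) derived from the totals."""
    start = datetime.strptime(result["start.time"], TIME_FORMAT)
    end = datetime.strptime(result["end.time"], TIME_FORMAT)
    elapsed = (end - start).total_seconds()
    mb_total = float(result["data.consumed.in.MB"])
    msg_total = int(result["data.consumed.in.nMsg"])
    return elapsed, mb_total / elapsed, msg_total / elapsed


if __name__ == "__main__":
    elapsed, mb_sec, msg_sec = recompute_rates(parse_consumer_result(HEADER, ROW))
    print(f"{elapsed:.3f} s, {mb_sec:.4f} MB/sec, {msg_sec:.4f} messages/sec")
```

For the run above the elapsed time is 56.846 seconds, and dividing the totals by it reproduces the reported MB.sec and nMsg.sec values.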
3. Visual performance analysis tool: Kafka Manager (Yammer Metrics)
-- To be updated later