• 【Maxwell】02 Kafka Configuration


    I. Quickly Setting Up a Kafka Environment

    Based on Docker containers (for reference):

    https://www.cnblogs.com/mindzone/p/15608984.html

    The commands in brief:

    # Pull the ZooKeeper and Kafka images
    docker pull wurstmeister/zookeeper
    docker pull wurstmeister/kafka
    
    # Create the ZooKeeper container
    docker run -d --name zookeeper -p 2181:2181 -t wurstmeister/zookeeper
    
    # Create the Kafka container
    docker run -d --name kafka \
    -p 9092:9092 \
    -e KAFKA_BROKER_ID=0 \
    -e KAFKA_ZOOKEEPER_CONNECT=<Linux host IP>:2181 \
    -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://<Linux host IP>:9092 \
    -e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 \
    -t wurstmeister/kafka
    
    # Check that Kafka is running
    docker ps
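
    A quick sanity check before moving on (a sketch; it assumes the container name kafka used above and relies on the Kafka scripts being on the container's PATH, as the listing in section V suggests):

    # Tail the broker log to confirm the broker started cleanly
    docker logs --tail 20 kafka
    
    # List topics through the broker itself (supported by the Kafka 2.8.x shipped in this image)
    docker exec -it kafka kafka-topics.sh --bootstrap-server localhost:9092 --list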

    Test that topic messages are produced and consumed correctly (note that both terminals block, so open multiple terminal windows):

    # Window 1: produce
    [root@centos-linux ~]# docker exec -it kafka /bin/bash
    bash-4.4# kafka-console-producer.sh --broker-list localhost:9092 --topic <topic name>
    
    # Window 2: consume
    [root@centos-linux ~]# docker exec -it kafka /bin/bash
    bash-4.4# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic name> --from-beginning
    
    # Example
    bash-4.4# kafka-console-producer.sh --broker-list localhost:9092 --topic producer
    bash-4.4# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic producer --from-beginning

    II. Configuring Maxwell to Bind to Kafka

    1. Option 1: start with command-line parameters:

    cd /usr/local/maxwell-1.29.2
    ./bin/maxwell \
    --user='maxwell' \
    --password='123456' \
    --host='192.168.2.225' \
    --port='3308' \
    --producer=kafka \
    --kafka.bootstrap.servers=localhost:9092 \
    --kafka_topic=producer \
    --jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai'

    Output when Maxwell starts successfully:

    [root@localhost maxwell-1.29.2]# ./bin/maxwell \
    > --user='maxwell' \
    > --password='123456' \
    > --host='192.168.2.225' \
    > --port='3308' \
    > --producer=kafka \
    > --kafka.bootstrap.servers=localhost:9092 \
    > --kafka_topic=producer \
    > --jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai'
    Using kafka version: 1.0.0
    14:13:50,533 INFO  Maxwell - Starting Maxwell. maxMemory: 247332864 bufferMemoryUsage: 0.25
    14:13:50,783 INFO  ProducerConfig - ProducerConfig values: 
        acks = 1
        batch.size = 16384
        bootstrap.servers = [localhost:9092]
        buffer.memory = 33554432
        client.id = 
        compression.type = snappy
        connections.max.idle.ms = 540000
        enable.idempotence = false
        interceptor.classes = null
        key.serializer = class org.apache.kafka.common.serialization.StringSerializer
        linger.ms = 0
        max.block.ms = 60000
        max.in.flight.requests.per.connection = 5
        max.request.size = 1048576
        metadata.max.age.ms = 300000
        metric.reporters = []
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
        receive.buffer.bytes = 32768
        reconnect.backoff.max.ms = 1000
        reconnect.backoff.ms = 50
        request.timeout.ms = 30000
        retries = 0
        retry.backoff.ms = 100
        sasl.jaas.config = null
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.mechanism = GSSAPI
        security.protocol = PLAINTEXT
        send.buffer.bytes = 131072
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.endpoint.identification.algorithm = null
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = null
        ssl.keystore.password = null
        ssl.keystore.type = JKS
        ssl.protocol = TLS
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = null
        ssl.truststore.password = null
        ssl.truststore.type = JKS
        transaction.timeout.ms = 60000
        transactional.id = null
        value.serializer = class org.apache.kafka.common.serialization.StringSerializer
    14:13:50,847 INFO  AppInfoParser - Kafka version : 1.0.0
    14:13:50,847 INFO  AppInfoParser - Kafka commitId : aaa7af6d4a11b29d
    14:13:50,871 INFO  Maxwell - Maxwell v1.29.2 is booting (MaxwellKafkaProducer), starting at Position[BinlogPosition[mysql-bin.000005:225424], lastHeartbeat=1642486284932]
    14:13:51,040 INFO  MysqlSavedSchema - Restoring schema id 1 (last modified at Position[BinlogPosition[mysql-bin.000005:16191], lastHeartbeat=0])
    14:13:51,205 INFO  BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin.000005:225424
    14:13:51,235 INFO  BinaryLogClient - Connected to 192.168.2.225:3308 at mysql-bin.000005/225424 (sid:6379, cid:215)
    14:13:51,235 INFO  BinlogConnectorReplicator - Binlog connected.

    2. Option 2: put the settings in the config file:

    cd /usr/local/maxwell-1.29.2
    vim config.properties

    Configuration entries:

    kafka_topic=maxwell
    producer=kafka
    kafka.bootstrap.servers=localhost:9092
    
    host=192.168.2.225
    user=maxwell
    password=123456
    port=3308

    Start it:

    cd /usr/local/maxwell-1.29.2
    
    ./bin/maxwell \
    --config ./config.properties \
    --jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai'
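
    For long-running use you normally do not want Maxwell tied to a terminal. A minimal sketch using nohup (any process supervisor would work just as well):

    cd /usr/local/maxwell-1.29.2
    
    nohup ./bin/maxwell \
    --config ./config.properties \
    --jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai' \
    > maxwell.log 2>&1 &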

    III. Kafka Listening Test

    Once Kafka takes over as producer, Maxwell no longer prints row data to the console; it runs quietly in the background and hands the messages to Kafka.
    Whenever a non-query SQL statement is executed against the database, the Kafka consumer receives a corresponding message.

    Messages seen in the consumer terminal:

    [root@localhost maxwell-1.29.2]# docker exec -it kafka /bin/bash
    bash-5.1# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic producer --from-beginning
    [2022-01-18 06:09:16,853] WARN [Consumer clientId=consumer-console-consumer-5789-1, groupId=console-consumer-5789] Error while fetching metadata with correlation id 2 : {producer=LEADER_NOT_AVAILABLE} (org.apache.kafka.cli
    [2022-01-18 06:09:16,987] WARN [Consumer clientId=consumer-console-consumer-5789-1, groupId=console-consumer-5789] Error while fetching metadata with correlation id 4 : {producer=LEADER_NOT_AVAILABLE} (org.apache.kafka.cli
    hello
    aaaaaaaaaaaaaaa
    {"database":"test-db","table":"day_sale","type":"delete","ts":1642486851,"xid":71876,"commit":true,"data":{"ID":166,"PRODUCT":"产品C","CHANNEL":"淘宝","AMOUNT":2497.0000,"SALE_DATE":"2022-01-18 13:48:48"}}

    IV. Kafka Partition Control

    1. Purpose:

    We want Kafka to work in parallel: if every captured change goes to a single partition's queue, throughput is limited.
    To let Kafka send concurrently, create multiple partitions so that messages can be sent to the partitions in parallel.

    2. Issue:

    The tutorial does not explain how databases are mapped to partitions, only that the result differs. In practice Maxwell hashes the chosen partition key (by default the database name) to pick a partition number, so all rows from the same database land in the same partition.

    3. Key points:

    How do we configure Maxwell's Kafka partitioning?

    See the notes on the Kafka settings in config.properties:

    #       *** kafka ***
    # list of kafka brokers
    #kafka.bootstrap.servers=hosta:9092,hostb:9092
    # kafka topic to write to
    # this can be static, e.g. 'maxwell', or dynamic, e.g. namespace_%{database}_%{table}
    # in the latter case 'database' and 'table' will be replaced with the values for the row being processed
    #kafka_topic=maxwell
    # alternative kafka topic to write DDL (alter/create/drop) to.  Defaults to kafka_topic
    #ddl_kafka_topic=maxwell_ddl
    # The following block is the partitioning-related configuration:
    #           *** partitioning ***
    # What part of the data do we partition by?
    # Options: database, table, primary_key, transaction_id, thread_id, column
    # producer_partition_by=database # [database, table, primary_key, transaction_id, thread_id, column]
    # specify what fields to partition by when using producer_partition_by=column
    # column separated list.
    # producer_partition_columns=id,foo,bar
    # when using producer_partition_by=column, partition by this when
    # the specified column(s) don't exist, i.e. the rule falls back to this key (e.g. the database name)
    # producer_partition_by_fallback=database
    #            *** kinesis ***
    # kinesis_stream=maxwell
    # AWS places a 256 unicode character limit on the max key length of a record
    # http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html
    #
    # Setting this option to true enables hashing the key with the md5 algorithm
    # before we send it to kinesis so all the keys work within the key size limit.
    # Values: true, false
    # Default: false
    #kinesis_md5_keys=true
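
    Putting the options above together, a hedged sketch of a column-based partitioning configuration (the keys come straight from the excerpt; the column name id is only an example):

    producer=kafka
    kafka.bootstrap.servers=localhost:9092
    kafka_topic=maxwell
    
    producer_partition_by=column
    producer_partition_columns=id
    producer_partition_by_fallback=database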

    4. Partition test case:

    - 1. Create a new topic with 6 partitions

    # Enter the kafka container
    docker exec -it kafka /bin/bash
    # Create the topic and assign partitions (the replication-factor parameter is required)
    kafka-topics.sh --zookeeper 192.168.177.129:2181 --topic maxwell --create --replication-factor 1 --partitions 6
    # Replica count 1:   --replication-factor 1
    # Partition count 6: --partitions 6
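
    To confirm the topic really was created with 6 partitions, describe it (same ZooKeeper address as above):

    kafka-topics.sh --zookeeper 192.168.177.129:2181 --describe --topic maxwell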

    - 2. Update the Maxwell configuration (partitioning by column is rarely needed; partitioning by database is enough)

    # Kafka settings
    producer=kafka
    kafka.bootstrap.servers=localhost:9092
    
    # Change the topic name
    kafka_topic=maxwell
    
    # Set the partitioning key
    producer_partition_by=database

    - 3. Restart Maxwell

    cd /usr/local/maxwell-1.29.2
    
    ./bin/maxwell \
    --config ./config.properties \
    --jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai'

    - 4. Write data to the database, then inspect the Kafka messages (using the Kafka Tool GUI)

    The detailed steps are omitted here; any DML statement will do. The result can be inspected with the Kafka Tool GUI (Offset Explorer).
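
    If Kafka Tool is not at hand, the spread across partitions can also be checked from the console consumer by reading one partition at a time (a sketch; partition 0 is just an example):

    kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic maxwell --partition 0 --from-beginning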

    V. Additional Commands for Kafka Partition Configuration

    Kafka implements its functionality through these command scripts:

    [root@localhost maxwell-1.29.2]# docker exec -it kafka ls /opt/kafka_2.13-2.8.1/bin
    connect-distributed.sh               kafka-preferred-replica-election.sh
    connect-mirror-maker.sh              kafka-producer-perf-test.sh
    connect-standalone.sh                kafka-reassign-partitions.sh
    kafka-acls.sh                        kafka-replica-verification.sh
    kafka-broker-api-versions.sh         kafka-run-class.sh
    kafka-cluster.sh                     kafka-server-start.sh
    kafka-configs.sh                     kafka-server-stop.sh
    kafka-console-consumer.sh            kafka-storage.sh
    kafka-console-producer.sh            kafka-streams-application-reset.sh
    kafka-consumer-groups.sh             kafka-topics.sh
    kafka-consumer-perf-test.sh          kafka-verifiable-consumer.sh
    kafka-delegation-tokens.sh           kafka-verifiable-producer.sh
    kafka-delete-records.sh              trogdor.sh
    kafka-dump-log.sh                    windows
    kafka-features.sh                    zookeeper-security-migration.sh
    kafka-leader-election.sh             zookeeper-server-start.sh
    kafka-log-dirs.sh                    zookeeper-server-stop.sh
    kafka-metadata-shell.sh              zookeeper-shell.sh
    kafka-mirror-maker.sh

    A command that fails with an error:

    kafka-topics.sh --zookeeper 192.168.177.129:2181 --topic maxwell --create --replication-factor 2 --partitions 3
    [2022-01-18 08:19:44,532] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: 
    Replication factor: 4 larger than available brokers: 1.

    Analysis of the error:

    https://www.cnblogs.com/tyoutetu/p/10855283.html

    # In short, a multi-broker Kafka cluster is needed; each Kafka instance is one broker, and the replication factor must be <= the number of brokers
    --replication-factor (must be less than or equal to the number of brokers in the cluster; for a single broker, use 1)
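
    To see how many brokers are actually registered (and therefore the largest usable replication factor), ZooKeeper can be queried directly; a sketch using the bundled shell:

    zookeeper-shell.sh 192.168.177.129:2181 ls /brokers/ids
    # A single-broker setup prints something like: [0]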

    Why the partition count could not be changed:

    # The number of partitions can only be increased, never decreased
    bash-5.1# kafka-topics.sh --zookeeper 192.168.177.129:2181 -alter --partitions 3 --topic maxwell
    WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected
    Error while executing topic command : The number of partitions for a topic can only be increased. Topic maxwell currently has 6 partitions, 3 would not be 
    [2022-01-18 08:28:42,743] ERROR org.apache.kafka.common.errors.InvalidPartitionsException: The number of partitions for a topic can only be increased. Topi
    (kafka.admin.TopicCommand$)

    Solution:

    Delete the topic, then recreate it.
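
    A sketch of the corresponding commands (this assumes delete.topic.enable=true on the broker, which is the default in recent Kafka versions):

    # Delete the old topic
    kafka-topics.sh --zookeeper 192.168.177.129:2181 --delete --topic maxwell
    
    # Recreate it with the desired partition count
    kafka-topics.sh --zookeeper 192.168.177.129:2181 --topic maxwell --create --replication-factor 1 --partitions 3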
