Kafka: A Simple Tutorial


    1. Basic Concepts

    Apache Kafka is a distributed publish-subscribe messaging system.

    Topic
    Kafka organizes message feeds into categories; each category of messages is called a topic.

    Producer
    Objects that publish messages are called topic producers (Kafka topic producers).

    Consumer
    Objects that subscribe to topics and process the published messages are called consumers.

    Broker
    Published messages are stored in a set of servers called a Kafka cluster. Each server in the cluster is a broker. Consumers can subscribe to one or more topics and pull data from the brokers, thereby consuming the published messages.

    Partitions

    A topic can have one or more partitions. Each partition is an ordered, immutable sequence of messages that is continually appended to. Each message in a partition is assigned a sequential id called the offset, which is unique within that partition.

    Each partition has one leader and zero or more followers. The leader handles all read and write requests for the partition, while the followers passively replicate its data. If the leader fails, one of the followers is elected as the new leader.

    Through partitions, Kafka provides both ordering guarantees and load balancing over a pool of concurrent consumers. Each partition is assigned to exactly one consumer within a consumer group, so that consumer is the partition's only reader in its group and consumes its messages in order. Because there are many partitions, load is still balanced across the consumers in the group. Note that there cannot be more consumers in a group than there are partitions: the partition count is the upper bound on consumer parallelism.

    Kafka only guarantees message ordering within a single partition, not across different partitions of a topic; this is enough for most applications. If you need total ordering over all the messages in a topic, give the topic a single partition, which also means at most one consumer process per consumer group. See the example below.
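
    For example (a sketch; the topic name ordered-events is just an illustration), a totally ordered topic is simply one created with a single partition:

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ordered-events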

    2. Installation and Usage

    1. Download Kafka

    • Download: wget http://apache.01link.hk/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz or
    • wget http://ftp.cuhk.edu.hk/pub/packages/apache.org/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz (use whichever mirror is faster)
    • Extract: tar -xzf kafka_2.11-0.10.0.0.tgz
    • Enter the directory: cd kafka_2.11-0.10.0.0/

    2. Start the services

    • Start ZooKeeper: bin/zookeeper-server-start.sh config/zookeeper.properties & (the & puts it in the background so you can keep working)
    • Start Kafka: bin/kafka-server-start.sh config/server.properties &

    3. Create a topic named dawang with a single partition and a single replica

    • Create: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic dawang
    • List: bin/kafka-topics.sh --list --zookeeper localhost:2181
    • You can also configure the broker to create topics automatically, as shown in the sketch below.
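
    A sketch of the relevant setting in config/server.properties (auto.create.topics.enable defaults to true in this Kafka version); with it enabled, producing to or fetching an unknown topic creates that topic on the fly:

    # config/server.properties
    auto.create.topics.enable=true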

    4. Send messages. Kafka ships with a simple command-line producer that reads messages from a file or from standard input and sends them to the server. By default, each line is sent as a separate message.

    • Send messages: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic dawang (type anything you like, press Enter to send, Ctrl+C to quit)

    5. Start the consumer. It reads the messages and prints them to standard output:

    • Receive messages: bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic dawang --from-beginning
    • Run the consumer command in one terminal and the producer command in another; you can then type messages in one terminal and read them in the other. Both commands take optional arguments; run either with no arguments to see its help text.

    6. Set up a multi-broker cluster: start a cluster of three brokers, with all broker nodes on the local machine

    First copy the configuration file: cp config/server.properties config/server-1.properties and cp config/server.properties config/server-2.properties

    The two files need the following changes:

    config/server-1.properties:
    broker.id=1
    listeners=PLAINTEXT://:9093
    log.dir=/tmp/kafka-logs-1
     
    config/server-2.properties:
    broker.id=2
    listeners=PLAINTEXT://:9094
    log.dir=/tmp/kafka-logs-2

    Here the broker id, port, and log directory are configured differently from the original broker. Now start these two brokers:

    bin/kafka-server-start.sh config/server-1.properties &
    bin/kafka-server-start.sh config/server-2.properties &

    Then create a topic with a replication factor of 3:

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic oh3topic

    You can use the describe command to show the topic's details:

    bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic oh3topic
    Topic:oh3topic PartitionCount:1 ReplicationFactor:3 Configs:
    Topic: oh3topic Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
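
    In this output, the first line summarizes all partitions and each following line describes one partition. "Leader" is the node responsible for all reads and writes for that partition; "Replicas" lists the nodes that replicate its log, whether or not they are alive; "Isr" is the subset of replicas currently in sync with the leader.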

    We can also look at the earlier topic:

    bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic dawang
    Topic:dawang PartitionCount:1 ReplicationFactor:1 Configs:
    Topic: dawang Partition: 0 Leader: 0 Replicas: 0 Isr: 0

    Finally, we can produce and consume messages in the same way as before, for example:

    # produce
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic oh3topic
    # consume
    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic oh3topic

    With two terminals open you can produce messages in one and consume them in the other.

    Now test fault tolerance by killing the leader, here broker 1 (check your own describe output to see which broker is actually the leader). Find its process:

    ps -ef | grep server-1.properties
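
    Then kill that process (the PID below is a placeholder; use the one reported by ps):

    kill -9 <pid>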

    Leadership switches over to one of the followers, and node 1 is no longer listed in the ISR because it is dead:

    Run the describe command again to show the topic's details:

    bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic oh3topic

    But the messages are not lost; see for yourself:
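
    Consuming the topic from the beginning still returns every message:

    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic oh3topic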

    Completely deleting a Kafka topic

    1. Delete the topic's directories under the Kafka storage directory (the log.dirs setting in server.properties, "/tmp/kafka-logs" by default).
    2. Run Kafka's topic deletion command:

    bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic oh3topic

    Note that if the server.properties loaded at startup does not set delete.topic.enable=true, this command does not actually delete the topic; it only marks it as "marked for deletion".
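
    The corresponding broker setting, for reference (restart the broker after changing it):

    # config/server.properties
    delete.topic.enable=true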

    3. Code Examples

    You need to install the librdkafka library yourself:

    https://github.com/edenhill/librdkafka
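
    A build sketch, assuming librdkafka's headers and shared libraries are installed in standard system locations (the file names producer.cpp and consumer.cpp are placeholders for the two listings below):

    g++ -o producer producer.cpp -lrdkafka++ -lrdkafka
    g++ -o consumer consumer.cpp -lrdkafka++ -lrdkafka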

    Producer

    #include <stdio.h>
    #include <stdlib.h>
    #include <iostream>
    #include <list>
    #include <memory>
    #include <string>
    #include <string.h>
    #include "librdkafka/rdkafkacpp.h"
    //#include "librdkafka/rdkafka.h"
    using namespace std;
    
    bool run = true;
    
    class ExampleDeliveryReportCb : public RdKafka::DeliveryReportCb
    {
     public:
      void dr_cb (RdKafka::Message &message) {
        std::cout << "Message delivery for (" << message.len() << " bytes): " <<
            message.errstr() << std::endl;
        if (message.key())
          std::cout << "Key: " << *(message.key()) << ";" << std::endl;
      }
    };
    
    
    class ExampleEventCb : public RdKafka::EventCb {
     public:
      void event_cb (RdKafka::Event &event) {
        switch (event.type())
        {
          case RdKafka::Event::EVENT_ERROR:
            std::cerr << "ERROR (" << RdKafka::err2str(event.err()) << "): " <<
                event.str() << std::endl;
            if (event.err() == RdKafka::ERR__ALL_BROKERS_DOWN)
              run = false;
            break;
    
          case RdKafka::Event::EVENT_STATS:
            std::cerr << ""STATS": " << event.str() << std::endl;
            break;
    
          case RdKafka::Event::EVENT_LOG:
            fprintf(stderr, "LOG-%i-%s: %s
    ",
                    event.severity(), event.fac().c_str(), event.str().c_str());
            break;
    
          default:
            std::cerr << "EVENT " << event.type() <<
                " (" << RdKafka::err2str(event.err()) << "): " <<
                event.str() << std::endl;
            break;
        }
      }
    };
    
    /* Custom partitioner: picks a partition by hashing the message key (DJB hash). It only takes effect when a key is supplied in the produce() call. */
    class MyHashPartitionerCb : public RdKafka::PartitionerCb {
        public:
            int32_t partitioner_cb (const RdKafka::Topic *topic, const std::string *key,int32_t partition_cnt, void *msg_opaque)
            {
                std::cout<<"partition_cnt="<<partition_cnt<<std::endl;
                return djb_hash(key->c_str(), key->size()) % partition_cnt;
            }
        private:
            static inline unsigned int djb_hash (const char *str, size_t len)
            {
                unsigned int hash = 5381;
                for (size_t i = 0; i < len; i++)
                    hash = ((hash << 5) + hash) + str[i];
                std::cout << "hash1=" << hash << std::endl; // debug trace
                return hash;
            }
    };
    
    void TestProducer()
    {
        std::string brokers = "localhost";
        std::string errstr;
        std::string topic_str="helloworld_kugou_test";//自行制定主题topic
        MyHashPartitionerCb hash_partitioner;
        int32_t partition = RdKafka::Topic::PARTITION_UA;
        int64_t start_offset = RdKafka::Topic::OFFSET_BEGINNING;
        bool do_conf_dump = false;
        int opt;
    
        int use_ccb = 0;
    
        //Create configuration objects
        RdKafka::Conf *conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);
        RdKafka::Conf *tconf = RdKafka::Conf::create(RdKafka::Conf::CONF_TOPIC);
    
        if (tconf->set("partitioner_cb", &hash_partitioner, errstr) != RdKafka::Conf::CONF_OK)
         {
              std::cerr << errstr << std::endl;
              exit(1);
         }
    
        /* Set configuration properties */
        conf->set("metadata.broker.list", brokers, errstr);
        ExampleEventCb ex_event_cb;
        conf->set("event_cb", &ex_event_cb, errstr);
    
        ExampleDeliveryReportCb ex_dr_cb;
    
        /* Set delivery report callback */
        conf->set("dr_cb", &ex_dr_cb, errstr);
    
        /* Create the producer using the accumulated global configuration. */
        RdKafka::Producer *producer = RdKafka::Producer::create(conf, errstr);
        if (!producer)
        {
            std::cerr << "Failed to create producer: " << errstr << std::endl;
            exit(1);
        }
    
        std::cout << "% Created producer " << producer->name() << std::endl;
    
        /* Create the topic handle. */
        RdKafka::Topic *topic = RdKafka::Topic::create(producer, topic_str, tconf, errstr);
        if (!topic) {
          std::cerr << "Failed to create topic: " << errstr << std::endl;
          exit(1);
        }
    
        /* Read messages from stdin and produce them to the broker. */
        for (std::string line; run && std::getline(std::cin, line);)
        {
            if (line.empty())
            {
                producer->poll(0);
                continue;
            }
    
          /*
           * produce() arguments:
           *   topic, partition, msgflags, payload, payload length, key, key length, msg_opaque
           */
          std::string key = line.substr(0, 5); // use the first 5 characters of the line as the key
          RdKafka::ErrorCode resp = producer->produce(topic, partition,
              RdKafka::Producer::RK_MSG_COPY /* Copy payload */,
              const_cast<char *>(line.c_str()), line.size(),
              key.c_str(), key.size(), NULL); // the key determines which partition the message lands in
            if (resp != RdKafka::ERR_NO_ERROR)
                std::cerr << "% Produce failed: " <<RdKafka::err2str(resp) << std::endl;
            else
                std::cerr << "% Produced message (" << line.size() << " bytes)" <<std::endl;
        producer->poll(0); // poll() performs the actual socket I/O and serves delivery-report callbacks; it returns the number of events served
        }
        /* Wait for any outstanding messages to be delivered before shutting down */
        run = true;
    
        while (run && producer->outq_len() > 0) {
          std::cerr << "Waiting for " << producer->outq_len() << std::endl;
          producer->poll(1000);
        }
    
        delete topic;
        delete producer;
    }
     
    int main(int argc, char *argv[])
    {
        TestProducer();
        return EXIT_SUCCESS;
    }
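
    To try it out (assuming the build sketch above and a broker on localhost): run ./producer and type lines on stdin; each line is sent as one message whose key is its first five characters, and Ctrl+D ends the input loop.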

    Consumer

    #include <stdio.h>
    #include <stdlib.h>
    #include <iostream>
    #include <list>
    #include <memory>
    #include <string>
    #include <string.h>
    #include "librdkafka/rdkafkacpp.h"
    using namespace std;
    
    bool run = true;
    bool exit_eof = true; // stop the loop once the end of the partition is reached
    class ExampleDeliveryReportCb : public RdKafka::DeliveryReportCb
    {
     public:
      void dr_cb (RdKafka::Message &message) {
        std::cout << "Message delivery for (" << message.len() << " bytes): " <<
            message.errstr() << std::endl;
        if (message.key())
          std::cout << "Key: " << *(message.key()) << ";" << std::endl;
      }
    };
    
    
    class ExampleEventCb : public RdKafka::EventCb {
     public:
      void event_cb (RdKafka::Event &event) {
        switch (event.type())
        {
          case RdKafka::Event::EVENT_ERROR:
            std::cerr << "ERROR (" << RdKafka::err2str(event.err()) << "): " <<
                event.str() << std::endl;
            if (event.err() == RdKafka::ERR__ALL_BROKERS_DOWN)
              run = false;
            break;
    
          case RdKafka::Event::EVENT_STATS:
            std::cerr << ""STATS": " << event.str() << std::endl;
            break;
    
          case RdKafka::Event::EVENT_LOG:
            fprintf(stderr, "LOG-%i-%s: %s
    ",
                    event.severity(), event.fac().c_str(), event.str().c_str());
            break;
    
          default:
            std::cerr << "EVENT " << event.type() <<
                " (" << RdKafka::err2str(event.err()) << "): " <<
                event.str() << std::endl;
            break;
        }
      }
    };
    
    /* Custom partitioner: picks a partition by hashing the message key (DJB hash). It is not used on the consumer side but is kept for symmetry with the producer. */
    class MyHashPartitionerCb : public RdKafka::PartitionerCb {
        public:
            int32_t partitioner_cb (const RdKafka::Topic *topic, const std::string *key,int32_t partition_cnt, void *msg_opaque)
            {
                std::cout<<"partition_cnt="<<partition_cnt<<std::endl;
                return djb_hash(key->c_str(), key->size()) % partition_cnt;
            }
        private:
            static inline unsigned int djb_hash (const char *str, size_t len)
            {
                unsigned int hash = 5381;
                for (size_t i = 0; i < len; i++)
                    hash = ((hash << 5) + hash) + str[i];
                std::cout << "hash1=" << hash << std::endl; // debug trace
                return hash;
            }
    };
    
    void msg_consume(RdKafka::Message* message, void* opaque)
    {
        switch (message->err())
        {
            case RdKafka::ERR__TIMED_OUT:
                break;
    
            case RdKafka::ERR_NO_ERROR:
              /* Real message */
                std::cout << "Read msg at offset " << message->offset() << std::endl;
                if (message->key())
                {
                    std::cout << "Key: " << *message->key() << std::endl;
                }
                printf("%.*s
    ", static_cast<int>(message->len()),static_cast<const char *>(message->payload()));
                break;
            case RdKafka::ERR__PARTITION_EOF:
                  /* Last message */
                  if (exit_eof)
                  {
                      run = false;
                      cout << "ERR__PARTITION_EOF" << endl;
                  }
                  break;
            case RdKafka::ERR__UNKNOWN_TOPIC:
            case RdKafka::ERR__UNKNOWN_PARTITION:
                std::cerr << "Consume failed: " << message->errstr() << std::endl;
                run = false;
                break;
            default:
                /* Errors */
                std::cerr << "Consume failed: " << message->errstr() << std::endl;
                run = false;
        }
    }
    class ExampleConsumeCb : public RdKafka::ConsumeCb {
        public:
            void consume_cb (RdKafka::Message &msg, void *opaque)
            {
                msg_consume(&msg, opaque);
            }
    };
    void TestConsumer()
    {
        std::string brokers = "localhost";
        std::string errstr;
        std::string topic_str="helloworld_kugou_test";//helloworld_kugou
        MyHashPartitionerCb hash_partitioner;
        int32_t partition = RdKafka::Topic::PARTITION_UA;//为何不能用??在Consumer这里只能写0???无法自动吗???
        partition = 0;
        int64_t start_offset = RdKafka::Topic::OFFSET_BEGINNING;
        bool do_conf_dump = false;
        int opt;
    
        int use_ccb = 0;
    
        //Create configuration objects
        RdKafka::Conf *conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);
        RdKafka::Conf *tconf = RdKafka::Conf::create(RdKafka::Conf::CONF_TOPIC);
    
        if (tconf->set("partitioner_cb", &hash_partitioner, errstr) != RdKafka::Conf::CONF_OK)
        {
            std::cerr << errstr << std::endl;
            exit(1);
        }
    
        /* Set configuration properties */
        conf->set("metadata.broker.list", brokers, errstr);
        ExampleEventCb ex_event_cb;
        conf->set("event_cb", &ex_event_cb, errstr);
    
        ExampleDeliveryReportCb ex_dr_cb;
    
        /* Set delivery report callback */
        conf->set("dr_cb", &ex_dr_cb, errstr);
        /* Create the consumer using the accumulated global configuration. */
        RdKafka::Consumer *consumer = RdKafka::Consumer::create(conf, errstr);
        if (!consumer)
        {
          std::cerr << "Failed to create consumer: " << errstr << std::endl;
          exit(1);
        }
    
        std::cout << "% Created consumer " << consumer->name() << std::endl;
    
        /* Create the topic handle. */
        RdKafka::Topic *topic = RdKafka::Topic::create(consumer, topic_str, tconf, errstr);
        if (!topic)
        {
          std::cerr << "Failed to create topic: " << errstr << std::endl;
          exit(1);
        }
    
        /* Start the consumer for topic+partition at start_offset */
        RdKafka::ErrorCode resp = consumer->start(topic, partition, start_offset);
        if (resp != RdKafka::ERR_NO_ERROR) {
          std::cerr << "Failed to start consumer: " << RdKafka::err2str(resp) << std::endl;
          exit(1);
        }
    
        ExampleConsumeCb ex_consume_cb;
    
        /* Consume messages */
        while (run)
        {
            if (use_ccb)
            {
                consumer->consume_callback(topic, partition, 1000, &ex_consume_cb, &use_ccb);
            }
            else
            {
                RdKafka::Message *msg = consumer->consume(topic, partition, 1000);
                msg_consume(msg, NULL);
                delete msg;
            }
            consumer->poll(0);
        }
    
        /* Stop the consumer */
        consumer->stop(topic, partition);
    
        consumer->poll(1000);
    
        delete topic;
        delete consumer;
    }
     
    int main(int argc, char *argv[])
    {
        TestConsumer();
        return EXIT_SUCCESS;
    }
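
    Run ./consumer in another terminal (again assuming the build sketch above): it reads partition 0 of helloworld_kugou_test from the beginning and prints each message's offset, key, and payload. Because exit_eof is true, it stops once it reaches the current end of the partition.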