22. Hadoop Study Notes: Kafka Basics in Practice, Consumer and Producer Examples


    The Kafka client is available in other languages as well; here we mainly cover the Python and Java implementations, since these two languages are the most mainstream and popular.

    The figure shows four partitions, each shape corresponding to one consumer; any one-to-one pairing works.

    Get the topic's partition count and create one process per partition to consume that partition's data.

    In each process instance, first create a connection to Kafka, then specify which topic and which partition to connect to.

    Next, set the Kafka offset. Every message in Kafka has an offset, so if a consumer crashes unexpectedly it can resume consuming from the last offset.

    Committing offsets is handled by the client by default, so explicitly committing them is optional.

    The programs later in this article are written from this pseudocode description; a rough sketch follows.
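    A minimal structural sketch of that pseudocode, assuming the 0.8-era Java client used throughout this article (the partition count and class name are illustrative, not from the original; the full SimpleConsumer implementation appears further below):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // One worker per partition, as the pseudocode above describes.
    public class PerPartitionSkeleton {
        public static void main(String[] args) {
            int numPartitions = 4; // assumed; in practice fetch this from topic metadata
            ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
            for (int p = 0; p < numPartitions; p++) {
                final int partition = p;
                pool.submit(new Runnable() {
                    public void run() {
                        System.out.println("worker for partition " + partition + " started");
                        // 1. create a connection to Kafka (a SimpleConsumer in the 0.8 API)
                        // 2. bind to the chosen topic and to this partition
                        // 3. resume from the stored offset
                        // 4. fetch messages in a loop, advancing the offset as messages are read
                    }
                });
            }
            pool.shutdown();
        }
    }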

    Both GroupA and GroupB receive all of the topic's data: group consumption replicates delivery, i.e. Kafka sends one copy of the messages to group A and another to group B.

    The stream count N refers to how many consumer streams each group has; in the figure above, group A has 2 streams and group B has 4.

    Each consumer instance likewise creates a connection to Kafka and sets which topic and partitions to connect to.

    The offset can also be set, just as in partition-mode consumption.

    Group consumption can choose to start from the beginning or from the latest messages, as sketched below.
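    With the 0.8 high-level consumer used below, both choices are configuration settings; a minimal sketch (the ZooKeeper address and group id are whatever your cluster uses):

    import java.util.Properties;

    import kafka.consumer.ConsumerConfig;

    // Sketch: start position and offset committing for the high-level consumer.
    public class StartPositionConfig {
        public static ConsumerConfig build(String zookeeper, String groupId, boolean fromBeginning) {
            Properties props = new Properties();
            props.put("zookeeper.connect", zookeeper);
            props.put("group.id", groupId);
            // When the group has no committed offset yet, "smallest" starts from the
            // beginning of the log and "largest" (the default) from the newest messages.
            props.put("auto.offset.reset", fromBeginning ? "smallest" : "largest");
            // Offsets are committed automatically by default; set this to "false"
            // to commit manually with ConsumerConnector.commitOffsets().
            props.put("auto.commit.enable", "true");
            return new ConsumerConfig(props);
        }
    }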

    PT stands for all the partitions under topic T, and CG for the number of consumer instances in the group.

    The partitions (partition) are sorted, and the consumers are sorted (this ordering is used when assigning partitions to consumers).

    For GroupA in the earlier example, PT = 4 and CG = 2, so N equals 2.

    In partition mode, producers also send each message at least once by default, but this can be customized to send-and-confirm exactly once, or to send once without checking receipt; see the sketch below.
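    In the 0.8 producer these semantics are governed mainly by the acknowledgement level; a hedged sketch (the broker list is a placeholder):

    import java.util.Properties;

    import kafka.producer.ProducerConfig;

    // Sketch: the delivery-semantics knob of the 0.8 producer.
    public class DeliveryConfig {
        public static ProducerConfig build(String brokerList) {
            Properties props = new Properties();
            props.put("metadata.broker.list", brokerList);
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // 0 = never wait for an ack (send once, regardless of receipt),
            // 1 = the leader acknowledges, -1 = all in-sync replicas acknowledge;
            // combined with retries, 1 and -1 give at-least-once delivery.
            props.put("request.required.acks", "1");
            return new ProducerConfig(props);
        }
    }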

    The Kafka client library version just needs to match the server version (0.8.1.1 here).

     

    The pom file is as follows:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
    
      <groupId>com.jike.kafkatest</groupId>
      <artifactId>JikeKafka</artifactId>
      <version>1.0</version>
      <packaging>jar</packaging>
    
      <name>JikeKafka</name>
      <url>http://maven.apache.org</url>
    
      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      </properties>
    
      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.kafka</groupId>
          <artifactId>kafka_2.9.2</artifactId>
          <version>0.8.1.1</version>
          <exclusions>
           <exclusion>
            <artifactId>jmxri</artifactId>
            <groupId>com.sun.jmx</groupId>
           </exclusion>
           <exclusion>
            <artifactId>jms</artifactId>
            <groupId>javax.jms</groupId>
           </exclusion>
           <exclusion>
            <artifactId>jmxtools</artifactId>
            <groupId>com.sun.jdmk</groupId>
           </exclusion>
          </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-ipc</artifactId>
            <version>1.7.3</version>
        </dependency>
      </dependencies>
      <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <testSourceDirectory>src/test/java</testSourceDirectory>
        <plugins>
          <!--
            Bind the maven-assembly-plugin to the package phase;
            this creates a single jar bundled with all dependencies,
            suitable for deployment to a cluster.
           -->
          <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
              <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
              </descriptorRefs>
              <archive>
                <manifest>
                  <mainClass></mainClass>
                </manifest>
              </archive>
            </configuration>
            <executions>
              <execution>
                <id>make-assembly</id>
                <phase>package</phase>
                <goals>
                  <goal>single</goal>
                </goals>
              </execution>
            </executions>
        </plugin>  
        </plugins>
      </build> 
    </project>

    The group-mode Java code follows; it consists of two classes. First, ConsumerTest.java:

    package kafka.consumer.group;
    
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
     
    public class ConsumerTest implements Runnable {
        private KafkaStream<byte[], byte[]> m_stream;
        private int m_threadNumber;
 
        public ConsumerTest(KafkaStream<byte[], byte[]> a_stream, int a_threadNumber) {
            m_threadNumber = a_threadNumber;
            m_stream = a_stream;
        }
     
        public void run() {
            ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
            while (it.hasNext()) {
                System.out.println("Thread " + m_threadNumber + ": " + new String(it.next().message()));
            }
            System.out.println("Shutting down Thread: " + m_threadNumber);
        }
    }
    GroupConsumerTest.java:

    package kafka.consumer.group;
    
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
     
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    
    public class GroupConsumerTest extends Thread {
        private final ConsumerConnector consumer;
        private final String topic;
        private  ExecutorService executor;
        
        public GroupConsumerTest(String a_zookeeper, String a_groupId, String a_topic){
            consumer = kafka.consumer.Consumer.createJavaConsumerConnector(
                    createConsumerConfig(a_zookeeper, a_groupId));
            this.topic = a_topic;
        }
        
        public void shutdown() {
            if (consumer != null) consumer.shutdown();
            if (executor != null) {
                executor.shutdown();
                try {
                    if (!executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS)) {
                        System.out.println("Timed out waiting for consumer threads to shut down, exiting uncleanly");
                    }
                } catch (InterruptedException e) {
                    System.out.println("Interrupted during shutdown, exiting uncleanly");
                }
            }
        }
     
        public void run(int a_numThreads) {
            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put(topic, new Integer(a_numThreads));
            Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
            List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);
     
            // now launch all the threads
            //
            executor = Executors.newFixedThreadPool(a_numThreads);
     
            // now create an object to consume the messages
            //
            int threadNumber = 0;
            for (final KafkaStream<byte[], byte[]> stream : streams) {
                executor.submit(new ConsumerTest(stream, threadNumber));
                threadNumber++;
            }
        }
        private static ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId) {
            Properties props = new Properties();
            props.put("zookeeper.connect", a_zookeeper);
            props.put("group.id", a_groupId);
            props.put("zookeeper.session.timeout.ms", "40000");
            props.put("zookeeper.sync.time.ms", "2000");
            props.put("auto.commit.interval.ms", "1000");
     
            return new ConsumerConfig(props);
        }
        
        public static void main(String[] args) {
            if (args.length < 1) {
                System.out.println("Please assign the number of consumer threads.");
                return;
            }
            
            String zooKeeper = "10.206.216.13:12181,10.206.212.14:12181,10.206.209.25:12181";
            String groupId = "jikegrouptest";
            String topic = "jiketest";
            int threads = Integer.parseInt(args[0]);
     
            GroupConsumerTest example = new GroupConsumerTest(zooKeeper, groupId, topic);
            example.run(threads);
     
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ie) {
     
            }
            example.shutdown();
        }
    }
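    With the jar-with-dependencies produced by the pom above, this class would be launched with the stream (thread) count as its single argument; note that the ZooKeeper connection string, group id, and topic are hard-coded in main and need to be adapted to your own cluster.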

    The partition-mode Java code follows (PartitionConsumerTest.java):

    package kafka.consumer.partition;
    
    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.ErrorMapping;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.*;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.message.MessageAndOffset;
     
    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    public class PartitionConsumerTest {
        public static void main(String args[]) {
            PartitionConsumerTest example = new PartitionConsumerTest();
            long maxReads = Long.MAX_VALUE;
            String topic = "jiketest";
            if (args.length < 1) {
                System.out.println("Please assign partition number.");
                return;
            }
            
            List<String> seeds = new ArrayList<String>();
            String hosts="10.206.216.13,10.206.212.14,10.206.209.25";
            String[] hostArr = hosts.split(",");
            for(int index = 0;index < hostArr.length;index++){
                seeds.add(hostArr[index].trim());
            }
            
            int port = 19092;
             
            int partLen = Integer.parseInt(args[0]);
            for (int index = 0; index < partLen; index++) {
                try {
                    // note: run() loops until a_maxReads is exhausted, so with
                    // Long.MAX_VALUE each partition should really be consumed from
                    // its own thread or process, as the pseudocode described
                    example.run(maxReads, topic, index/*partition*/, seeds, port);
                } catch (Exception e) {
                    System.out.println("Oops:" + e);
                    e.printStackTrace();
                }
            }
        }
        
        private List<String> m_replicaBrokers = new ArrayList<String>();
         
            public PartitionConsumerTest() {
                m_replicaBrokers = new ArrayList<String>();
            }
         
            public void run(long a_maxReads, String a_topic, int a_partition, List<String> a_seedBrokers, int a_port) throws Exception {
                // find the meta data about the topic and partition we are interested in
                //
                PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, a_topic, a_partition);
                if (metadata == null) {
                    System.out.println("Can't find metadata for Topic and Partition. Exiting");
                    return;
                }
                if (metadata.leader() == null) {
                    System.out.println("Can't find Leader for Topic and Partition. Exiting");
                    return;
                }
                String leadBroker = metadata.leader().host();
                String clientName = "Client_" + a_topic + "_" + a_partition;
         
                SimpleConsumer consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
                long readOffset = getLastOffset(consumer,a_topic, a_partition, kafka.api.OffsetRequest.EarliestTime(), clientName);
         
                int numErrors = 0;
                while (a_maxReads > 0) {
                    if (consumer == null) {
                        consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
                    }
                    FetchRequest req = new FetchRequestBuilder()
                            .clientId(clientName)
                            .addFetch(a_topic, a_partition, readOffset, 100000) // Note: this fetchSize of 100000 might need to be increased if large batches are written to Kafka
                            .build();
                    FetchResponse fetchResponse = consumer.fetch(req);
         
                    if (fetchResponse.hasError()) {
                        numErrors++;
                        // Something went wrong!
                        short code = fetchResponse.errorCode(a_topic, a_partition);
                        System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
                        if (numErrors > 5) break;
                        if (code == ErrorMapping.OffsetOutOfRangeCode())  {
                            // We asked for an invalid offset. For simple case ask for the last element to reset
                            readOffset = getLastOffset(consumer,a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
                            continue;
                        }
                        consumer.close();
                        consumer = null;
                        leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
                        continue;
                    }
                    numErrors = 0;
         
                    long numRead = 0;
                    for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
                        long currentOffset = messageAndOffset.offset();
                        if (currentOffset < readOffset) {
                            System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
                            continue;
                        }
                        readOffset = messageAndOffset.nextOffset();
                        ByteBuffer payload = messageAndOffset.message().payload();
         
                        byte[] bytes = new byte[payload.limit()];
                        payload.get(bytes);
                        System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
                        numRead++;
                        a_maxReads--;
                    }
         
                    if (numRead == 0) {
                        try {
                            Thread.sleep(1000);
                        } catch (InterruptedException ie) {
                        }
                    }
                }
                if (consumer != null) consumer.close();
            }
         
            public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                             long whichTime, String clientName) {
                TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
                Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
                requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
                kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
                        requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
                OffsetResponse response = consumer.getOffsetsBefore(request);
         
                if (response.hasError()) {
                    System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition) );
                    return 0;
                }
                long[] offsets = response.offsets(topic, partition);
                return offsets[0];
            }
         
            private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
                for (int i = 0; i < 3; i++) {
                    boolean goToSleep = false;
                    PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
                    if (metadata == null) {
                        goToSleep = true;
                    } else if (metadata.leader() == null) {
                        goToSleep = true;
                    } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
                        // first time through if the leader hasn't changed give ZooKeeper a second to recover
                        // second time, assume the broker did recover before failover, or it was a non-Broker issue
                        //
                        goToSleep = true;
                    } else {
                        return metadata.leader().host();
                    }
                    if (goToSleep) {
                        try {
                            Thread.sleep(1000);
                        } catch (InterruptedException ie) {
                        }
                    }
                }
                System.out.println("Unable to find new leader after Broker failure. Exiting");
                throw new Exception("Unable to find new leader after Broker failure. Exiting");
            }
         
            private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
                PartitionMetadata returnMetaData = null;
                loop:
                for (String seed : a_seedBrokers) {
                    SimpleConsumer consumer = null;
                    try {
                        consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
                        List<String> topics = Collections.singletonList(a_topic);
                        TopicMetadataRequest req = new TopicMetadataRequest(topics);
                        kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);
         
                        List<TopicMetadata> metaData = resp.topicsMetadata();
                        for (TopicMetadata item : metaData) {
                            for (PartitionMetadata part : item.partitionsMetadata()) {
                                if (part.partitionId() == a_partition) {
                                    returnMetaData = part;
                                    break loop;
                                }
                            }
                        }
                    } catch (Exception e) {
                        System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                                + ", " + a_partition + "] Reason: " + e);
                    } finally {
                        if (consumer != null) consumer.close();
                    }
                }
                if (returnMetaData != null) {
                    m_replicaBrokers.clear();
                    for (kafka.cluster.Broker replica : returnMetaData.replicas()) {
                        m_replicaBrokers.add(replica.host());
                    }
                }
                return returnMetaData;
            }
    }

    Parameter tuning:
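    The code above already contains the parameters one would usually tune: zookeeper.session.timeout.ms and zookeeper.sync.time.ms for the high-level consumer, auto.commit.interval.ms for how often offsets are committed, and, for the SimpleConsumer, the socket timeout (100000 ms), buffer size (64 * 1024), and fetch size (100000 bytes) passed to its constructor and fetch request; raising the fetch and buffer sizes trades memory for throughput.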

    Next we implement the producer, which sends messages to Kafka.

    After sending a message, the producer keeps checking whether the Kafka cluster has received it; if not, it resends, and once the maximum retry count is reached it gives up on the message.

    When producing asynchronously, messages are first buffered on the client; you can set a maximum number of buffered messages or a maximum buffering time, and once either threshold is reached the batch is sent to the Kafka brokers, as sketched below.
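    Both behaviours are configurable in the 0.8 producer; a hedged sketch of the relevant settings (the values shown are the library's defaults or placeholders, not from the original):

    import java.util.Properties;

    import kafka.producer.ProducerConfig;

    // Sketch: retry and async-batching knobs of the 0.8 producer.
    public class AsyncTuningConfig {
        public static ProducerConfig build(String brokerList) {
            Properties props = new Properties();
            props.put("metadata.broker.list", brokerList);
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");
            // Resend behaviour: how many retries, and the pause between them.
            props.put("message.send.max.retries", "3");
            props.put("retry.backoff.ms", "100");
            // Async buffering: flush once 10000 messages have accumulated or after
            // 5000 ms, whichever comes first, sending up to 200 messages per batch.
            props.put("queue.buffering.max.messages", "10000");
            props.put("queue.buffering.max.ms", "5000");
            props.put("batch.num.messages", "200");
            return new ProducerConfig(props);
        }
    }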

    The pseudocode for the two models is so similar that one description covers both.

    First create the connection instance, then configure load balancing.

    Whether the producer is synchronous or asynchronous is decided when the producer parameters are set.

     

    Because the synchronous model waits for confirmation, its loss rate is essentially zero.

    In the asynchronous model, each partition can send on the order of 500,000 messages per second.

    Next, the Java client programs:

    The pom file is the same as the one above.

    The sync model code is as follows:

    package kafka.producer.sync;
    import java.util.*;
    
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;
    
    public class SyncProduce {
        public static void main(String[] args) {
            long events = Long.MAX_VALUE;
            Random rnd = new Random();
     
            Properties props = new Properties();
            props.put("metadata.broker.list", "10.206.216.13:19092,10.206.212.14:19092,10.206.209.25:19092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            //kafka.serializer.DefaultEncoder
            props.put("partitioner.class", "kafka.producer.partiton.SimplePartitioner");
            //kafka.producer.DefaultPartitioner: based on the hash of the key
            props.put("request.required.acks", "1");
            //0: never wait for an ack   1: the leader replica receives the message and sends an ack back   -1: all of the leader's replicas receive the message and send an ack back
     
            ProducerConfig config = new ProducerConfig(props);
     
            Producer<String, String> producer = new Producer<String, String>(config);
     
            for (long nEvents = 0; nEvents < events; nEvents++) { 
                   long runtime = new Date().getTime();  
                   String ip = "192.168.2." + rnd.nextInt(255); 
                   String msg = runtime + ",www.example.com," + ip; 
               //eventKey is required (even if your partitioning algorithm never uses the key, it must not be null or ""), otherwise your custom partitioner is never invoked
               KeyedMessage<String, String> data = new KeyedMessage<String, String>("jiketest", ip, msg);
                                                               //             eventTopic, eventKey, eventBody
                   producer.send(data);
                   try {
                       Thread.sleep(1000);
                   } catch (InterruptedException ie) {
                   }
            }
            producer.close();
        }
    }

    The async model code is as follows:

    package kafka.producer.async;
    
    import java.util.*;
    
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;
    
    
    public class ASyncProduce {
        public static void main(String[] args) {
            long events = Long.MAX_VALUE;
            Random rnd = new Random();
     
            Properties props = new Properties();
            props.put("metadata.broker.list", "10.206.216.13:19092,10.206.212.14:19092,10.206.209.25:19092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            //kafka.serializer.DefaultEncoder
            props.put("partitioner.class", "kafka.producer.partiton.SimplePartitioner");
            //kafka.producer.DefaultPartitioner: based on the hash of the key
            //props.put("request.required.acks", "1");
            props.put("producer.type", "async");
            //props.put("producer.type", "1");
            // 1: async 2: sync
     
            ProducerConfig config = new ProducerConfig(props);
     
            Producer<String, String> producer = new Producer<String, String>(config);
     
            for (long nEvents = 0; nEvents < events; nEvents++) { 
                   long runtime = new Date().getTime();  
                   String ip = "192.168.2." + rnd.nextInt(255); 
                   String msg = runtime + ",www.example.com," + ip; 
                   KeyedMessage<String, String> data = new KeyedMessage<String, String>("jiketest", ip, msg);
                   producer.send(data);
                   try {
                       Thread.sleep(1000);
                   } catch (InterruptedException ie) {
                   }
            }
            producer.close();
        }
    }

    The partitioning algorithm (SimplePartitioner.java):

    package kafka.producer.partiton;
    
    import kafka.producer.Partitioner;
    import kafka.utils.VerifiableProperties;
     
    public class SimplePartitioner implements Partitioner {
        public SimplePartitioner(VerifiableProperties props) {
        }
 
        public int partition(Object key, int a_numPartitions) {
            int partition = 0;
            String stringKey = (String) key;
            // the key is an IP address string; partition by its last octet
            int offset = stringKey.lastIndexOf('.');
            if (offset > 0) {
                partition = Integer.parseInt(stringKey.substring(offset + 1)) % a_numPartitions;
            }
            return partition;
        }
    }
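    For example, with 4 partitions a key of "192.168.2.53" maps to 53 % 4 = partition 1, so every message keyed by that IP lands in the same partition.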

    Life is short; stay away from IT and escape the sea of suffering.