• Getting Started with Kafka


    I. Walking through the official tutorial
     
    Kafka: a distributed message queue
     
    A messaging system
    Message middleware: a buffer sitting between production and consumption
    When the buffer fills up, Kafka can be scaled out
     
    Features:
    horizontal scalability, fault tolerance, real-time, fast
     
     
    Kafka architecture:
     
     
    Understand the roles: producer, consumer, broker (the buffer), topic (the label)
     
     One configuration file (server.properties) corresponds to one broker
     
     
    Deploying Kafka on a single node (one machine):
     
    When starting up, open several consoles so the server (broker), producer, and consumer can each be launched and watched at the same time
     
    1. Single-broker deployment:
     
    Preparation:
    Install ZooKeeper first; after unpacking, the only edit needed is conf/zoo.cfg: change dataDir so it does not live under the tmp directory
    Basic ZK usage: start the server with zkServer in the bin directory, then connect with zkCli
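     
    For example, using the standard ZooKeeper scripts (the client address assumes the default port 2181):
    zkServer.sh start
    zkCli.sh -server localhost:2181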
     
    Configure Kafka:
    in the config directory, edit server.properties:
    broker.id
    listeners
    host.name
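     
    A minimal sketch of those settings for one local broker (the values are illustrative defaults, not from the original notes):
    broker.id=0
    listeners=PLAINTEXT://:9092
    host.name=localhost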
     
    Startup (run from KAFKA_HOME):
    start the ZK server first
    zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
    then start the Kafka server, passing its config file
    kafka-server-start.sh $KAFKA_HOME/config/server.properties
     
    Create a topic (specify the ZooKeeper address):
    kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
    List topics:
    kafka-topics.sh --list --zookeeper localhost:2181
    Show topic details:
    the describe command shows which brokers are alive, which replica is the leader, and so on
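    For example (standard kafka-topics.sh usage, on the topic created above):
    kafka-topics.sh --describe --zookeeper localhost:2181 --topic test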
     
    Send messages (produce), specifying the broker:
    kafka-console-producer.sh --broker-list localhost:9092 --topic test
     
    Note: port 2181 belongs to the ZooKeeper server, while 9092 is the broker's listener
     
    Consume messages, specifying ZK:
    kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
     
    Note: with the --from-beginning flag, all historical messages are read as well
     
     
    2. Multi-broker deployment:
     
    Make several copies of server.properties
    and change broker.id, listeners, and log.dir in each (see the sketch below)
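     
    For instance, server-1.properties might differ only in these lines, following the official quickstart (ports and paths are illustrative):
    broker.id=1
    listeners=PLAINTEXT://:9093
    log.dir=/tmp/kafka-logs-1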
     
    Start the Kafka servers:
    kafka-server-start.sh -daemon $KAFKA_HOME/config/server-1.properties &
    kafka-server-start.sh -daemon $KAFKA_HOME/config/server-2.properties &
    kafka-server-start.sh -daemon $KAFKA_HOME/config/server-3.properties
     
    -daemon runs each server in the background
    & returns the shell right away so the next command can be entered
    after a successful start, jps lists three Kafka processes
     
    Create a topic with multiple replicas:
    kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-repli
     
    Producing works the same as with a single broker, except several broker addresses are listed (see the example below)
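     
    For example, assuming the three brokers listen on ports 9093, 9094, and 9095 as sketched above:
    kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9095 --topic my-repli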
     
     Fault tolerance with multiple brokers:
    if the leader broker is killed, a new leader is elected; in other words, killing any one broker does not affect overall availability
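     
    After killing a broker, describe the topic again to watch the leader and the in-sync replica set change:
    kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-repli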
     
     
     
     
    II. Developing with IDEA + Maven:
     
    Environment setup:
     
    Create a project from the Scala template, fill in the project details, and adjust the settings path (the original screenshots are omitted).
     
    With the Scala project created, edit pom.xml:
    add and remove dependencies as needed
    Kafka version and dependency:
    <properties>
      <kafka.version>0.9.0.0</kafka.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>${kafka.version}</version>
      </dependency>
    </dependencies>
     
     Create a java folder and mark it as a source root (it turns blue); this is changed at the top right of IDEA
     
     
     III. Writing a Kafka producer and consumer with the Java API:
     
     
    Producer:
     
    First define a class for the commonly used Kafka constants: the broker list, the ZK address, and the topic name
    /*
    * Kafka configuration constants, shared by the producer and consumer
    * */
    public class KafkaProperties {

        // addresses and names used below
        public static final String ZK = "localhost:2181";
        public static final String TOPIC = "hello_topic";
        public static final String BROKER_LIST = "localhost:9092";
        // referenced by the consumer below; the group name itself is arbitrary
        public static final String GROUP_ID = "test_group";
    }
     
    Then create the producer:
    1. Define fields for the topic and the producer (use the one in the kafka.javaapi.producer package)
    2. Write a constructor, which:
    3. receives the topic from the caller
    4. creates the producer, which requires a ProducerConfig object
    5. the ProducerConfig takes its parameters from a Properties object (java.util)
    6. in the Properties object, set "metadata.broker.list", "serializer.class", and "request.required.acks"
     
    Finally, the producer sends messages from a Thread's run method
    (this test sends one message every 2 s)
     
    Implementation:
     
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;
     
    import java.util.Properties;
     
    /*
    * Kafka producer
    * */
    public class KafkaProducer extends Thread{

        private String topic;
        // use the Producer from kafka.javaapi.producer
        private Producer<Integer, String> producer;

        // constructor: takes the topic and builds the producer
        public KafkaProducer(String topic) {

            this.topic = topic;

            // Properties carry the parameters ProducerConfig needs; this is a prerequisite for building the Producer
            // they are: the broker list, the serializer, and the ack mode
            Properties properties = new Properties();
            properties.put("metadata.broker.list", KafkaProperties.BROKER_LIST);
            properties.put("serializer.class", "kafka.serializer.StringEncoder");  // serialize values as strings
            properties.put("request.required.acks", "1");  // may be 0, 1, or -1; 1 is the usual production setting, -1 is the strictest, never use 0

            producer = new Producer<Integer, String>(new ProducerConfig(properties));
        }

        // the thread drives the producer
        @Override
        public void run() {

            int messageNo = 1;

            while(true) {
                String message = "message_" + messageNo;
                producer.send(new KeyedMessage<Integer, String>(topic, message));
                System.out.println("send: " + message);

                messageNo++;

                // send once every 2 s
                try {
                    Thread.sleep(2000);
                } catch(Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
     
     
    Consumer:
     
    Construction:
    1. The constructor takes the topic
    2. A createConnector method returns a ConsumerConnector (note: not a Consumer directly)
    3. As with the producer, pass the required properties, zookeeper.connect and group.id, into the ConsumerConnector
     
    Execution, again by overriding Thread's run method:
    1. To create the message streams, first build a Map holding the topic and the number of Kafka streams
    2. Create the message streams and take the one for this topic
    3. Iterate over the stream to pull messages
     
    Implementation:
     
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
     
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
     
    /*
    * Kafka consumer
    * */
    public class KafkaConsumer extends Thread {

        private String topic;

        public KafkaConsumer(String topic) {

            this.topic = topic;
        }

        // ConsumerConnector comes from the kafka.javaapi.consumer package
        // note that we build a consumer *connector* here rather than a consumer, unlike the producer side
        private ConsumerConnector createConnector() {

            // set the ConsumerConfig properties the same way as for the producer
            // the ZK address and the group id are required
            Properties properties = new Properties();
            properties.put("zookeeper.connect", KafkaProperties.ZK);
            properties.put("group.id", KafkaProperties.GROUP_ID);
            return Consumer.createJavaConsumerConnector(new ConsumerConfig(properties));
        }


        // the thread drives the consumer
        @Override
        public void run() {

            ConsumerConnector consumer = createConnector();

            // createMessageStreams takes a Map, so build one
            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            // map the topic to the number of Kafka streams
            topicCountMap.put(topic, 1);

            // create the message streams; the types can be read off the source:
            // the String key is the topic, the List holds the byte streams
            Map<String, List<KafkaStream<byte[], byte[]>>> messageStream = consumer.createMessageStreams(topicCountMap);
            // take the single stream for this topic
            KafkaStream<byte[], byte[]> byteStream = messageStream.get(topic).get(0);

            // iterate over the stream
            ConsumerIterator<byte[], byte[]> iterator = byteStream.iterator();

            while (iterator.hasNext()) {

                // the iterator yields bytes; convert to String
                String message = new String(iterator.next().message());
                System.out.println("receive:" + message);
            }
        }
    }
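     
    To try both ends together, a minimal test driver can start the two threads (the class name KafkaClientApp is my own choice, not from the original notes):
     
    public class KafkaClientApp {

        public static void main(String[] args) {
            // start the producer and consumer threads against the same topic
            new KafkaProducer(KafkaProperties.TOPIC).start();
            new KafkaConsumer(KafkaProperties.TOPIC).start();
        }
    }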
     
    IV. A simple hands-on Kafka exercise
     
    Integrate Flume and Kafka for real-time data collection
     
    The Kafka sink is hooked in as the producer
     
    Technology choices:
    Agent1: exec source -> memory channel -> avro sink
    Agent2: avro source -> memory channel -> kafka sink (producer)
    producer -> consumer
     
     
    Configure exec-memory-avro:
     
    exec-memory-avro.sources = exec-source
    exec-memory-avro.sinks = avro-sink
    exec-memory-avro.channels = memory-channel
     
    # Describe/configure the source
    exec-memory-avro.sources.exec-source.type = exec
    exec-memory-avro.sources.exec-source.command = tail -F /usr/local/mycode/data/data.log
    exec-memory-avro.sources.exec-source.shell = /bin/sh -c
     
    # Describe the sink
    exec-memory-avro.sinks.avro-sink.type = avro
    exec-memory-avro.sinks.avro-sink.hostname = localhost
    exec-memory-avro.sinks.avro-sink.port = 44444
     
    # Use a channel which buffers events in memory
    exec-memory-avro.channels.memory-channel.type = memory
     
    # Bind the source and sink to the channel
    exec-memory-avro.sources.exec-source.channels = memory-channel
    exec-memory-avro.sinks.avro-sink.channel = memory-channel
     
     
     
    Configure avro-memory-kafka:
     
    avro-memory-kafka.sources = avro-source
    avro-memory-kafka.sinks = kafka-sink
    avro-memory-kafka.channels = memory-channel
     
    # Describe/configure the source
    avro-memory-kafka.sources.avro-source.type = avro
    avro-memory-kafka.sources.avro-source.bind = localhost
    avro-memory-kafka.sources.avro-source.port = 44444
     
    # Describe the sink
    avro-memory-kafka.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
    avro-memory-kafka.sinks.kafka-sink.kafka.bootstrap.servers = localhost:9092
    avro-memory-kafka.sinks.kafka-sink.kafka.topic = hello_topic
    avro-memory-kafka.sinks.kafka-sink.flumeBatchSize = 5
    avro-memory-kafka.sinks.kafka-sink.kafka.producer.acks = 1
     
    # Use a channel which buffers events in memory
    avro-memory-kafka.channels.memory-channel.type = memory
     
    # Bind the source and sink to the channel
    avro-memory-kafka.sources.avro-source.channels = memory-channel
    avro-memory-kafka.sinks.kafka-sink.channel = memory-channel
     
     
     
    Start the two Flume agents (mind the order: avro-memory-kafka must come up first so its avro source is listening)
     
    flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/avro-memory-kafka.conf --name avro-memory-kafka -Dflume.root.logger=INFO,console
     
    flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec-memory-avro.conf --name exec-memory-avro -Dflume.root.logger=INFO,console
     
     
     
    Start the Kafka consumer:
     
    kafka-console-consumer.sh --zookeeper localhost:2181 --topic hello_topic
     
    The pipeline has some latency; wait a moment and the data will show up on the consumer console
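     
    To generate traffic, append lines to the file tailed by the exec source (the path comes from the config above):
    echo "hello kafka" >> /usr/local/mycode/data/data.log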
     