Step 1: Data source
Write a program that automatically generates a log file whose lines have the following format:
15837312345,13737312345,2017-01-09 08:09:10,0360
Package the program, upload it to the server, and run it with the following command to simulate a log file that keeps growing:
java -cp ct_producter-1.0-SNAPSHOT.jar producter.ProductLog ./awen.tsv
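The generator's source is not shown here. The following is only a minimal sketch of what such a program could look like, not the actual producter.ProductLog class. The class name ProductLogSketch, the helper names, the half-second interval, and the meaning of the four fields (caller number, callee number, call start time, duration in seconds padded to four digits) are all assumptions inferred from the sample line.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Random;

// Hypothetical stand-in for producter.ProductLog; field meanings are assumptions.
class ProductLogSketch {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Builds one record: caller,callee,timestamp,duration (zero-padded to 4 digits).
    static String formatRecord(String caller, String callee,
                               LocalDateTime time, int seconds) {
        return String.format("%s,%s,%s,%04d", caller, callee, time.format(FMT), seconds);
    }

    // Produces an 11-digit number starting with "1" (shape assumed from the sample line).
    static String randomPhone(Random rnd) {
        StringBuilder sb = new StringBuilder("1");
        for (int i = 0; i < 10; i++) {
            sb.append(rnd.nextInt(10));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Output path comes from the first argument, as in the command above.
        String path = args.length > 0 ? args[0] : "./awen.tsv";
        Random rnd = new Random();
        // Append mode plus autoflush, so a tailing process sees each new line promptly.
        try (PrintWriter out = new PrintWriter(new FileWriter(path, true), true)) {
            while (true) {
                out.println(formatRecord(randomPhone(rnd), randomPhone(rnd),
                        LocalDateTime.now(), rnd.nextInt(3600)));
                Thread.sleep(500); // one record every half second (arbitrary choice)
            }
        }
    }
}
```

Opening the file in append mode matters here: a tail-based reader follows data added to the end of the file, so the generator should never truncate or rewrite it.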
Step 2: Monitor the awen.tsv log
Use Flume to monitor the rolling awen.tsv log. Write the Flume configuration file as follows:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/datas/awen.tsv
a1.sources.r1.shell = /bin/bash -c
# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = flume01
a1.sinks.k1.brokerList = hadoop1:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Run the following command to start the agent, which follows the awen.tsv log file as it grows:
bin/flume-ng agent --conf conf/ --name a1 --conf-file /home/hadoop/datas/tsv-flume-kafka/flume-kafka.conf
Step 3: Consume the topic data
bin/kafka-console-consumer.sh --zookeeper hadoop1:2181 --topic flume01 --consumer.config config/consumer.properties
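Each message printed by the console consumer should be one line in the format from Step 1. As a rough sketch of how a downstream program might split such a line into fields, assuming the same field meanings as above (caller, callee, start time, duration in seconds), consider the following. The class CallRecordSketch and its field names are hypothetical and are not part of the original project.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical holder for one consumed record; field names are assumptions.
class CallRecordSketch {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    final String caller;
    final String callee;
    final LocalDateTime startTime;
    final int durationSeconds;

    CallRecordSketch(String caller, String callee,
                     LocalDateTime startTime, int durationSeconds) {
        this.caller = caller;
        this.callee = callee;
        this.startTime = startTime;
        this.durationSeconds = durationSeconds;
    }

    // Splits a comma-separated line into its four fields and converts the types.
    static CallRecordSketch parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return new CallRecordSketch(parts[0], parts[1],
                LocalDateTime.parse(parts[2], FMT),
                Integer.parseInt(parts[3])); // "0360" parses to 360
    }

    public static void main(String[] args) {
        CallRecordSketch r = parse("15837312345,13737312345,2017-01-09 08:09:10,0360");
        System.out.println(r.caller + " -> " + r.callee
                + " at " + r.startTime + " for " + r.durationSeconds + "s");
    }
}
```

Rejecting lines that do not have exactly four fields keeps partially written or malformed records from silently entering later processing.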