• Writing data to both Kafka and HDFS with Flume


    Environment

    Component  Version  Baidu Netdisk link
    Flume   flume-ng-1.6.0-cdh5.7.0.tar.gz  Link: https://pan.baidu.com/s/11QeF7rk2rqnOrFankr4TzA  Extraction code: 3ojw
    Zookeeper  Zookeeper-3.4.5  Link: https://pan.baidu.com/s/1upNcB53WGWP_89lhYnqP6g  Extraction code: j50f
    Kafka  kafka_2.11-0.10.0.0.tgz  Link: https://pan.baidu.com/s/1TpU6QPnoF1tuUy-7HnGgmQ  Extraction code: aapj

      

    Zookeeper deployment: refer to Part 4

    Flume deployment

    • Kafka deployment

    # Extract the archive
    [hadoop@hadoop001 soft]$ cd ~/soft
    [hadoop@hadoop001 soft]$ tar -zxvf kafka_2.11-0.10.0.0.tgz -C ~/app/
    
    # Change the data storage location
    [hadoop@hadoop001 soft]$ cd ~/app/kafka_2.11-0.10.0.0/
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ mkdir -p ~/app/kafka_2.11-0.10.0.0/datalogdir
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ vim config/server.properties
    log.dirs=/home/hadoop/app/kafka_2.11-0.10.0.0/datalogdir
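    Besides log.dirs, a single-node install usually also verifies a few other entries in config/server.properties. The values below are a sketch for a localhost deployment; they are assumptions, not taken from the original setup:

    ```properties
    # Unique id of this broker in the cluster
    broker.id=0
    # Listener for client connections (assumed default port 9092)
    listeners=PLAINTEXT://:9092
    # ZooKeeper connection string (assumes a local standalone ZooKeeper)
    zookeeper.connect=localhost:2181
    ```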
    
    # Add environment variables
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ vim ~/.bash_profile
    export KAFKA_HOME=/home/hadoop/app/kafka_2.11-0.10.0.0
    export PATH=$KAFKA_HOME/bin:$PATH
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ source ~/.bash_profile
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ which kafka-topics.sh
    ~/app/kafka_2.11-0.10.0.0/bin/kafka-topics.sh
    
    # Start the broker (runs in the foreground; add -daemon to run it in the background)
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-server-start.sh config/server.properties
    
    # Test: create a topic
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wsk_test
    # Test: list topics
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
    # Test: console producer
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wsk_test
    # Test: console consumer
    [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wsk_test --from-beginning
    • Configure the Flume job

    Use Flume's TailDir source to collect data and deliver it to both Kafka and HDFS, fanning each event out to two channels via a replicating channel selector. The configuration is as follows:

    Taildir-HdfsAndKafka-Agnet.sources = taildir-source   
    Taildir-HdfsAndKafka-Agnet.channels = c1 c2
    Taildir-HdfsAndKafka-Agnet.sinks = hdfs-sink kafka-sink
    
    Taildir-HdfsAndKafka-Agnet.sources.taildir-source.type = TAILDIR
    Taildir-HdfsAndKafka-Agnet.sources.taildir-source.filegroups = f1
    Taildir-HdfsAndKafka-Agnet.sources.taildir-source.filegroups.f1 = /home/hadoop/data/flume/HdfsAndKafka/input/.*
    Taildir-HdfsAndKafka-Agnet.sources.taildir-source.positionFile = /home/hadoop/data/flume/HdfsAndKafka/taildir_position/taildir_position.json
    Taildir-HdfsAndKafka-Agnet.sources.taildir-source.selector.type = replicating
    
    Taildir-HdfsAndKafka-Agnet.channels.c1.type = memory
    Taildir-HdfsAndKafka-Agnet.channels.c2.type = memory
    
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.type = hdfs
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/HdfsAndKafka/%Y%m%d%H%M
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.useLocalTimeStamp=true
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.filePrefix = wsktest-
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollInterval = 10
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollSize = 100000000
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollCount = 0
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.fileType=DataStream
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.writeFormat=Text
    
    Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
    Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.brokerList = localhost:9092
    Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.topic = wsk_test
    
    
    Taildir-HdfsAndKafka-Agnet.sources.taildir-source.channels = c1 c2
    Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.channel = c1
    Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.channel = c2
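    With the agent running, the TailDir source picks up any new lines appended under the watched directory, and the replicating selector copies each event to both channels. A minimal sketch for generating test input (the /tmp path is an assumption for illustration; the configuration above watches /home/hadoop/data/flume/HdfsAndKafka/input):

    ```shell
    # Sketch: append sample log lines for the TailDir source to tail.
    # INPUT_DIR is illustrative; substitute the directory from filegroups.f1.
    INPUT_DIR=/tmp/flume_taildir_input
    mkdir -p "$INPUT_DIR"
    : > "$INPUT_DIR/test.log"          # start from an empty file
    for i in 1 2 3 4 5; do
      echo "$(date '+%Y-%m-%d %H:%M:%S') test-event-$i" >> "$INPUT_DIR/test.log"
    done
    wc -l "$INPUT_DIR/test.log"        # expect: 5
    ```

    The same events should then show up both under hdfs://hadoop001:9000/flume/HdfsAndKafka/ and on the wsk_test console consumer.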
    • Startup command

    flume-ng agent \
    --name Taildir-HdfsAndKafka-Agnet \
    --conf $FLUME_HOME/conf \
    --conf-file $FLUME_HOME/conf/Taildir-HdfsAndKafka-Agnet.conf \
    -Dflume.root.logger=INFO,console
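    If events stop flowing after a restart, the positionFile configured above is the first place to look: TailDir records, per tailed file, the inode and the byte offset (pos) it will resume from. A sketch with a hand-written sample file (the JSON content is an assumption modeled on TailDir's format, not real agent output):

    ```shell
    # Sketch: inspect a TailDir position file. The sample below is fabricated
    # for illustration; a real agent writes one entry per tailed file.
    mkdir -p /tmp/taildir_demo
    cat > /tmp/taildir_demo/taildir_position.json <<'EOF'
    [{"inode":1050321,"pos":2048,"file":"/home/hadoop/data/flume/HdfsAndKafka/input/test.log"}]
    EOF
    # Extract the resume offset; deleting this file makes Flume re-read files from the start.
    grep -o '"pos":[0-9]*' /tmp/taildir_demo/taildir_position.json   # expect: "pos":2048
    ```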
  • Original post: https://www.cnblogs.com/xuziyu/p/11115421.html