• Flume: read files from a directory and sync them to Kafka


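    Configuring and starting Flume

    An Agent is a JVM process that moves data, in the form of events (Event), from a source (Source) through a channel (Channel) to a destination (Sink).

    An Agent is made up of three main parts: Source, Channel, and Sink.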
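
    To make the three roles concrete, here is a toy model in Python. It is not Flume code, and every name in it (MemoryChannel, run_source, run_sink) is made up for illustration: the source wraps lines as events and puts them on a bounded buffer that plays the channel, and the sink drains that buffer in batches.

```python
from collections import deque


class MemoryChannel:
    """Toy stand-in for a Flume memory channel: a bounded FIFO buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        # A full channel pushes back on the source instead of growing forever.
        if len(self.events) >= self.capacity:
            raise OverflowError("channel is full")
        self.events.append(event)

    def take(self, max_events):
        # Hand out at most max_events events, oldest first.
        batch = []
        while self.events and len(batch) < max_events:
            batch.append(self.events.popleft())
        return batch


def run_source(lines, channel):
    """Source: wrap each input line as an event and hand it to the channel."""
    for line in lines:
        channel.put({"headers": {}, "body": line.encode("utf-8")})


def run_sink(channel, batch_size, deliver):
    """Sink: drain the channel in batches and pass each batch to `deliver`."""
    while True:
        batch = channel.take(batch_size)
        if not batch:
            break
        deliver(batch)


channel = MemoryChannel(capacity=1000)
run_source(["line 1", "line 2", "line 3"], channel)

delivered = []
run_sink(channel, batch_size=2, deliver=delivered.append)
print([len(batch) for batch in delivered])  # prints [2, 1]
```

    The channel is the only thing the source and the sink share, which is why both must be bound to the same channel name in the configuration.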

    # log file -> Kafka (the exec source tails a single file)
    a1.sources = s1
    a1.sinks = k1
    a1.channels = c1
    
    # bind the source and the sink to channel c1
    a1.sources.s1.channels = c1
    a1.sinks.k1.channel = c1
    
    a1.sources.s1.type = exec
    a1.sources.s1.command = tail -F /home/yu/access.log
    
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = flumnTopic
    # Kafka broker address; 9092 is assumed here as the usual broker port (2181 is ZooKeeper's)
    a1.sinks.k1.kafka.bootstrap.servers = master:9092
    a1.sinks.k1.flumeBatchSize = 20
    a1.sinks.k1.kafka.producer.acks = 1
    a1.sinks.k1.kafka.producer.linger.ms = 1
    a1.sinks.k1.kafka.producer.compression.type = snappy
    
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
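
    A wiring mistake that is easy to make in a file like this is binding a sink to a name that was never declared under `channels`. The sketch below checks for that before the agent is started. It is a hypothetical helper, not part of Flume, and it only understands the simple `key = value` lines used in this post; the sample text is deliberately broken to show what gets reported.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems with the agent's source/sink channel bindings."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []

    # A source may feed several channels (space-separated list).
    for source in props.get(f"{agent}.sources", "").split():
        key = f"{agent}.sources.{source}.channels"
        names = props.get(key, "").split()
        if not names:
            problems.append(f"{key} is not set")
        for name in names:
            if name not in declared:
                problems.append(f"{key} -> unknown channel {name!r}")

    # A sink reads from exactly one channel.
    for sink in props.get(f"{agent}.sinks", "").split():
        key = f"{agent}.sinks.{sink}.channel"
        name = props.get(key)
        if name is None:
            problems.append(f"{key} is not set")
        elif name not in declared:
            problems.append(f"{key} -> unknown channel {name!r}")
    return problems


# Deliberately broken sample: the sink points at "k1", which is a sink name.
sample = """
a1.sources = s1
a1.sinks = k1
a1.channels = c1
a1.sources.s1.channels = c1
a1.sinks.k1.channel = k1
"""
print(check_wiring(parse_properties(sample), "a1"))
```

    An empty list means every source and sink points at a declared channel; it says nothing about whether the other property names are spelled correctly.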
    
    

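    0. Prerequisite: a JDK

    1. Install Flume

    2. Set the environment variables

    vim /etc/profile.d/flume.sh

    # adjust the path to wherever Flume is installed
    export FLUME_HOME=/opt/flume
    export PATH=$PATH:$FLUME_HOME/bin
    

    source /etc/profile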

    3. Configure Flume (in Flume's conf/ directory)

    • cp flume-env.sh.template flume-env.sh
    # in flume-env.sh
    export JAVA_HOME=/opt/jdk1.8.0_191
    
    • cp flume-conf.properties.template flume-conf.properties

    The source is a spooling directory source (spooldir).

    The channel is a memory channel.

    The sink is a Kafka sink.
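
    The spooling directory source expects a file to be complete and unchanged once it appears in the watched directory, so files are usually written somewhere else and then moved in. A minimal sketch of that pattern, assuming the staging directory is on the same filesystem as the spool directory (so the rename is atomic); `drop_into_spool`, the file name and the temporary paths are all hypothetical.

```python
import os
import tempfile


def drop_into_spool(lines, spool_dir, staging_dir, name):
    """Write `lines` to a staging file, then move it into the spool directory."""
    os.makedirs(staging_dir, exist_ok=True)

    # Write the whole file outside the watched directory first.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")

    # File names in the spool directory should not be reused.
    final_path = os.path.join(spool_dir, name)
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    # The finished file appears in the spool directory in a single step.
    os.rename(tmp_path, final_path)
    return final_path


# Demo against throwaway directories rather than a real Flume spoolDir.
base = tempfile.mkdtemp()
spool = os.path.join(base, "spool")
os.makedirs(spool)
path = drop_into_spool(["a", "b"], spool, os.path.join(base, "staging"), "app-0001.log")
print(os.listdir(spool))  # prints ['app-0001.log']
```

    Once Flume has fully ingested a file, it renames it with a `.COMPLETED` suffix by default, so the same file is not read twice.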

    # 1. agent a
    a.sources=aSource
    a.channels=aChannel
    a.sinks=aSink
    
    # 2. bind the source and the sink to the channel
    a.sources.aSource.channels=aChannel
    a.sinks.aSink.channel=aChannel
    
    # 3. source
    # type: files in a directory
    a.sources.aSource.type=spooldir
    a.sources.aSource.spoolDir=/opt/apache-zookeeper-3.7.0-bin/logs
    # deserializer: one event per line
    a.sources.aSource.deserializer=LINE
    a.sources.aSource.deserializer.maxLineLength=320000
    # regex for the names of the files to ingest
    a.sources.aSource.includePattern=zookeeper-root-server-app.out
    # interceptor: drop lines that begin with "user"
    a.sources.aSource.interceptors=head_filter
    a.sources.aSource.interceptors.head_filter.type=regex_filter
    a.sources.aSource.interceptors.head_filter.regex=^user.*
    a.sources.aSource.interceptors.head_filter.excludeEvents=true
    
    # 4. channel
    # file channel alternative (disabled)
    #a.channels.aChannel.type=file
    #a.channels.aChannel.checkpointDir=/opt/kb15tmp/checkpoint/a
    #a.channels.aChannel.dataDirs=/opt/kb15tmp/checkpoint/data/a
    a.channels.aChannel.type=memory
    a.channels.aChannel.capacity=100000
    a.channels.aChannel.transactionCapacity=10000
    
    # 5. sink
    # batchSize, brokerList and topic are the older property names; Flume 1.7+
    # prefers flumeBatchSize, kafka.bootstrap.servers and kafka.topic
    a.sinks.aSink.type=org.apache.flume.sink.kafka.KafkaSink
    a.sinks.aSink.batchSize=640
    a.sinks.aSink.brokerList=app:9092
    a.sinks.aSink.topic=topic01
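
    The `regex_filter` interceptor with `excludeEvents=true` drops every event whose body matches the regex and passes the rest on. The Python sketch below mimics that decision, assuming the intent is to drop lines that begin with `user`. `re.search` stands in for the Java regex search Flume performs, and `filter_events` and the sample lines are made up for illustration.

```python
import re


def filter_events(lines, pattern, exclude_events):
    """Keep or drop lines the way a regex_filter interceptor would.

    exclude_events=True  -> matching lines are dropped
    exclude_events=False -> only matching lines are kept
    """
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        matched = regex.search(line) is not None
        if matched != exclude_events:
            kept.append(line)
    return kept


sample = ["user_id,name,age", "1001,alice,30", "1002,bob,25"]
print(filter_events(sample, r"^user", exclude_events=True))
# prints ['1001,alice,30', '1002,bob,25']

# In regex syntax `*` applies to the preceding character only, so `^user*`
# means "use" followed by any number of "r" and also matches "used ...".
print(filter_events(["used space: 10G", "ok"], r"^user*", exclude_events=True))
# prints ['ok']
```

    Because the match is a search anchored by `^`, the pattern does not have to cover the whole line; `^user` and `^user.*` select the same lines.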
    
    
    

    4. Start Flume

    # --name a: the agent name (must match the prefix used in the config file)
    bin/flume-ng agent --conf conf/ --name a --conf-file conf/flume-conf.properties
    
  • Original article: https://www.cnblogs.com/mznsndy/p/16369177.html