• Resumable transfer (resuming from a breakpoint) with Flume


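    Based on the requirements, first define the following three elements:
    Source: monitors a file for newly appended content  :  exec  'tail -F file'
    Sink: the HDFS file system  :  hdfs sink
    Channel: the pipe between source and sink; either a file channel or a memory channel can be used (a memory channel loses its buffered events if the agent stops)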
     
    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1
     
    # Describe/configure tail -F source1
    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F /root/flumedata/logs/text.txt
    agent1.sources.source1.channels = channel1
     
    #configure host for source
    agent1.sources.source1.interceptors = i1
    agent1.sources.source1.interceptors.i1.type = host
    agent1.sources.source1.interceptors.i1.hostHeader = hostname
     
    # Describe sink1
    agent1.sinks.sink1.type = hdfs
    #a1.sinks.k1.channel = c1
    agent1.sinks.sink1.hdfs.path = hdfs://hadoop01:9000/weblog/flume-collection/%y-%m-%d/%H-%M
    agent1.sinks.sink1.hdfs.filePrefix = access_log
    agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
    agent1.sinks.sink1.hdfs.batchSize = 10
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.hdfs.writeFormat = Text
    agent1.sinks.sink1.hdfs.rollSize = 10
    agent1.sinks.sink1.hdfs.rollCount = 100
    agent1.sinks.sink1.hdfs.rollInterval = 6
    agent1.sinks.sink1.hdfs.round = true
    agent1.sinks.sink1.hdfs.roundValue = 1
    agent1.sinks.sink1.hdfs.roundUnit = minute
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
     
    # Use a channel which buffers events in memory
    agent1.channels.channel1.type = memory
    agent1.channels.channel1.keep-alive = 120
    agent1.channels.channel1.capacity = 500000
    agent1.channels.channel1.transactionCapacity = 600
     
    # Bind the source and sink to the channel
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1
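
Once the configuration is saved to a file, the agent can be started with the flume-ng launcher. The following is a minimal sketch, assuming the properties above are stored in conf/tail-hdfs.conf (a hypothetical file name), that the Flume conf directory is ./conf, and that flume-ng is on the PATH. start_agent is an illustrative wrapper, not part of Flume.

```shell
# Hypothetical wrapper around the flume-ng launcher; paths are assumptions.
start_agent() {
  conf_file="$1"             # e.g. conf/tail-hdfs.conf (hypothetical name)
  agent_name="${2:-agent1}"  # must match the prefix used in the properties file
  # Fail early with a clear message instead of letting the launcher fail later.
  if [ ! -f "$conf_file" ]; then
    echo "config file not found: $conf_file" >&2
    return 1
  fi
  # --conf: config directory, --conf-file: agent definition, --name: agent name.
  flume-ng agent --conf conf --conf-file "$conf_file" --name "$agent_name" \
    -Dflume.root.logger=INFO,console
}
```

Typical use would be start_agent conf/tail-hdfs.conf agent1, run from the Flume installation directory. The agent name has to be agent1 here because every property in the configuration starts with that prefix.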
     
    Then keep appending the current date to /root/flumedata/logs/text.txt:
    while true
    do
    date >> /root/flumedata/logs/text.txt
    done
     
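    Difference between tail -f and tail -F:
    tail -f follows the open file descriptor: once the file is renamed or rotated, content written under the original name is no longer output
    tail -F follows the file name and keeps retrying: after the file is renamed or recreated, it reopens the file and continues to output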
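
The difference can be observed with a small rotation experiment. The following is a rough sketch, assuming a tail implementation that supports -F (GNU coreutils, for example). rotation_demo and the file names are hypothetical, and the sleep durations are arbitrary values chosen to give tail time to notice the change.

```shell
# Hypothetical demo: start tail with the given flag, rotate the file,
# and print whatever tail emitted after the rotation.
rotation_demo() {
  flag="$1"                                # -f or -F
  dir=$(mktemp -d)
  echo "before" > "$dir/app.log"
  # -n 0: skip existing content so only new lines are captured.
  tail "$flag" -n 0 "$dir/app.log" > "$dir/out" 2>/dev/null &
  tail_pid=$!
  sleep 1
  mv "$dir/app.log" "$dir/app.log.1"       # simulate log rotation
  echo "after" > "$dir/app.log"            # new file under the old name
  sleep 3
  kill "$tail_pid" 2>/dev/null || true
  wait "$tail_pid" 2>/dev/null || true
  cat "$dir/out"
  rm -rf "$dir"
}
```

Running rotation_demo -F should show the line written to the recreated file, while rotation_demo -f stays attached to the renamed file and shows nothing new.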
     
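    So tail -F, combined with a recorded line number, can be used to resume reading where the previous run stopped:
    # The command relies on $(...) and a pipe, so the exec source must run it through a shell
    a1.sources.r2.shell = /bin/bash -c
    # /root/log must already exist and end with the last recorded line number; ARGIND requires gawk
    a1.sources.r2.command = tail -n +$(tail -n1 /root/log) -F /root/data/nginx.log | awk 'ARGIND==1{i=$0;next}{i++;if($0~/^tail/){i=0};print $0;print i >> "/root/log";fflush("")}' /root/log -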
     
    If there are multiple sources, each one needs its own command setting, for example a1.sources.r2.command
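
The line-number bookkeeping behind this approach can be tried on its own, without Flume and without following the file. The following is a minimal sketch under stated assumptions: resume_read is a hypothetical helper, the offset file stores the number of lines already consumed, and log rotation or truncation is not handled. Since tail -n +K starts at line K, the sketch resumes at the recorded count plus one so that the last consumed line is not emitted again.

```shell
# Hypothetical helper: print the lines of a log that have not been consumed
# yet and append the running line count to an offset file.
resume_read() {
  rr_log="$1"
  rr_offset="$2"
  rr_done=0
  # Use the last recorded count if the offset file exists and is non-empty.
  if [ -s "$rr_offset" ]; then
    rr_done=$(tail -n 1 "$rr_offset")
  fi
  # First unread line is rr_done + 1.
  tail -n "+$((rr_done + 1))" "$rr_log" |
    awk -v n="$rr_done" -v out="$rr_offset" '
      { n++; print $0; print n >> out; fflush() }'
}
```

Calling resume_read twice on the same log, with new lines appended in between, prints only the new lines on the second call. Recording the count after every line mirrors the original command and keeps the offset current even if the process is killed mid-stream, at the cost of an ever-growing offset file.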
     

  • Original article: https://www.cnblogs.com/niutao/p/10801326.html