• Flume environment setup: five cases (reposted)


    Flume environment setup: five cases

    http://flume.apache.org/FlumeUserGuide.html

    A simple example

    Here, we give an example configuration file, describing a single-node Flume deployment. This configuration lets a user generate events and subsequently logs them to the console.

    # example.conf: A single-node Flume configuration
    
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    
    # Describe the sink
    a1.sinks.k1.type = logger
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the various components, then describes their types and configuration parameters. A given configuration file might define several named agents; when a given Flume process is launched a flag is passed telling it which named agent to manifest.

    Given this configuration file, we can start Flume as follows:

    $ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
    

    Note that in a full deployment we would typically include one more option: --conf=<conf-dir>. The <conf-dir> directory would include a shell script flume-env.sh and potentially a log4j properties file. In this example, we pass a Java option to force Flume to log to the console and we go without a custom environment script.

    From a separate terminal, we can then telnet port 44444 and send Flume an event:

    $ telnet localhost 44444
    Trying 127.0.0.1...
    Connected to localhost.localdomain (127.0.0.1).
    Escape character is '^]'.
    Hello world! <ENTER>
    OK

    The original Flume terminal will output the event in a log message.

    12/06/19 15:32:19 INFO source.NetcatSource: Source starting
    12/06/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
    12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }
    

    Congratulations - you’ve successfully configured and deployed a Flume agent! Subsequent sections cover agent configuration in much more detail.

    The concrete setup procedures follow.

    Flume setup, Case 1: a single Flume agent

    Install on node2.

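    1.   Upload the tarball to /home/tools, extract it, and move the extracted directory to /home.

    2.   Rename the directory, then edit flume-env.sh and set the Java environment variable (JAVA_HOME):

    vi flume-env.sh

    3.   Configure Flume's environment variables:
    vi /etc/profile
    source /etc/profile
    Check the Flume version (flume-ng version) to verify that the environment variables are set correctly.

    4.    Under /home, create the directory test_flume and a Flume configuration file in it:
    cd /home
    mkdir test_flume
    cd test_flume
    vi flume1

    5.    Test whether Flume is installed correctly by starting an agent:
    flume-ng agent --conf /home/test_flume --conf-file /home/test_flume/flume1 --name a1 -Dflume.root.logger=INFO,console

    Install telnet if necessary, connect to the source configured in flume1, and type anything, for example:
    hi flume
    Switch back to the Flume window to see the event logged.

    To exit telnet, press Ctrl+] and then type quit.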
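    Steps 3 and 4 above can be sketched as a small shell script. The original flume1 listing did not survive, so the config below is an assumption: it mirrors example.conf (netcat source on localhost:44444, memory channel, logger sink, agent name a1). FLUME_HOME=/home/flume is likewise a hypothetical install path, and CONF_DIR defaults to /home/test_flume.

```shell
#!/usr/bin/env bash
# Sketch of Case 1, steps 3-4. Assumptions: Flume was moved to /home/flume
# (hypothetical) and flume1 mirrors example.conf from the user guide.

# Step 3: the two lines you would append to /etc/profile.
export FLUME_HOME=/home/flume
export PATH="$PATH:$FLUME_HOME/bin"

# Step 4: write flume1, a netcat -> memory channel -> logger agent named a1.
CONF_DIR="${CONF_DIR:-/home/test_flume}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/flume1" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# netcat source: every line typed over telnet becomes one event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# logger sink: print events to the console
a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

# Step 5, run by hand once Flume is installed:
#   flume-ng agent --conf /home/test_flume --conf-file /home/test_flume/flume1 --name a1 -Dflume.root.logger=INFO,console
#   telnet localhost 44444
```

    After appending the export lines to /etc/profile and running source /etc/profile, flume-ng version should print the installed version. With this sketch, the telnet target is localhost 44444.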
     

     

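    Flume setup, Case 2: two Flume agents chained into a cluster

    Install on node1 and node2.
    MemoryChannel settings:
    1. capacity: the maximum number of events the channel can hold; the default is 100.
    2. transactionCapacity: the maximum number of events the channel takes from a source or gives to a sink in one transaction; the default is also 100.
    3. keep-alive: the time allowed (in seconds) for adding an event to, or removing one from, the channel.
    4. byteCapacity (and byteCapacityBufferPercentage): a limit on the total bytes of events held in the channel; only the event body is counted.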
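    1.   On node1 and node2, upload the tarball to /home/tools and extract it.
    2.   Edit conf/flume-env.sh and set the Java environment variable (JAVA_HOME).
    3.   Configure Flume's environment variables in /etc/profile.
    4.   On node1 and node2, create the test directory test_flume, then create the configuration files: flume21 on node1 and flume22 on node2.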
    5.   Start Flume on node1 and node2. Because node2 is downstream (node1's agent sends its events to node2), start the agent on node2 first and then the one on node1.
    Start Flume on node2 first (general form, then the command for this case):
    flume-ng agent -n a1 -c conf -f avro.conf -Dflume.root.logger=INFO,console
    flume-ng agent -n a1 -c conf -f /home/test_flume/flume22 -Dflume.root.logger=INFO,console
    Then start Flume on node1:
    flume-ng agent -n a1 -c conf -f simple.conf2 -Dflume.root.logger=INFO,console
    flume-ng agent -n a1 -c conf -f /home/test_flume/flume21 -Dflume.root.logger=INFO,console

    6.   Test with telnet against the source on node1; the events appear in node2's console output.
     

    Flume setup, Case 3: how do you monitor changes to a file?

    Install on node2.
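    1.   On node2, upload the tarball to /home/tools and extract it.
    2.   Edit conf/flume-env.sh and set the Java environment variable (JAVA_HOME).
    3.   Configure Flume's environment variables in /etc/profile.
    4.   On node2, create the test directory test_flume and, in it, the configuration file flume3:
    mkdir test_flume
    cd test_flume
    vi flume3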
     
    5.    Start Flume on node2 (general form, then the command for this case):
    flume-ng agent -n a1 -c conf -f exec.conf -Dflume.root.logger=INFO,console
    flume-ng agent -n a1 -c conf -f /home/test_flume/flume3 -Dflume.root.logger=INFO,console
    6.    Test:
    In /home/test_flume, create an empty file for the demo: touch flume.exec.log
    Append data to it in a loop:
    for i in {1..50}; do echo "$i hi flume" >> flume.exec.log ; sleep 0.1; done
     
     

    Flume setup, Case 4: how do you monitor changes to a directory?

    Install on node2.
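    1.   On node2, upload the tarball to /home/tools and extract it.
    2.   Edit conf/flume-env.sh and set the Java environment variable (JAVA_HOME).
    3.   Configure Flume's environment variables in /etc/profile.
    4.   On node2, create the test directory test_flume and, in it, the configuration file flume4:
    mkdir test_flume
    cd test_flume
    vi flume4
    5.    Start Flume on node2:
    flume-ng agent -n a1 -c conf -f /home/test_flume/flume4 -Dflume.root.logger=INFO,console
    6.    Test: put files into the monitored directory and watch the console output.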
     
     

    Flume setup, Case 5: how do you define an HDFS sink?

    Install on node2.

    Case 5: configuration options explained

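    1.   Date formats in Flume
       When are they used?
           When collecting data, Flume can create directories based on time: for example, data produced today goes into a directory named 20170216, and yesterday's data into 20170215.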
     
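    2.   How does Flume find HDFS?
        If the sink type is hdfs, Flume locates HDFS through the environment variables configured on the system (for example, a Hadoop installation found via HADOOP_HOME or the hadoop command on the PATH).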
     
    3.   When does Flume roll over to a new file?
    Rolling is triggered by time interval, file size, or event count:
    Property           Default  Description
    hdfs.rollInterval  30       Number of seconds to wait before rolling current file (0 = never roll based on time interval)
    hdfs.rollSize      1024     File size to trigger roll, in bytes (0: never roll based on file size)
    hdfs.rollCount     10       Number of events written to file before it rolled (0 = never roll based on number of events)

    4.   How long must a file stay inactive before Flume closes the temporary file into a finished file?

    hdfs.idleTimeout   0        Timeout after which inactive files get closed (0 = disable automatic closing of idle files)

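    5.   How often is a new directory created? (for example, a new directory every 10 s)

          The rounding here only ever rounds down, never up.

          (for example, with hdfs.roundValue = 5 and hdfs.roundUnit = minute, minute 57 falls into 55; minutes 5, 6, 7, 8, 9 share one directory and minutes 10, 11, 12, 13, 14 share the next)

    Property           Default  Description
    hdfs.round         false    Should the timestamp be rounded down (if true, affects all time based escape sequences except %t)
    hdfs.roundValue    1        Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
    hdfs.roundUnit     second   The unit of the round down value - second, minute or hour.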
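    The round-down rule in item 5 is plain integer arithmetic. The shell function below is a hypothetical illustration, not Flume code: it maps a minute (or second) value to the largest multiple of roundValue that is not greater than it, assuming roundValue divides 60 evenly.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Flume: mimics the effect of
# hdfs.round = true with hdfs.roundValue = $2 on a minute/second value $1.
round_down() {
  local value=$1 step=$2
  # Integer division truncates, giving the largest multiple of step <= value.
  echo $(( value / step * step ))
}

# roundValue = 5, roundUnit = minute: minute 57 lands in the 55 directory
echo "minute 57 -> $(round_down 57 5)"
# minutes 5..9 share one directory, 10..14 the next
echo "minute 9  -> $(round_down 9 5)"
echo "minute 10 -> $(round_down 10 5)"
# roundValue = 10, roundUnit = second: a new directory every 10 s
echo "second 37 -> $(round_down 37 10)"
```

    Running it prints 55, 5, 10 and 30 for the four cases, matching the "only round down" description above.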
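    1.   On node2, upload the tarball to /home/tools and extract it.
    2.   Edit conf/flume-env.sh and set the Java environment variable (JAVA_HOME).
    3.   Configure Flume's environment variables in /etc/profile.
    4.   On node2, create the test directory test_flume and, in it, the configuration file flume5:
    mkdir test_flume
    cd test_flume
    vi flume5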
     
  • Original article: https://www.cnblogs.com/wangbin/p/8192950.html