• 2. Setting up Flume


    Setting up Flume

    1. Download and extract (from the official website)

    1. Create a directory named flume under /opt

    2. Upload the downloaded apache-flume-1.9.0-bin.tar.gz to /opt/flume with Xftp6

    3. Extract it with: tar -xvf apache-flume-1.9.0-bin.tar.gz (all three steps are sketched together below)
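
    A combined sketch of these steps as shell commands (the archive.apache.org download URL is an assumption; the author uploads the tarball with Xftp6 instead):

      mkdir -p /opt/flume
      cd /opt/flume
      # upload apache-flume-1.9.0-bin.tar.gz here via Xftp6, or fetch it directly:
      # wget https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
      tar -xvf apache-flume-1.9.0-bin.tar.gz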

       

    2. Configure Flume

    1. Go into Flume's conf directory, copy flume-env.sh.template, and rename the copy flume-env.sh

      cp flume-env.sh.template flume-env.sh
      chmod 777 flume-env.sh
      vim flume-env.sh
    2. Set the Java JDK path in flume-env.sh

      export JAVA_HOME=/opt/java/jdk1.8.0_261

    3. Configure environment variables

    1. Edit .bash_profile (the shell reads it from the user's home directory)

      vim .bash_profile

      # Flume
      export FLUME_HOME=/opt/flume/apache-flume-1.9.0-bin
      export PATH=$PATH:$FLUME_HOME/bin

      source .bash_profile

    2. Add the same lines to flume-env.sh

      # Flume
      export FLUME_HOME=/opt/flume/apache-flume-1.9.0-bin
      export PATH=$PATH:$FLUME_HOME/bin
    3. Verify the version

      flume-ng version

      P.S.: I also added the configuration from the snippet above to /etc/profile; refresh it with: source /etc/profile

    4. Copy the installation to the slave nodes

    scp -r /opt/flume/apache-flume-1.9.0-bin slave1:/opt/flume/
    scp -r /opt/flume/apache-flume-1.9.0-bin slave2:/opt/flume/

    Repeat the configuration steps above on slave1 and slave2 (see the note below on creating the target directory first).
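
    One assumption worth checking before running scp: /opt/flume must already exist on each slave. A quick sketch, assuming passwordless SSH:

    ssh slave1 "mkdir -p /opt/flume"
    ssh slave2 "mkdir -p /opt/flume"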

    5. Flume deployment examples (copied from CSDN)

    5.1 Netcat (the steps below have not yet been run or adapted)

    Flume can listen on a port and capture the data sent to it; the example below uses a netcat source. Note that this copied example keeps its own CDH Flume 1.6.0 paths, so substitute your install directory (e.g. /opt/flume/apache-flume-1.9.0-bin):

    // Create a Flume configuration file

    $ cd app/cdh/flume-1.6.0-cdh5.7.1

    $ mkdir example

    $ cp conf/flume-conf.properties.template example/netcat.conf

     

    // Configure netcat.conf to capture, in real time, data typed in another terminal

    $ vim example/netcat.conf

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Describe the sink
    a1.sinks.k1.type = logger

    # Use a channel that buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

     

    // Run the Flume agent, listening on port 44444 of the local machine

    $ flume-ng agent -c conf -f example/netcat.conf -n a1 -Dflume.root.logger=INFO,console

     


    // In another terminal, connect to localhost:44444 with telnet and type some test data

    $ telnet localhost 44444
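
    If telnet is not available, nc works the same way here (an equivalent alternative, not part of the original steps):

    $ nc localhost 44444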

     


    // Check the agent console: the logger sink prints each received event


    5.2 Spool

    Spool (the spooling directory source) watches a configured directory for new files and reads out their contents. Two caveats: files copied into the spool directory must not be opened or edited afterwards, and the spool directory must not contain subdirectories. A concrete example follows (with optional properties sketched just below):
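
    Two optional spooldir properties control what happens to ingested files (property names from the Flume user guide, shown with the agent/source names used in spool1.conf below):

    # processed files are renamed with this suffix (the default)
    local1.sources.r1.fileSuffix = .COMPLETED
    # "never" keeps processed files; "immediate" deletes them after ingest
    local1.sources.r1.deletePolicy = never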

    // Create two Flume configuration files

    $ cd app/cdh/flume-1.6.0-cdh5.7.1

    $ cp conf/flume-conf.properties.template example/spool1.conf

    $ cp conf/flume-conf.properties.template example/spool2.conf

     

    // Configure spool1.conf to watch the avro_data directory and send file contents to local port 60000

    $ vim example/spool1.conf

    # Name the components
    local1.sources = r1
    local1.sinks = k1
    local1.channels = c1

    # Source
    local1.sources.r1.type = spooldir
    local1.sources.r1.spoolDir = /home/hadoop/avro_data

    # Sink
    local1.sinks.k1.type = avro
    local1.sinks.k1.hostname = localhost
    local1.sinks.k1.port = 60000

    # Channel
    local1.channels.c1.type = memory

    # Bind the source and sink to the channel
    local1.sources.r1.channels = c1
    local1.sinks.k1.channel = c1

     

    // Configure spool2.conf to receive data on local port 60000 and write it to HDFS

    $ vim example/spool2.conf

    # Name the components
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Source
    a1.sources.r1.type = avro
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 60000

    # Sink
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://localhost:9000/user/wcbdd/flumeData
    a1.sinks.k1.hdfs.rollInterval = 0
    a1.sinks.k1.hdfs.writeFormat = Text
    a1.sinks.k1.hdfs.fileType = DataStream

    # Channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

     

    // Open two terminals and start the two agents with the commands below; start spool2 (the Avro source writing to HDFS) first, so that spool1's Avro sink can connect to port 60000

    $ flume-ng agent -c conf -f example/spool2.conf -n a1

    $ flume-ng agent -c conf -f example/spool1.conf -n local1

     

    // Check the contents of the monitored avro_data directory on the local filesystem

    $ cd avro_data

    $ cat avro_data.txt
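
    If the directory is empty, test data can be staged like this (the file name and content are illustrative; write the file elsewhere first and then mv it in, so the spool source never sees a half-written file):

    $ echo "hello flume" > /tmp/avro_data.txt
    $ mv /tmp/avro_data.txt /home/hadoop/avro_data/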


     

    // Check the HDFS-writing agent to confirm the data was captured and written to HDFS


     

    // View the files in HDFS through the web UI

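
    A command-line check works too, assuming the HDFS path configured in spool2.conf:

    $ hdfs dfs -ls /user/wcbdd/flumeData
    $ hdfs dfs -cat /user/wcbdd/flumeData/*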

    5.3 Other sources

    Flume ships with a large number of built-in sources; among them, Avro Source, Thrift Source, Spooling Directory Source, and Kafka Source offer good performance and cover the broadest range of use cases. See the Apache Flume User Guide for the full source reference; one illustration follows.
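
    A minimal Kafka Source sketch (the broker address and topic name are assumptions; property names follow the Flume 1.9 user guide):

    a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
    a1.sources.r1.kafka.bootstrap.servers = localhost:9092
    a1.sources.r1.kafka.topics = flume-test
    a1.sources.r1.channels = c1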


     

    Study notes by 小石小石摩西摩西; questions and corrections are welcome!
  • Original article: https://www.cnblogs.com/shijingwen/p/13677480.html