• OD Big Data in Practice: Flume Getting-Started Examples


    I. netcat source + memory channel + logger sink

    1. Edit the configuration

    1) Edit flume-env.sh under $FLUME_HOME/conf and set the following:

    export JAVA_HOME=/opt/modules/jdk1.7.0_67

    2) Under $FLUME_HOME/conf, create an agent subdirectory and add a new file, netcat-memory-logger.conf, with the following contents:

    # netcat-memory-logger
    
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = beifeng-hadoop-02
    a1.sources.r1.port = 44444
    
    # Describe the sink
    a1.sinks.k1.type = logger
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

    2. Start Flume and test

    1) Start

    bin/flume-ng agent -n a1 -c conf/ -f conf/agent/netcat-memory-logger.conf -Dflume.root.logger=INFO,console

    2) Test

    nc beifeng-hadoop-02 44444

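    Type any string and watch the agent's log output on the server; with -Dflume.root.logger=INFO,console the logger sink prints each event to the console.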

    This test uses the Linux nc command; install it first if it is not available.

    Install netcat: sudo yum -y install nc
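
    The check-then-install step and a one-shot test can be sketched as a small shell snippet. The function names have_cmd and send_line are hypothetical helpers introduced here for illustration; the host and port are the ones from netcat-memory-logger.conf. Depending on the nc variant, the connection may stay open after the line is sent; press Ctrl+C to close it in that case.

    ```shell
    # Return success if the given command is found on PATH.
    have_cmd() {
        command -v "$1" >/dev/null 2>&1
    }

    # Send a single line of text to the netcat source.
    # Host and port are taken from netcat-memory-logger.conf.
    send_line() {
        printf '%s\n' "$1" | nc beifeng-hadoop-02 44444
    }

    # Usage (requires a running agent):
    #   have_cmd nc || sudo yum -y install nc
    #   send_line "hello flume"
    ```

    If the agent is running, the line sent this way should show up as an event in its console log.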

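    II. agent: avro source + file channel + hdfs sink

    1. Add the configuration

    In the agent subdirectory under $FLUME_HOME/conf (create it if it does not exist), add a new file, avro-file-hdfs.conf, with the following contents: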

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = avro
    a1.sources.r1.bind = beifeng-hadoop-02
    a1.sources.r1.port = 4141
    
    # Describe the sink
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://beifeng-hadoop-02:9000/flume/events/%Y-%m-%d
    # default:FlumeData
    a1.sinks.k1.hdfs.filePrefix = FlumeData
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.hdfs.rollInterval = 0
    a1.sinks.k1.hdfs.rollCount = 0
    # Usually set close to the HDFS block size (128 MB), e.g. 120 or 125 MB; the value is in bytes
    a1.sinks.k1.hdfs.rollSize = 10240
    a1.sinks.k1.hdfs.fileType = DataStream
    #a1.sinks.k1.hdfs.round = true
    #a1.sinks.k1.hdfs.roundValue = 10
    #a1.sinks.k1.hdfs.roundUnit = minute
    
    # Use a channel which buffers events on disk (file channel)
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/checkpoint
    a1.channels.c1.dataDirs = /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/data
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
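
    hdfs.rollSize is specified in bytes, so the 10240 used above rolls files at roughly 10 KB. The arithmetic for a value close to a 128 MB block can be sketched as follows; mb_to_bytes is a hypothetical helper name, not part of Flume.

    ```shell
    # Convert a size in MB to bytes, the unit hdfs.rollSize expects.
    mb_to_bytes() {
        echo $(( $1 * 1024 * 1024 ))
    }

    mb_to_bytes 125   # prints 131072000
    ```

    With 125 MB, the config line would read a1.sinks.k1.hdfs.rollSize = 131072000.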

    2. Start and test

    1) Start the Flume agent

    bin/flume-ng agent -n a1 -c conf/ -f conf/agent/avro-file-hdfs.conf -Dflume.root.logger=INFO,console

    2) Test with the avro-client that ships with Flume

    bin/flume-ng avro-client --host beifeng-hadoop-02 --port 4141 --filename /home/beifeng/order_info.txt
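
    One way to confirm that the events reached HDFS is to list the dated directory the sink writes to. This is a minimal sketch, assuming the hdfs command of the Hadoop installation is run from $HADOOP_HOME on a machine in the same time zone as the agent. events_dir is a hypothetical helper name; the path mirrors hdfs.path from avro-file-hdfs.conf with %Y-%m-%d expanded.

    ```shell
    # Build today's output directory, matching hdfs.path with %Y-%m-%d expanded.
    events_dir() {
        echo "/flume/events/$(date +%Y-%m-%d)"
    }

    # Usage (requires a running HDFS):
    #   bin/hdfs dfs -ls "$(events_dir)"
    #   bin/hdfs dfs -cat "$(events_dir)"/FlumeData.*
    ```

    With rollSize = 10240 and the other roll triggers set to 0, a file smaller than about 10 KB will likely keep its in-progress .tmp suffix until the agent is stopped.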
  • Original article: https://www.cnblogs.com/yeahwell/p/5746057.html