• [Flume] Example: using Flume to deliver web logs to HDFS

    Create a directory on HDFS to store the logs:
    $ hdfs dfs -mkdir -p /test001/weblogsflume
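
    You can confirm the directory exists before continuing (an optional sanity check, not part of the original walkthrough):

    $ hdfs dfs -ls /test001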

    Create the local directory that Flume will watch for incoming logs:
    $ sudo mkdir -p /flume/weblogsmiddle

    Make the directory writable by any user:
    $ sudo chmod a+w -R /flume
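
    To verify the permissions took effect, a quick optional check:

    $ ls -ld /flume/weblogsmiddle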

    Create the configuration file with the following contents:

    $ cat /mytraining/exercises/flume/spooldir.conf

    # Name the components of this agent
    agent1.sources = webserver-log-source
    agent1.sinks = hdfs-sink
    agent1.channels = memory-channel

    # Configure the source
    agent1.sources.webserver-log-source.type = spooldir
    agent1.sources.webserver-log-source.spoolDir = /flume/weblogsmiddle
    agent1.sources.webserver-log-source.channels = memory-channel

    # Configure the sink
    agent1.sinks.hdfs-sink.type = hdfs
    agent1.sinks.hdfs-sink.hdfs.path = /test001/weblogsflume/
    agent1.sinks.hdfs-sink.channel = memory-channel
    agent1.sinks.hdfs-sink.hdfs.rollInterval = 0
    agent1.sinks.hdfs-sink.hdfs.rollSize = 524288
    agent1.sinks.hdfs-sink.hdfs.rollCount = 0
    agent1.sinks.hdfs-sink.hdfs.fileType = DataStream

    # Configure the channel
    agent1.channels.memory-channel.type = memory
    agent1.channels.memory-channel.capacity = 100000
    agent1.channels.memory-channel.transactionCapacity = 1000
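
    With these settings, rollInterval = 0 and rollCount = 0 disable time-based and event-count-based rolling, so the sink starts a new HDFS file only when the current one reaches rollSize = 524288 bytes (about 512 KB); fileType = DataStream writes events as plain text rather than a SequenceFile. As a sketch, if you preferred time-based rolling instead, the equivalent settings would look like:

    # Sketch: roll a new HDFS file every 5 minutes instead of by size
    agent1.sinks.hdfs-sink.hdfs.rollInterval = 300
    agent1.sinks.hdfs-sink.hdfs.rollSize = 0
    agent1.sinks.hdfs-sink.hdfs.rollCount = 0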

     

    Change into the directory that holds the configuration file, since the flume-ng command below passes spooldir.conf as a relative path:

    $ cd /mytraining/exercises/flume

    Start the Flume agent (--conf points at Flume's configuration directory, --conf-file names the agent definition just created, and --name selects which agent in that file to run):

    $ flume-ng agent --conf /etc/flume-ng/conf \
    > --conf-file spooldir.conf \
    > --name agent1 -Dflume.root.logger=INFO,console

    Info: Sourcing environment configuration script /etc/flume-ng/conf/flume-env.sh
    Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
    Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
    Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
    Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
    Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.7.5.jar from classpath
    Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12.jar from classpath
    Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
    Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
    Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
    Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
    Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.7.5.jar from classpath
    Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar from classpath
    Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12.jar from classpath
    Info: Including Hive libraries found via () for Hive access

    ...

    -Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hbase/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application --conf-file spooldir.conf --name agent1
    2017-10-20 21:07:08,929 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
    2017-10-20 21:07:09,057 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:spooldir.conf
    2017-10-20 21:07:09,300 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
    2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
    2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: hdfs-sink Agent: agent1

    ...

    2017-10-20 21:07:09,304 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
    2017-10-20 21:07:09,306 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
    2017-10-20 21:07:09,310 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
    ...

    2017-10-20 21:07:10,398 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{webserver-log-source=EventDrivenSourceRunner: { source:Spool Directory source webserver-log-source: { spoolDir: /flume/weblogsmiddle } }} sinkRunners:{hdfs-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@12c67180 counterGroup:{ name:null counters:{} } }} channels:{memory-channel=org.apache.flume.channel.MemoryChannel{name: memory-channel}} }

    ...

    2017-10-20 21:10:25,268 (pool-6-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:238)] Last read was never committed - resetting mark position.

    Feed logs into /flume/weblogsmiddle:

    $ cp -r /mytest/weblogs /tmp/tmpweblogs
    $ mv /tmp/tmpweblogs/* /flume/weblogsmiddle
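
    The copy-then-move sequence is deliberate: a spooling-directory source expects each file to be complete and immutable once it appears in the spool directory, and a mv within the same filesystem is an atomic rename (provided /tmp and /flume are on the same filesystem), so the agent never reads a half-written file. A minimal smoke test could stage a single hand-written line the same way (the file name and log line below are made up for illustration):

    $ echo '127.0.0.1 - - [20/Oct/2017:21:00:00] "GET / HTTP/1.1" 200 1024' > /tmp/test.log
    $ mv /tmp/test.log /flume/weblogsmiddle/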


    After a few minutes, check what has arrived on HDFS:

    $ hdfs dfs -ls /test001/weblogsflume

    -rw-rw-rw- 1 training supergroup 527909 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917884
    -rw-rw-rw- 1 training supergroup 527776 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917885
    ...
    
    $ 
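
    Because the sink uses fileType = DataStream, these files are plain text and can be inspected directly (a quick check, not in the original post):

    $ hdfs dfs -cat /test001/weblogsflume/FlumeData.1508558917884 | head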

    In the window where flume-ng is running, press Ctrl+C and then Ctrl+Z to stop the Flume agent:

    ^C
    ^Z
    [1]+ Stopped    flume-ng agent --conf /etc/flume-ng/conf --conf-file spooldir.conf --name agent1 -Dflume.root.logger=INFO,console
    [training@localhost flume]$
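
    Note that Ctrl+Z only suspends the process (hence the "[1]+ Stopped" above); to actually terminate the suspended job, kill it by its job number:

    $ kill %1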

  • Original post: https://www.cnblogs.com/gaojian/p/7706497.html