• Configuring Flume on CDH 5.4.7


    Flume configuration file:

    # Define a memory channel called ch1 on agent1
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 100000
    agent1.channels.ch1.transactionCapacity = 100000
    agent1.channels.ch1.keep-alive = 30
     
    # Define an Avro source called avro-source1 on agent1 and tell it
    # to bind to 0.0.0.0:41414. Connect it to channel ch1.
    #agent1.sources.avro-source1.channels = ch1
    #agent1.sources.avro-source1.type = avro
    #agent1.sources.avro-source1.bind = 0.0.0.0
    #agent1.sources.avro-source1.port = 41414
    #agent1.sources.avro-source1.threads = 5
     
    # Define an exec source that tails a file. Note that the exec source
    # runs its command on the agent's own host, so a remote "host:path"
    # argument such as node3:/home/d2 is not valid -- tail only reads
    # local paths. To collect a file from node3, run the agent on node3.
    agent1.sources.avro-source1.type = exec
    agent1.sources.avro-source1.shell = /bin/bash -c
    agent1.sources.avro-source1.command = tail -n +0 -F /home/d2
    agent1.sources.avro-source1.channels = ch1
    agent1.sources.avro-source1.threads = 5
     
    # Define an HDFS sink that writes all events it receives
    # and connect it to the other end of the same channel.
    agent1.sinks.log-sink1.channel = ch1
    agent1.sinks.log-sink1.type = hdfs
    agent1.sinks.log-sink1.hdfs.path = hdfs://node2:8020/flume
    agent1.sinks.log-sink1.hdfs.writeFormat = Text
    agent1.sinks.log-sink1.hdfs.fileType = DataStream
    # rollInterval = 0 and rollCount = 0 disable time- and count-based
    # rolling, so files roll only when they reach rollSize bytes.
    agent1.sinks.log-sink1.hdfs.rollInterval = 0
    agent1.sinks.log-sink1.hdfs.rollSize = 1000000
    agent1.sinks.log-sink1.hdfs.rollCount = 0
    agent1.sinks.log-sink1.hdfs.batchSize = 1000
    agent1.sinks.log-sink1.hdfs.txnEventMax = 1000
    agent1.sinks.log-sink1.hdfs.callTimeout = 60000
    agent1.sinks.log-sink1.hdfs.appendTimeout = 60000
     
    # Finally, now that we've defined all of our components, tell
    # agent1 which ones we want to activate.
    agent1.channels = ch1
    agent1.sources = avro-source1
    agent1.sinks = log-sink1
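With all components defined, the agent can also be started manually with the flume-ng launcher (useful for testing outside Cloudera Manager). A sketch, assuming the file above is saved as /etc/flume-ng/conf/flume.conf and flume-ng is on the PATH; the script only prints the command so the paths can be adapted first:

```shell
# Sketch only: the config directory and file name are assumptions; adapt
# them to your installation before running the printed command.
CONF_DIR=/etc/flume-ng/conf
CONF_FILE=$CONF_DIR/flume.conf
# -n must match the property-key prefix used in the file ("agent1");
# a mismatch makes Flume start with an empty configuration.
AGENT_NAME=agent1

echo flume-ng agent -c "$CONF_DIR" -f "$CONF_FILE" -n "$AGENT_NAME" \
    -Dflume.root.logger=INFO,console
```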

    Error messages in the log:

    2016-01-08 16:27:12,370 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
    2016-01-08 16:27:12,370 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
    2016-01-08 16:27:12,370 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
    2016-01-08 16:27:12,370 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
    2016-01-08 16:27:12,371 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
    2016-01-08 16:27:12,402 INFO org.apache.flume.conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent1]
    2016-01-08 16:27:12,402 WARN org.apache.flume.node.AbstractConfigurationProvider: No configuration found for this host:tier1
    2016-01-08 16:27:12,412 INFO org.apache.flume.node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }
    2016-01-08 16:27:12,460 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    2016-01-08 16:27:12,514 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
    2016-01-08 16:27:12,538 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:41414
    2016-01-08 16:27:12,568 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: 
    Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[AGENT],
    CATEGORY=[LOG_MESSAGE], ROLE=[flume-AGENT-7602b25bae87ced3f6367e647e645100], SEVERITY=[IMPORTANT],
    SERVICE=[flume], HOST_IDS=[cd0fc0b9-f0c1-4419-a33b-9020aab77a11],
    SERVICE_TYPE=[FLUME], LOG_LEVEL=[WARN], HOSTS=[node4], EVENTCODE=[EV_LOG_EVENT]},
    content=No configuration found for this host:tier1, timestamp=1452241632402}
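The decisive line above is the WARN `No configuration found for this host:tier1`: the file parsed fine for `agent1`, but a Flume role managed by Cloudera Manager looks for the Agent Name set in CM, which defaults to `tier1`. Either change the Agent Name to `agent1` in the role's configuration, or rename every key prefix, e.g. (sketch, activation keys only; all `agent1.*` keys follow the same pattern):

```
# Sketch: same configuration with the prefix renamed to match
# Cloudera Manager's default agent name "tier1"
tier1.channels = ch1
tier1.sources = avro-source1
tier1.sinks = log-sink1
```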

     Flume configuration steps:

    Step 1: the agent

    source configuration:

    channel configuration:

    sink configuration:

    Configuration method 2

    agent1.sources=source1
    agent1.sinks=sink1
    agent1.channels=channel1
    
    # A spooling-directory source watches the given directory for new files;
    # when one appears it parses the file's contents and writes the events to
    # the channel. Once a file is fully consumed it is either marked as
    # completed or deleted.
    # Configure source1
    agent1.sources.source1.type=spooldir
    agent1.sources.source1.spoolDir=/home/flumeData
    agent1.sources.source1.channels=channel1
    agent1.sources.source1.fileHeader = false
    agent1.sources.source1.interceptors = i1
    agent1.sources.source1.interceptors.i1.type = timestamp
    
    # Configure channel1
    # Note: keep checkpointDir and dataDirs outside the spooling directory,
    # otherwise the source will try to consume the channel's own files.
    agent1.channels.channel1.type=file
    agent1.channels.channel1.checkpointDir=/home/flumeData/hdfss
    agent1.channels.channel1.dataDirs=/home/flumeData/flumetemp
    
    # Configure sink1
    agent1.sinks.sink1.type=hdfs
    agent1.sinks.sink1.hdfs.path=hdfs://node8:8020/flume
    agent1.sinks.sink1.hdfs.fileType=DataStream
    agent1.sinks.sink1.hdfs.writeFormat=Text
    # rollInterval=1 rolls a new HDFS file every second; raise this in
    # production to avoid creating many small files.
    agent1.sinks.sink1.hdfs.rollInterval=1
    agent1.sinks.sink1.channel=channel1
    agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
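The %Y-%m-%d escape in hdfs.filePrefix needs a timestamp header on each event, which the timestamp interceptor configured above supplies. As an alternative sketch (the property exists in Flume 1.3+), the sink can use its own clock instead:

```
# Sketch: use the sink host's local time for %Y-%m-%d
# instead of requiring a timestamp header on every event
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true
```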
    

     Error message:

       ... 12 more
    2016-01-19 15:43:35,975 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:Spool Directory source source1: { spoolDir: /home/flumeData/hdfss } } - Exception follows.
    org.apache.flume.FlumeException: Unable to read and modify files in the spooling directory: /home/flumeData/hdfss
            at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:162)
            at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:77)
            at org.apache.flume.client.avro.ReliableSpoolingFileEventReader$Builder.build(ReliableSpoolingFileEventReader.java:671)
            at org.apache.flume.source.SpoolDirectorySource.start(SpoolDirectorySource.java:85)
            at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
            at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: Permission denied
            at java.io.UnixFileSystem.createFileExclusively(Native Method)
            at java.io.File.createTempFile(File.java:2001)
            at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:152)
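The root cause is at the bottom: `java.io.IOException: Permission denied` while creating a temp file, i.e. the OS user running the Flume agent cannot write in the spooling directory. A minimal sketch of the check (hypothetical path; on the real host the usual fix is `chown -R flume:flume /home/flumeData` as root):

```shell
# Hypothetical stand-in for the real spoolDir (/home/flumeData).
SPOOL_DIR=$(mktemp -d)

# The spooldir source must be able to create and rename files here:
# it writes tracker metadata and renames files it has finished consuming.
if touch "$SPOOL_DIR/.flume-perm-check" 2>/dev/null; then
    rm "$SPOOL_DIR/.flume-perm-check"
    echo "spool dir is writable"
else
    echo "spool dir is NOT writable by this user"
fi

rm -rf "$SPOOL_DIR"
```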
    

     HDFS also reports a permissions problem:

    org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/flume":hdfs.flume:supergroup:drwxr-xr-x

     Solution:

     Change the owner or group of the HDFS target directory /flume so that
     the flume user gains write access.

    For reference:

    hdfs dfs -chgrp -R [GROUP] <path>

    hdfs dfs -chown -R hdfs:flume /flume

    hdfs dfs -chgrp -R flume /flume

• Original post: https://www.cnblogs.com/zhanggl/p/5113681.html