• Flume sink to HDFS: files are created constantly, their contents are garbled, and the file-roll configuration has no effect?


Problem description

When Flume sinks to HDFS, the output directory fills up with a rapid stream of small files, the file contents look garbled when viewed as text, and the file-roll settings (rollInterval / rollSize / rollCount) appear to have no effect.

Solution

First delete the data under this HDFS directory, then modify the configuration file flume-conf.properties and collect again. Two settings in the file below do the real work: hdfs.fileType = DataStream makes the sink write events as plain text instead of the default binary SequenceFile (the binary format is what shows up as garbled content), and hdfs.minBlockReplicas = 1 stops the sink from rolling a file early whenever HDFS reports its block as under-replicated, which is why rollInterval/rollSize/rollCount previously seemed to have no effect.

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  You may obtain a copy of the License at
    #
    #  http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing,
    # software distributed under the License is distributed on an
    # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    # KIND, either express or implied.  See the License for the
    # specific language governing permissions and limitations
    # under the License.
    
    
    # The configuration file needs to define the sources, 
    # the channels and the sinks.
    # Sources, channels and sinks are defined per agent, 
    # in this case called 'agent'
    
    agent1.sources = spool-source1
    agent1.sinks = hdfs-sink1
    agent1.channels = ch1
    
    #Define and configure a Spool directory source
    agent1.sources.spool-source1.channels=ch1
    agent1.sources.spool-source1.type=spooldir
    agent1.sources.spool-source1.spoolDir=/home/hadoop/data/flume/sqooldir/73/2012-09-22/
    agent1.sources.spool-source1.ignorePattern=event(_\d{4}-\d{2}-\d{2}_\d{2}_\d{2})?\.log(\.COMPLETED)?
    agent1.sources.spool-source1.deserializer.maxLineLength=10240
    
    #Configure channel
    agent1.channels.ch1.type = file
    agent1.channels.ch1.checkpointDir = /home/hadoop/data/flume/checkpointDir
    agent1.channels.ch1.dataDirs = /home/hadoop/data/flume/dataDirs
    
    #Define and configure a hdfs sink
    agent1.sinks.hdfs-sink1.channel = ch1
    agent1.sinks.hdfs-sink1.type = hdfs
    agent1.sinks.hdfs-sink1.hdfs.path = hdfs://master:9000/flume/%Y%m%d
    agent1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
    agent1.sinks.hdfs-sink1.hdfs.rollInterval = 60
    agent1.sinks.hdfs-sink1.hdfs.rollSize = 0
    agent1.sinks.hdfs-sink1.hdfs.rollCount = 0
    agent1.sinks.hdfs-sink1.hdfs.minBlockReplicas=1
    agent1.sinks.hdfs-sink1.hdfs.idleTimeout=0
    #agent1.sinks.hdfs-sink1.hdfs.codeC = snappy
    agent1.sinks.hdfs-sink1.hdfs.fileType=DataStream
    #agent1.sinks.hdfs-sink1.hdfs.writeFormat=Text
    
    # For each one of the sources, the type is defined
    #agent.sources.seqGenSrc.type = seq
    
    # The channel can be defined as follows.
    #agent.sources.seqGenSrc.channels = memoryChannel
    
    # Each sink's type must be defined
    #agent.sinks.loggerSink.type = logger
    
    #Specify the channel the sink should use
    #agent.sinks.loggerSink.channel = memoryChannel
    
    # Each channel's type is defined.
    #agent.channels.memoryChannel.type = memory
    
    # Other config values specific to each type of channel(sink or source)
    # can be defined as well
    # In this case, it specifies the capacity of the memory channel
    #agent.channels.memoryChannel.capacity = 100
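
With this configuration the sink should roll exactly one new file per minute (rollInterval = 60, with size- and count-based rolling disabled). A quick sanity check, assuming the date directory below matches the %Y%m%d pattern in hdfs.path and FlumeData is the default file prefix (your exact file names will differ):

    [hadoop@master flume]$ $HADOOP_HOME/bin/hadoop fs -ls /flume/20170502
    [hadoop@master flume]$ $HADOOP_HOME/bin/hadoop fs -cat /flume/20170502/FlumeData.1493733000000 | head

With fileType = DataStream, the output of -cat should now be readable text instead of SequenceFile binary.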

A tip: for configuration files such as Flume's, it is best to go to the official documentation and learn to extend things yourself, rather than relying only on other people's blog posts. Blogs are fine as a reference, but the authoritative source is the official docs!

    [hadoop@master sqooldir]$ $HADOOP_HOME/bin/hadoop fs -rm -r /flume/20170502
    17/05/02 21:53:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/05/02 21:53:17 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /flume/20170502
    [hadoop@master sqooldir]$
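
Before restarting, you can confirm the old output directory is really gone:

    [hadoop@master sqooldir]$ $HADOOP_HOME/bin/hadoop fs -ls /flume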

Restart Flume

    [hadoop@master flume]$ pwd
    /home/hadoop/app/flume
    [hadoop@master flume]$ bin/flume-ng agent -n agent1 -f conf/flume-conf.properties
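
If you want to watch the agent's output while it runs, flume-ng's standard logging option helps; running it under nohup (just one way to keep it alive in the background) might look like:

    [hadoop@master flume]$ nohup bin/flume-ng agent -n agent1 -f conf/flume-conf.properties -Dflume.root.logger=INFO,console > flume.log 2>&1 &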

If you also have a problem with the replication factor, fix that yourself as well: edit the dfs.replication property in hdfs-site.xml under $HADOOP_HOME/etc/hadoop/ (it must be changed on master, slave1, and slave2):

    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>Default block replication. Set to 1 for pseudo-distributed mode and to 3 for a fully distributed cluster.</description>
    </property>
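
Note that dfs.replication only affects files written after the change. If existing files should pick up the new factor too, they can be updated explicitly (the /flume path is just this example's output directory; -w waits for re-replication to finish):

    [hadoop@master flume]$ $HADOOP_HOME/bin/hadoop fs -setrep -w 3 /flume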

Remember to restart the Hadoop cluster.
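
On a typical Hadoop 2.x installation that restart would be something like the following, run on master (add stop-yarn.sh/start-yarn.sh if YARN is also in use):

    [hadoop@master hadoop]$ $HADOOP_HOME/sbin/stop-dfs.sh
    [hadoop@master hadoop]$ $HADOOP_HOME/sbin/start-dfs.sh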

• Original article: https://www.cnblogs.com/zlslch/p/6798858.html