• Flume: charset error when using the spooling directory source


1. The error

    2016-04-21 02:23:05,508 (pool-3-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:256)] FATAL: Spool Directory source source1: { spoolDir: /home/hadoop_admin/movielog/ }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
    java.nio.charset.MalformedInputException: Input length = 1
            at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
            at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:195)
            at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:134)
            at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:72)
            at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:91)
            at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:238)
            at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:227)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

2. Solution

  The root cause is that the source's inputCharset property defaults to UTF-8, while the log files being read are encoded in GBK. Setting inputCharset to match the files' actual encoding resolves the error:
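The exception is easy to reproduce outside Flume: decoding GBK-encoded bytes with a strict UTF-8 decoder fails in exactly this way (Flume's ResettableFileInputStream decodes with the configured charset, and its default error policy reports malformed input). A minimal sketch — the sample string is arbitrary, chosen only because its GBK bytes are not valid UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        // "电影" encoded as GBK: bytes B5 E7 D3 B0
        byte[] gbkBytes = "电影".getBytes(Charset.forName("GBK"));

        // A strict UTF-8 decoder: report malformed input instead of replacing it
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        try {
            utf8.decode(ByteBuffer.wrap(gbkBytes));
        } catch (MalformedInputException e) {
            // 0xB5 is a UTF-8 continuation byte in a leading position,
            // so the decoder fails on a 1-byte malformed sequence
            System.out.println("MalformedInputException: Input length = "
                    + e.getInputLength()); // prints: Input length = 1
        }

        // Decoding the same bytes with the correct charset succeeds
        String ok = Charset.forName("GBK").decode(ByteBuffer.wrap(gbkBytes)).toString();
        System.out.println(ok); // prints: 电影
    }
}
```

The "Input length = 1" in the message matches the Flume log above: the decoder hit a single malformed byte.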

    agent1.sources = source1
    agent1.channels = channel1
    agent1.sinks = sink1
    
    # For each one of the sources, the type is defined
    agent1.sources.source1.type = spooldir
    agent1.sources.source1.spoolDir = /home/hadoop_admin/movielog/
    agent1.sources.source1.inputCharset = GBK
    agent1.sources.source1.fileHeader = true
    agent1.sources.source1.deletePolicy = immediate
    agent1.sources.source1.batchSize = 1000
    agent1.sources.source1.channels = channel1
    
    # Each sink's type must be defined
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = hdfs://master:9000/flumeTest
    agent1.sinks.sink1.hdfs.filePrefix = master-
    agent1.sinks.sink1.hdfs.writeFormat = Text
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.hdfs.rollInterval = 0
    agent1.sinks.sink1.hdfs.rollSize = 10240
    agent1.sinks.sink1.hdfs.batchSize = 100
    agent1.sinks.sink1.hdfs.callTimeout = 30000
    agent1.sinks.sink1.channel = channel1
    
    # Each channel's type is defined.
    agent1.channels.channel1.type = memory
    agent1.channels.channel1.capacity = 100000
    agent1.channels.channel1.transactionCapacity = 100000
    agent1.channels.channel1.keep-alive = 30
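
  If changing inputCharset is not an option (for example, one agent spools files in several encodings), another approach is to transcode files to UTF-8 before they land in the spool directory. A hypothetical sketch of such a pre-processing step — the class and path handling are illustrative, not part of the original post, and it reads the whole file into memory, so it suits log files of moderate size:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GbkToUtf8 {
    // Re-encode a GBK text file as UTF-8 at the destination path
    static void convert(Path src, Path dst) throws IOException {
        String content = new String(Files.readAllBytes(src), Charset.forName("GBK"));
        Files.write(dst, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GbkToUtf8 <gbk-input-file> <utf8-output-file>
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

With the files normalized to UTF-8, the default inputCharset works and no Flume reconfiguration is needed.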


  • Original source: https://www.cnblogs.com/linux-wangkun/p/5434788.html