• Flume: a few common problems


    By default, Flume writes files of about 1 KB to HDFS. How do you control the file size and the number of files?

    Copy flume-conf.properties.template to a new file named hive-mem-size.properties:
    hive-mem-size.properties
      a1.sources = s1
      a1.channels = c1
      a1.sinks = k1
      # defined the source
      a1.sources.s1.type = exec
      a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
      a1.sources.s1.shell = /bin/sh -c
      # defined the channel
      a1.channels.c1.type = memory
      a1.channels.c1.capacity = 1000
      a1.channels.c1.transactionCapacity = 1000
      # defined the sink
      a1.sinks.k1.type = hdfs
      a1.sinks.k1.hdfs.path = /flume/hdfs/
      a1.sinks.k1.hdfs.fileType = DataStream
      # Roll by time; 0 disables time-based rolling
      a1.sinks.k1.hdfs.rollInterval = 0
      # Roll by size in bytes; 10240 gives files of roughly 10 KB
      a1.sinks.k1.hdfs.rollSize = 10240
      # Roll by event count; 0 disables count-based rolling
      a1.sinks.k1.hdfs.rollCount = 0
      # The channel can be defined as follows.
      a1.sources.s1.channels = c1
      a1.sinks.k1.channel = c1
    From the Flume directory, run:
      bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-size.properties -Dflume.root.logger=INFO,console
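    To see why these three settings give files of "about" 10 KB, here is a small Python model of the size/count roll check. It is a sketch under an assumption, not Flume's source: it assumes the HDFS sink checks the thresholds before appending each event and counts event body bytes, and it ignores rollInterval (set to 0 above). simulate_roll is a hypothetical helper.

```python
def simulate_roll(event_sizes, roll_size=10240, roll_count=0):
    """Return the size in bytes of each HDFS file the sink would produce.

    Simplified model (assumption, not Flume source code): before an event is
    appended, the open file is closed and a new one opened if the file already
    holds >= roll_size body bytes or >= roll_count events. A value of 0
    disables that trigger. Time-based rolling (rollInterval) is not modelled.
    """
    files = []
    size = count = 0
    for ev in event_sizes:
        hit_size = roll_size > 0 and size >= roll_size
        hit_count = roll_count > 0 and count >= roll_count
        if count > 0 and (hit_size or hit_count):
            files.append(size)  # close the current file
            size = count = 0    # and start a new one
        size += ev
        count += 1
    if count:
        files.append(size)  # last file, closed when the agent stops
    return files


# Settings from hive-mem-size.properties, with made-up 100-byte events.
print(simulate_roll([100] * 250))
```

    With 100-byte events, simulate_roll([100] * 250) returns [10300, 10300, 4400]: each full file overshoots 10240 by less than one event. With rollSize = 1024 and rollCount = 10, simulate_roll([100] * 25, roll_size=1024, roll_count=10) returns [1000, 1000, 500], the roughly 1 KB files described in the question.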
    

     Flume is used to ship the latest data or files to HDFS. How do you handle partitioned tables?

    Copy flume-conf.properties.template to a new file named hive-mem-part.properties:
    hive-mem-part.properties
      a1.sources = s1
      a1.channels = c1
      a1.sinks = k1
      # defined the source
      a1.sources.s1.type = exec
      a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
      a1.sources.s1.shell = /bin/sh -c
      # defined the channel
      a1.channels.c1.type = memory
      a1.channels.c1.capacity = 1000
      a1.channels.c1.transactionCapacity = 1000
      # defined the sink
      a1.sinks.k1.type = hdfs
      # With time escapes in the path, set useLocalTimeStamp to true
      a1.sinks.k1.hdfs.useLocalTimeStamp = true
      a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M/
      a1.sinks.k1.hdfs.fileType = DataStream
      # The channel can be defined as follows.
      a1.sources.s1.channels = c1
      a1.sinks.k1.channel = c1
    From the Flume directory, run:
      bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-part.properties -Dflume.root.logger=INFO,console
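      This conflicts with the size-based rolling shown earlier: the path changes every minute, so each file only receives the events from its own minute, and a target file size cannot be guaranteed within that window.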
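    The Python sketch below shows how the time escapes in hdfs.path split the stream into one directory per minute, which is why a size target is hard to reach. It assumes %y, %m, %d, %H and %M mean the same as in Python's strftime and that every event is stamped with the agent's local time (useLocalTimeStamp = true); bucket_bytes and the sample timestamps are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# hdfs.path from hive-mem-part.properties
PATH = "/flume/events/%y-%m-%d/%H-%M/"


def bucket_bytes(events, template=PATH):
    """Sum event bytes per HDFS directory.

    Assumptions: Flume's %y %m %d %H %M escapes behave like strftime, and each
    event is stamped with the agent's local time (useLocalTimeStamp = true).
    """
    buckets = defaultdict(int)
    for ts, size in events:
        buckets[ts.strftime(template)] += size
    return dict(buckets)


# Made-up sample: three 100-byte log lines spread over two minutes.
sample = [
    (datetime(2016, 11, 24, 10, 0, 10), 100),
    (datetime(2016, 11, 24, 10, 0, 50), 100),
    (datetime(2016, 11, 24, 10, 1, 5), 100),
]
for path, total in bucket_bytes(sample).items():
    print(path, total)
```

    The three events land in two directories holding 200 and 100 bytes; at that rate neither directory ever gets close to a 10240-byte rollSize.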
    

    Files uploaded by Flume start with FlumeData by default. How do you change this prefix?

    Copy flume-conf.properties.template to a new file named hive-mem-pre.properties:
    hive-mem-pre.properties
      a1.sources = s1
      a1.channels = c1
      a1.sinks = k1
      # defined the source
      a1.sources.s1.type = exec
      a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
      a1.sources.s1.shell = /bin/sh -c
      # defined the channel
      a1.channels.c1.type = memory
      a1.channels.c1.capacity = 1000
      a1.channels.c1.transactionCapacity = 1000
      # defined the sink
      a1.sinks.k1.type = hdfs
      # With time escapes in the path, set useLocalTimeStamp to true
      a1.sinks.k1.hdfs.useLocalTimeStamp = true
      a1.sinks.k1.hdfs.filePrefix = hive-log
      a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M/
      a1.sinks.k1.hdfs.fileType = DataStream
      # The channel can be defined as follows.
      a1.sources.s1.channels = c1
      a1.sinks.k1.channel = c1
    From the Flume directory, run:
      bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-pre.properties -Dflume.root.logger=INFO,console
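    A rough Python sketch of the resulting file names. It assumes the sink names each file <filePrefix>.<counter><fileSuffix>, with a counter derived from the current time in milliseconds, and appends the default inUseSuffix .tmp while the file is still being written; hdfs_file_name is a hypothetical helper, not a Flume API, and the counter value used below is arbitrary.

```python
def hdfs_file_name(prefix="FlumeData", counter=0, suffix="",
                   in_use=False, in_use_suffix=".tmp"):
    """Approximate an HDFS sink file name (assumption, not Flume's code).

    Pattern assumed: <filePrefix>.<counter><fileSuffix>, plus the in-use
    suffix (".tmp" by default) while the file is still open for writing.
    """
    name = f"{prefix}.{counter}{suffix}"
    return name + in_use_suffix if in_use else name


# 1479953123456 is an arbitrary example counter value.
print(hdfs_file_name("hive-log", 1479953123456))
print(hdfs_file_name(counter=1479953123456, in_use=True))
```

    With filePrefix = hive-log the first call prints hive-log.1479953123456; the second shows what an open file with the default prefix would look like, FlumeData.1479953123456.tmp.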
    

     In an enterprise running many Flume agents, how do you avoid disk I/O problems?

    Start a Hadoop cluster (the official diagram uses four machines; three are used here) and deploy and configure Flume on each machine:
      hadoop09-linux-01.ibeifeng.com 10.0.0.108 collector
      hadoop09-linux-02.ibeifeng.com 10.0.0.109 agent
      hadoop09-linux-03.ibeifeng.com 10.0.0.110 agent
    On one agent machine, go to the Flume directory.
    Copy flume-conf.properties.template to a new file named avro-agent-hive-file-hdfs.properties:
    avro-agent-hive-file-hdfs.properties
      a1.sources = s1
      a1.channels = c1
      a1.sinks = k1
      # defined the source
      a1.sources.s1.type = exec
      a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
      a1.sources.s1.shell = /bin/sh -c
      # defined the channel
      a1.channels.c1.type = memory
      a1.channels.c1.capacity = 1000
      a1.channels.c1.transactionCapacity = 1000
      # defined the sink
      a1.sinks.k1.type = avro
      # IP or hostname of the receiving (collector) machine
      a1.sinks.k1.hostname = hadoop09-linux-01.ibeifeng.com
      a1.sinks.k1.port = 50505
      # The channel can be defined as follows.
      a1.sources.s1.channels = c1
      a1.sinks.k1.channel = c1
      Copy the file to the other agent machine with scp:
      scp conf/avro-agent-hive-file-hdfs.properties  hadoop09-linux-03.ibeifeng.com:/opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/conf/
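    With more agent machines, copying the same file by hand gets tedious. A minimal Python sketch (Python 3.8+ for shlex.join) that builds one scp command per host; AGENT_HOSTS is an assumption that lists only the second agent from the table above, and the commands are only printed unless distribute is called with dry_run=False.

```python
import shlex
import subprocess

CONF = "conf/avro-agent-hive-file-hdfs.properties"
DEST_DIR = "/opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/conf/"
# Assumption: list every additional agent machine here.
AGENT_HOSTS = ["hadoop09-linux-03.ibeifeng.com"]


def scp_commands(conf, hosts, dest_dir):
    """Return one scp argument list per target host."""
    return [["scp", conf, f"{host}:{dest_dir}"] for host in hosts]


def distribute(conf, hosts, dest_dir, dry_run=True):
    """Print each scp command; run it only when dry_run is False."""
    for argv in scp_commands(conf, hosts, dest_dir):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    distribute(CONF, AGENT_HOSTS, DEST_DIR)  # dry run: prints only
```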
      On the collector machine, go to the Flume directory.
      Copy flume-conf.properties.template to a new file named avro-collenct-hive-file-hdfs.properties:
      a1.sources = s1
      a1.channels = c1
      a1.sinks = k1
      # defined the source
      a1.sources.s1.type = avro
      a1.sources.s1.bind = hadoop09-linux-01.ibeifeng.com
      a1.sources.s1.port = 50505
      # defined the channel
      a1.channels.c1.type = memory
      a1.channels.c1.capacity = 1000
      a1.channels.c1.transactionCapacity = 1000
      # defined the sink
      a1.sinks.k1.type = hdfs
      a1.sinks.k1.hdfs.filePrefix = avro
      a1.sinks.k1.hdfs.useLocalTimeStamp = true
      a1.sinks.k1.hdfs.path = /flume/hdfs
      a1.sinks.k1.hdfs.fileType = DataStream
      a1.sinks.k1.hdfs.rollInterval = 0
      a1.sinks.k1.hdfs.rollSize = 20480
      a1.sinks.k1.hdfs.rollCount = 0
      # The channel can be defined as follows.
      a1.sources.s1.channels = c1
      a1.sinks.k1.channel = c1
    Start the rpcbind service.
      Then start each agent, beginning with the collector:
        On hadoop09-linux-01 (collector):
        bin/flume-ng agent -c conf/ -n a1 -f conf/avro-collenct-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
        On hadoop09-linux-02 and hadoop09-linux-03 (agents):
        bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
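    Test: write new lines to hive.log on an agent machine and check that files with the avro prefix appear under /flume/hdfs.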
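    Before starting the agents, it helps to confirm that the collector's Avro source is actually listening. A minimal Python sketch using a plain TCP connect; port_is_open is a hypothetical helper, and a successful connect only shows that something accepts connections on the port, not that events reach HDFS.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that something is listening (e.g. the collector's Avro
    source); it does not prove events are reaching HDFS.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or hostname not resolvable
        return False
```

    For example, port_is_open("hadoop09-linux-01.ibeifeng.com", 50505) should return True from an agent machine once the collector has started.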
    

     How do you use Flume across different operating systems?

    Set up an NFS server, mount the directories from the different systems, and use them directly.
    
  • Original article: https://www.cnblogs.com/eRrsr/p/6097323.html