• Installing Flume on CentOS



    http://flume.apache.org/
    http://flume.apache.org/download.html


    System requirements:
    JDK 1.7 or later


    I. Install Flume as a service

    Download the latest release from the official site, e.g. apache-flume-1.7.0-bin.tar.gz

    1. Check whether the LSB init functions required by the Flume service script are installed; if not, install the package that provides them:
    ll /lib/lsb/init-functions
    yum whatprovides /lib/lsb/init-functions
    yum install redhat-lsb


    2. Unpack:
    Upload apache-flume-1.7.0-bin.tar.gz and flume-init.d.sh to a directory on the server;
    tar -zxvf apache-flume-1.7.0-bin.tar.gz


    3. Install the Flume files:
    cp -r apache-flume-1.7.0-bin/bin/flume-ng /usr/bin/flume-ng
    cp -r apache-flume-1.7.0-bin/lib /usr/lib/flume

    cp flume-init.d.sh /etc/init.d/flume
    chmod +x /etc/init.d/flume
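
    The contents of flume-init.d.sh are not reproduced here; as a rough sketch, such an init script might look like the following (the agent name a1, the PID file location and the log path are assumptions and must match the configuration created in step 4):

    #!/bin/bash
    # /etc/init.d/flume : minimal SysV init script sketch for a Flume agent
    # chkconfig: 2345 90 10
    # description: Apache Flume agent
    . /lib/lsb/init-functions

    FLUME_CONF_DIR=/etc/flume/conf.d            # assumed; matches step 4 below
    FLUME_CONF_FILE=$FLUME_CONF_DIR/flume.conf
    AGENT_NAME=a1                               # must match the agent name in flume.conf
    LOG_FILE=/var/log/flume.log                 # wrapper stdout/stderr
    PID_FILE=/var/run/flume.pid

    start() {
        echo "Starting Flume agent $AGENT_NAME"
        nohup /usr/bin/flume-ng agent \
            --conf "$FLUME_CONF_DIR" \
            --conf-file "$FLUME_CONF_FILE" \
            --name "$AGENT_NAME" >> "$LOG_FILE" 2>&1 &
        echo $! > "$PID_FILE"
    }

    stop() {
        echo "Stopping Flume agent $AGENT_NAME"
        [ -f "$PID_FILE" ] && kill "$(cat "$PID_FILE")" && rm -f "$PID_FILE"
    }

    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) stop; sleep 2; start ;;
        *)       echo "Usage: $0 {start|stop|restart}"; exit 1 ;;
    esac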


    4. Create the configuration files:
    mkdir /var/log/flume/
    mkdir -p /etc/flume/conf.d/
    cd /etc/flume/conf.d/
    vi flume.conf


    To collect Nginx logs into Kafka, add the following:
    # flume.conf: A Flume configuration

    # Agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # source configuration
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.channels = c1
    a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
    a1.sources.r1.filegroups = f1 f2
    a1.sources.r1.filegroups.f1 = /var/log/nginx/access.log
    a1.sources.r1.headers.f1.topic = itp_common_nginx_access
    #a1.sources.r1.headers.f1.headerKey1 = value1
    a1.sources.r1.filegroups.f2 = /var/log/nginx/.*error.log
    a1.sources.r1.headers.f2.topic = itp_common_nginx_error
    #a1.sources.r1.headers.f2.headerKey1 = value2
    #a1.sources.r1.headers.f2.headerKey2 = value2-2
    a1.sources.r1.fileHeader = false
    a1.sources.r1.deserializer.maxLineLength=65535

    # sink configuration
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    # default topic; overridden by the per-filegroup "topic" headers set above
    a1.sinks.k1.kafka.topic = mytopic
    a1.sinks.k1.kafka.bootstrap.servers = 10.88.42.157:6667,10.88.42.158:6667,10.88.42.159:6667
    a1.sinks.k1.kafka.flumeBatchSize = 100
    a1.sinks.k1.kafka.producer.acks = 1

    # channel configuration
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
    a1.channels.c1.dataDirs = /var/log/flume/a1/data

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
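
    Before installing the agent as a service, the same configuration can be test-run in the foreground to verify it (logging to the console; the agent name a1 matches the configuration above):

    flume-ng agent --conf /etc/flume/conf.d --conf-file /etc/flume/conf.d/flume.conf --name a1 -Dflume.root.logger=INFO,console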


    To collect Tomcat logs into Kafka, add the following:
    # Flume configuration

    # Agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # source configuration
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
    a1.sources.r1.filegroups = f1 f2
    a1.sources.r1.filegroups.f1 = /opt/tomcat-7.0.55/logs/emm.log
    a1.sources.r1.headers.f1.topic = itp_emm_app
    a1.sources.r1.filegroups.f2 = /opt/tomcat-7.0.55/logs/catalina.out
    a1.sources.r1.headers.f2.topic = itp_emm_out
    a1.sources.r1.fileHeader = false
    a1.sources.r1.deserializer=LINE
    a1.sources.r1.deserializer.maxLineLength=65535
    a1.sources.r1.bufferMaxLineLength=65535
    a1.sources.r1.decodeErrorPolicy=IGNORE

    # channel configuration
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
    a1.channels.c1.dataDirs = /var/log/flume/a1/data

    # sink configuration
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = itp_emm_app
    a1.sinks.k1.kafka.bootstrap.servers = 10.5.218.13:6667,10.5.218.12:6667,10.5.218.11:6667
    a1.sinks.k1.kafka.flumeBatchSize = 100
    a1.sinks.k1.kafka.producer.acks = 1

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
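
    To confirm that events actually arrive in Kafka, a console consumer can be run against one of the brokers (a sketch; the path of the Kafka CLI tools and the consumer options depend on the Kafka version installed on the cluster):

    # run on a Kafka broker; older Kafka versions may require --zookeeper instead of --bootstrap-server
    kafka-console-consumer.sh --bootstrap-server 10.5.218.13:6667 --topic itp_emm_app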


    To forward events to an Avro server, add the following:
    # flume.conf: A Flume configuration

    # Agent a1
    a1.sources = r1
    a1.sinks = k1 k2
    a1.channels = c1

    # source configuration
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.channels = c1
    a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
    a1.sources.r1.filegroups = f1 f2
    a1.sources.r1.filegroups.f1 = /var/log/nginx/access.log
    a1.sources.r1.headers.f1.headerKey1 = value1
    a1.sources.r1.filegroups.f2 = /var/log/nginx/.*error.log
    a1.sources.r1.headers.f2.headerKey1 = value2
    a1.sources.r1.headers.f2.headerKey2 = value2-2
    a1.sources.r1.fileHeader = false
    a1.sources.r1.deserializer.maxLineLength=65535

    # sink configuration
    a1.sinks.k1.type=avro
    a1.sinks.k1.hostname=192.168.0.101
    a1.sinks.k1.port=4545

    a1.sinks.k2.type=avro
    a1.sinks.k2.hostname=192.168.0.102
    a1.sinks.k2.port=4545

    # channel configuration
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
    a1.channels.c1.dataDirs = /var/log/flume/a1/data

    # Bind the source and sinks to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    a1.sinks.k2.channel = c1

    # sink group: active/standby failover
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    a1.sinkgroups.g1.processor.priority.k1 = 10
    a1.sinkgroups.g1.processor.priority.k2 = 5
    a1.sinkgroups.g1.processor.maxpenalty = 30000
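
    The two Avro sinks above require a matching Avro source listening on port 4545 on the receiving hosts (192.168.0.101/102). A minimal sketch of such a downstream agent (the agent name a2, the memory channel and the logger sink are illustrative only):

    # Downstream agent a2: receives Avro events from the agents above
    a2.sources = r1
    a2.channels = c1
    a2.sinks = k1

    a2.sources.r1.type = avro
    a2.sources.r1.bind = 0.0.0.0
    a2.sources.r1.port = 4545
    a2.sources.r1.channels = c1

    a2.channels.c1.type = memory
    a2.channels.c1.capacity = 10000

    # replace the logger sink with whatever the downstream tier actually writes to (e.g. HDFS or Kafka)
    a2.sinks.k1.type = logger
    a2.sinks.k1.channel = c1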



    vi flume-env.sh

    Add the following:
    JAVA_OPTS="-Xmx512m"
    #FLUME_JAVA_OPTS=
    FLUME_CLASSPATH=/usr/lib/flume/*
    #FLUME_JAVA_LIBRARY_PATH=
    #FLUME_APPLICATION_CLASS=


    vi log4j.properties

    Add the following:
    #flume.root.logger=DEBUG,console
    flume.root.logger=INFO,LOGFILE
    flume.log.dir=/var/log/flume/logs
    flume.log.file=flume.log

    log4j.logger.org.apache.flume.lifecycle = INFO
    log4j.logger.org.jboss = WARN
    log4j.logger.org.mortbay = INFO
    log4j.logger.org.apache.avro.ipc.NettyTransceiver = WARN
    log4j.logger.org.apache.hadoop = INFO
    log4j.logger.org.apache.hadoop.hive = ERROR

    # Define the root logger to the system property "flume.root.logger".
    log4j.rootLogger=${flume.root.logger}

    # Stock log4j rolling file appender
    # Default log rotation configuration
    log4j.appender.LOGFILE=org.apache.log4j.RollingFileAppender
    log4j.appender.LOGFILE.MaxFileSize=100MB
    log4j.appender.LOGFILE.MaxBackupIndex=10
    log4j.appender.LOGFILE.File=${flume.log.dir}/${flume.log.file}
    log4j.appender.LOGFILE.layout=org.apache.log4j.PatternLayout
    log4j.appender.LOGFILE.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%t] (%C.%M:%L) %x - %m%n

    # Warning: If you enable the following appender it will fill up your disk if you don't have a cleanup job!
    # This uses the updated rolling file appender from log4j-extras that supports a reliable time-based rolling policy.
    # See http://logging.apache.org/log4j/companions/extras/apidocs/org/apache/log4j/rolling/TimeBasedRollingPolicy.html
    # Add "DAILY" to flume.root.logger above if you want to use this
    log4j.appender.DAILY=org.apache.log4j.rolling.RollingFileAppender
    log4j.appender.DAILY.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
    log4j.appender.DAILY.rollingPolicy.ActiveFileName=${flume.log.dir}/${flume.log.file}
    log4j.appender.DAILY.rollingPolicy.FileNamePattern=${flume.log.dir}/${flume.log.file}.%d{yyyy-MM-dd}
    log4j.appender.DAILY.layout=org.apache.log4j.PatternLayout
    log4j.appender.DAILY.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%t] (%C.%M:%L) %x - %m%n

    # console
    # Add "console" to flume.root.logger above if you want to use this
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d (%t) [%p - %l] %m%n




    5. Start the service:
    service flume start
    tail -f /var/log/flume.log
    tail -f /var/log/flume/logs/flume.log
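
    A quick check that the agent process is actually running (the exact command line depends on how the init script launches flume-ng):

    ps -ef | grep flume-ng | grep -v grep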



    6. Enable the service at boot:
    chkconfig --add flume
    chkconfig flume on
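
    To confirm which runlevels the service is enabled for:

    chkconfig --list flume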



    II. Configure log rotation:

    1. Nginx log rotation:
    If the official nginx RPM is installed, a logrotate configuration for Nginx is created automatically under /etc/logrotate.d:
    cd /etc/logrotate.d
    ll
    cat nginx
    Check its contents. If it is an old configuration file, back it up first, then copy the official template over it and edit it, mainly adjusting how long logs are kept:
    mv nginx nginx.bak
    cp nginx.rpmnew nginx
    cat nginx
    vi nginx
    Reference configuration:
    /var/log/nginx/*.log {
        daily
        missingok
        rotate 7
        compress
        delaycompress
        notifempty
        create 640 nginx adm
        sharedscripts
        postrotate
            if [ -f /var/run/nginx.pid ]; then
                kill -USR1 `cat /var/run/nginx.pid`
            fi
        endscript
    }

    Run the rotation manually and check the result:
    /usr/sbin/logrotate -f /etc/logrotate.d/nginx


    2. Tomcat log rotation:
    Check any log-rotation cron jobs configured previously:
    crontab -l
    Create the new rotation configuration file:
    vi /etc/logrotate.d/tomcat
    Reference configuration:
    /opt/tomcat-7.0.55/logs/catalina.out
    /opt/tomcat-7.0.55/logs/emm.log
    {
        copytruncate
        daily
        rotate 2
        dateext
        nocompress
        missingok
    }

    Run the rotation manually and check the result:
    /usr/sbin/logrotate -f /etc/logrotate.d/tomcat


    3. Remember to comment out the previous rsyslog-based scheduled jobs:
    crontab -e
    Edit the cron jobs and comment out the old entries, then
    reload the configuration:
    service crond reload
    restart the service:
    service crond restart
    and check the cron daemon status:
    service crond status
    ps aux | grep crond


    Stop the rsyslog service and disable it at boot:
    service rsyslog stop
    chkconfig --del rsyslog
    chkconfig rsyslog off


    4. Configure /etc/hosts to fix Kafka client errors in Flume:
    Add the host names of the big-data cluster. By default the Kafka brokers return their hostnames to the client; if the client cannot resolve those names,
    it throws ChannelClosedException and Batch Expired errors:
    vi /etc/hosts

    10.88.42.157 node07.bigdata.com
    10.88.42.158 node08.bigdata.com
    10.88.42.159 node09.bigdata.com

    10.5.218.11 itnode01.bigdata
    10.5.218.12 itnode02.bigdata
    10.5.218.13 itnode03.bigdata
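
    A quick check that the broker host names now resolve locally (using the entries above):

    getent hosts node07.bigdata.com
    ping -c 1 itnode01.bigdata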
