http://flume.apache.org/
http://flume.apache.org/download.html
System requirements:
JDK 1.7 or later
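You can check the installed JDK version on the target server with:
java -version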
一、Install Flume as a service
Download the latest release from the official site, e.g. apache-flume-1.7.0-bin.tar.gz
1、Check whether the LSB init functions required by the Flume service script are present on the OS; if not, install the corresponding package:
ll /lib/lsb/init-functions
yum whatprovides /lib/lsb/init-functions
yum install redhat-lsb
2、Unpack:
Upload apache-flume-1.7.0-bin.tar.gz and flume-init.d.sh to a directory on the server;
tar -zxvf apache-flume-1.7.0-bin.tar.gz
3、Install the Flume files:
cp -r apache-flume-1.7.0-bin/bin/flume-ng /usr/bin/flume-ng
cp -r apache-flume-1.7.0-bin/lib /usr/lib/flume
cp flume-init.d.sh /etc/init.d/flume
chmod +x /etc/init.d/flume
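The contents of flume-init.d.sh are not reproduced in this document. As a reference, a minimal SysV init script for this layout could look like the sketch below (agent name a1, config under /etc/flume/conf.d and logs under /var/log/flume are assumptions matching the steps that follow):
#!/bin/bash
# flume        Apache Flume NG agent
# chkconfig: 345 90 10
# description: Apache Flume NG agent
. /lib/lsb/init-functions

AGENT_NAME=a1
CONF_DIR=/etc/flume/conf.d
CONF_FILE=$CONF_DIR/flume.conf
PID_FILE=/var/run/flume.pid
LOG_DIR=/var/log/flume

start() {
    echo -n "Starting flume agent $AGENT_NAME: "
    # run the agent in the background and remember its PID
    nohup /usr/bin/flume-ng agent --conf "$CONF_DIR" --conf-file "$CONF_FILE" \
        --name "$AGENT_NAME" >> "$LOG_DIR/flume-ng.out" 2>&1 &
    echo $! > "$PID_FILE"
    echo "OK"
}

stop() {
    echo -n "Stopping flume agent $AGENT_NAME: "
    [ -f "$PID_FILE" ] && kill "$(cat "$PID_FILE")" && rm -f "$PID_FILE"
    echo "OK"
}

case "$1" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; sleep 2; start ;;
    status)
        if [ -f "$PID_FILE" ] && ps -p "$(cat "$PID_FILE")" > /dev/null 2>&1; then
            echo "flume is running"
        else
            echo "flume is stopped"
        fi
        ;;
    *) echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
esac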
4、Set up the configuration files:
mkdir /var/log/flume/
mkdir -p /etc/flume/conf.d/
cd /etc/flume/conf.d/
vi flume.conf
To collect Nginx logs into Kafka, add the following:
# flume.conf: A Flume configuration
# Agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# source configuration
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/nginx/access.log
a1.sources.r1.headers.f1.topic = itp_common_nginx_access
#a1.sources.r1.headers.f1.headerKey1 = value1
a1.sources.r1.filegroups.f2 = /var/log/nginx/.*error.log
a1.sources.r1.headers.f2.topic = itp_common_nginx_error
#a1.sources.r1.headers.f2.headerKey1 = value2
#a1.sources.r1.headers.f2.headerKey2 = value2-2
a1.sources.r1.fileHeader = false
a1.sources.r1.deserializer.maxLineLength=65535
# sink configuration
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
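# kafka.topic below is only the default; events carrying a "topic" header (set per file group above) are published to that topic instead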
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = 10.88.42.157:6667,10.88.42.158:6667,10.88.42.159:6667
a1.sinks.k1.kafka.flumeBatchSize = 100
a1.sinks.k1.kafka.producer.acks = 1
# channel configuration
a1.channels.c1.type = file
a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
a1.channels.c1.dataDirs = /var/log/flume/a1/data
# bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
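Once the agent is running, message delivery can be checked with the console consumer shipped with Kafka (the script location depends on the Kafka installation; older Kafka versions need --zookeeper instead of --bootstrap-server):
kafka-console-consumer.sh --bootstrap-server 10.88.42.157:6667 --topic itp_common_nginx_access --from-beginning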
To collect Tomcat logs into Kafka, add the following:
# Flume configuration
# Agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# source configuration
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /opt/tomcat-7.0.55/logs/emm.log
a1.sources.r1.headers.f1.topic = itp_emm_app
a1.sources.r1.filegroups.f2 = /opt/tomcat-7.0.55/logs/catalina.out
a1.sources.r1.headers.f2.topic = itp_emm_out
a1.sources.r1.fileHeader = false
a1.sources.r1.deserializer=LINE
a1.sources.r1.deserializer.maxLineLength=65535
a1.sources.r1.bufferMaxLineLength=65535
a1.sources.r1.decodeErrorPolicy=IGNORE
# channel configuration
a1.channels.c1.type = file
a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
a1.channels.c1.dataDirs = /var/log/flume/a1/data
# sink configuration
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = itp_emm_app
a1.sinks.k1.kafka.bootstrap.servers = 10.5.218.13:6667,10.5.218.12:6667,10.5.218.11:6667
a1.sinks.k1.kafka.flumeBatchSize = 100
a1.sinks.k1.kafka.producer.acks = 1
# bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
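The Taildir source records its read offsets in the position file configured above, so inspecting it is a quick way to confirm which files the agent is actually tailing:
cat /var/log/flume/taildir_position.json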
To forward logs to avro collector servers, add the following:
# flume.conf: A Flume configuration
# Agent a1
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
# source configuration
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/nginx/access.log
a1.sources.r1.headers.f1.headerKey1 = value1
a1.sources.r1.filegroups.f2 = /var/log/nginx/.*error.log
a1.sources.r1.headers.f2.headerKey1 = value2
a1.sources.r1.headers.f2.headerKey2 = value2-2
a1.sources.r1.fileHeader = false
a1.sources.r1.deserializer.maxLineLength=65535
# sink configuration
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.0.101
a1.sinks.k1.port=4545
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.0.102
a1.sinks.k2.port=4545
# channel configuration
a1.channels.c1.type = file
a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
a1.channels.c1.dataDirs = /var/log/flume/a1/data
# bind the source and sinks to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
# sink group: active/standby failover
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 30000
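The two avro sinks above expect Flume agents listening on port 4545 on 192.168.0.101 and 192.168.0.102. A minimal sketch of such a collector-side configuration (the agent name "collector" and the logger sink are placeholders; in practice the collector would forward to Kafka, HDFS, etc.):
collector.sources = avroIn
collector.channels = c1
collector.sinks = k1
# avro source listening for events from the upstream agents
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.channels = c1
# file channel for durability
collector.channels.c1.type = file
collector.channels.c1.checkpointDir = /var/log/flume/collector/checkpoint
collector.channels.c1.dataDirs = /var/log/flume/collector/data
# placeholder sink that just logs the received events
collector.sinks.k1.type = logger
collector.sinks.k1.channel = c1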
vi flume-env.sh
Add the following:
JAVA_OPTS="-Xmx512m"
#FLUME_JAVA_OPTS=
FLUME_CLASSPATH=/usr/lib/flume/*
#FLUME_JAVA_LIBRARY_PATH=
#FLUME_APPLICATION_CLASS=
vi log4j.properties
Add the following:
#flume.root.logger=DEBUG,console
flume.root.logger=INFO,LOGFILE
flume.log.dir=/var/log/flume/logs
flume.log.file=flume.log
log4j.logger.org.apache.flume.lifecycle = INFO
log4j.logger.org.jboss = WARN
log4j.logger.org.mortbay = INFO
log4j.logger.org.apache.avro.ipc.NettyTransceiver = WARN
log4j.logger.org.apache.hadoop = INFO
log4j.logger.org.apache.hadoop.hive = ERROR
# Define the root logger to the system property "flume.root.logger".
log4j.rootLogger=${flume.root.logger}
# Stock log4j rolling file appender
# Default log rotation configuration
log4j.appender.LOGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.LOGFILE.MaxFileSize=100MB
log4j.appender.LOGFILE.MaxBackupIndex=10
log4j.appender.LOGFILE.File=${flume.log.dir}/${flume.log.file}
log4j.appender.LOGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.LOGFILE.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%t] (%C.%M:%L) %x - %m%n
# Warning: If you enable the following appender it will fill up your disk if you don't have a cleanup job!
# This uses the updated rolling file appender from log4j-extras that supports a reliable time-based rolling policy.
# See http://logging.apache.org/log4j/companions/extras/apidocs/org/apache/log4j/rolling/TimeBasedRollingPolicy.html
# Add "DAILY" to flume.root.logger above if you want to use this
log4j.appender.DAILY=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.DAILY.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.DAILY.rollingPolicy.ActiveFileName=${flume.log.dir}/${flume.log.file}
log4j.appender.DAILY.rollingPolicy.FileNamePattern=${flume.log.dir}/${flume.log.file}.%d{yyyy-MM-dd}
log4j.appender.DAILY.layout=org.apache.log4j.PatternLayout
log4j.appender.DAILY.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%t] (%C.%M:%L) %x - %m%n
# console
# Add "console" to flume.root.logger above if you want to use this
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d (%t) [%p - %l] %m%n
5、Start the service:
service flume start
tail -f /var/log/flume.log
tail -f /var/log/flume/logs/flume.log
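If the service does not start cleanly, running the agent in the foreground with console logging usually shows the configuration problem directly:
/usr/bin/flume-ng agent --conf /etc/flume/conf.d --conf-file /etc/flume/conf.d/flume.conf --name a1 -Dflume.root.logger=DEBUG,console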
6、Enable the service to start at boot:
chkconfig --add flume
chkconfig flume on
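Confirm the runlevel settings:
chkconfig --list flume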
二、Configure log rotation:
1、Nginx log rotation:
If the official nginx RPM is installed, a logrotate configuration for Nginx is created automatically under /etc/logrotate.d:
cd /etc/logrotate.d
ll
cat nginx
Review the contents. If it is an older configuration file, back it up first, then copy the official template (nginx.rpmnew) and edit it, mainly adjusting how long logs are retained:
mv nginx nginx.bak
cp nginx.rpmnew nginx
cat nginx
vi nginx
Reference configuration:
/var/log/nginx/*.log {
    daily
    missingok
    rotate 7
    compress
    delaycompress
    notifempty
    create 640 nginx adm
    sharedscripts
    postrotate
        if [ -f /var/run/nginx.pid ]; then
            kill -USR1 `cat /var/run/nginx.pid`
        fi
    endscript
}
Run the rotation manually and check the result:
/usr/sbin/logrotate -f /etc/logrotate.d/nginx
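logrotate also has a debug mode (-d) that only prints what it would do without touching the logs, which is useful for validating the configuration:
/usr/sbin/logrotate -d /etc/logrotate.d/nginx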
2、Tomcat log rotation:
Check the log-rotation cron jobs configured previously:
crontab -l
Create a new rotation configuration file:
vi /etc/logrotate.d/tomcat
Reference configuration:
/opt/tomcat-7.0.55/logs/catalina.out
/opt/tomcat-7.0.55/logs/emm.log
{
    copytruncate
    daily
    rotate 2
    dateext
    nocompress
    missingok
}
Run the rotation manually and check the result:
/usr/sbin/logrotate -f /etc/logrotate.d/tomcat
3、Remember to comment out the old rsyslog-based scheduled jobs:
crontab -e
Edit the scheduled jobs and comment out the old entries, then
reload the configuration:
service crond reload
Restart the service:
service crond restart
Check the cron service status:
service crond status
ps -aux | grep crond
Stop the rsyslog service:
service rsyslog stop
chkconfig --del rsyslog
chkconfig rsyslog off
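Confirm that rsyslog is stopped and will not come back at boot:
service rsyslog status
chkconfig --list rsyslog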
4、Configure /etc/hosts to fix Kafka client errors in Flume:
Add the hostnames of the big-data cluster. By default the Kafka brokers return their hostnames to clients; if the Flume host cannot resolve those hostnames,
the client throws both ChannelClosedException and Batch Expired errors:
vi /etc/hosts
10.88.42.157 node07.bigdata.com
10.88.42.158 node08.bigdata.com
10.88.42.159 node09.bigdata.com
10.5.218.11 itnode01.bigdata
10.5.218.12 itnode02.bigdata
10.5.218.13 itnode03.bigdata
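After editing /etc/hosts, verify that the broker hostnames resolve from the Flume host, for example:
getent hosts node07.bigdata.com
ping -c 1 itnode01.bigdata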