• Installing Flume on CentOS



    http://flume.apache.org/
    http://flume.apache.org/download.html


    System requirements:
    JDK 1.7 or later


    I. Install Flume as a service

    Download the latest release from the official site, e.g. apache-flume-1.7.0-bin.tar.gz

    1. Check whether the LSB init functions required by the Flume service script are installed; if not, install the package that provides them:
    ll /lib/lsb/init-functions
    yum whatprovides /lib/lsb/init-functions
    yum install redhat-lsb


    2. Unpack:
    Upload apache-flume-1.7.0-bin.tar.gz and flume-init.d.sh to a directory on the server;
    tar -zxvf apache-flume-1.7.0-bin.tar.gz


    3. Install the Flume files:
    cp -r apache-flume-1.7.0-bin/bin/flume-ng /usr/bin/flume-ng
    cp -r apache-flume-1.7.0-bin/lib /usr/lib/flume

    cp flume-init.d.sh /etc/init.d/flume
    chmod +x /etc/init.d/flume
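
    The contents of flume-init.d.sh are not reproduced here; as a rough sketch, such an init script might look like the following (the agent name a1, the PID file location and the log path are assumptions and must match the configuration created in step 4):

    #!/bin/bash
    # /etc/init.d/flume : minimal SysV init script sketch for a Flume agent
    # chkconfig: 2345 90 10
    # description: Apache Flume agent
    . /lib/lsb/init-functions

    FLUME_CONF_DIR=/etc/flume/conf.d            # assumed; matches step 4 below
    FLUME_CONF_FILE=$FLUME_CONF_DIR/flume.conf
    AGENT_NAME=a1                               # must match the agent name in flume.conf
    LOG_FILE=/var/log/flume.log                 # wrapper stdout/stderr
    PID_FILE=/var/run/flume.pid

    start() {
        echo "Starting Flume agent $AGENT_NAME"
        nohup /usr/bin/flume-ng agent \
            --conf "$FLUME_CONF_DIR" \
            --conf-file "$FLUME_CONF_FILE" \
            --name "$AGENT_NAME" >> "$LOG_FILE" 2>&1 &
        echo $! > "$PID_FILE"
    }

    stop() {
        echo "Stopping Flume agent $AGENT_NAME"
        [ -f "$PID_FILE" ] && kill "$(cat "$PID_FILE")" && rm -f "$PID_FILE"
    }

    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) stop; sleep 2; start ;;
        *)       echo "Usage: $0 {start|stop|restart}"; exit 1 ;;
    esac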


    4. Create the configuration files:
    mkdir /var/log/flume/
    mkdir -p /etc/flume/conf.d/
    cd /etc/flume/conf.d/
    vi flume.conf


    To collect Nginx logs into Kafka, add the following:
    # flume.conf: A Flume configuration

    # Agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # source configuration
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.channels = c1
    a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
    a1.sources.r1.filegroups = f1 f2
    a1.sources.r1.filegroups.f1 = /var/log/nginx/access.log
    a1.sources.r1.headers.f1.topic = itp_common_nginx_access
    #a1.sources.r1.headers.f1.headerKey1 = value1
    a1.sources.r1.filegroups.f2 = /var/log/nginx/.*error.log
    a1.sources.r1.headers.f2.topic = itp_common_nginx_error
    #a1.sources.r1.headers.f2.headerKey1 = value2
    #a1.sources.r1.headers.f2.headerKey2 = value2-2
    a1.sources.r1.fileHeader = false
    a1.sources.r1.deserializer.maxLineLength=65535

    # sink configuration
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    # default topic; overridden by the per-filegroup "topic" headers set above
    a1.sinks.k1.kafka.topic = mytopic
    a1.sinks.k1.kafka.bootstrap.servers = 10.88.42.157:6667,10.88.42.158:6667,10.88.42.159:6667
    a1.sinks.k1.kafka.flumeBatchSize = 100
    a1.sinks.k1.kafka.producer.acks = 1

    # channel configuration
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
    a1.channels.c1.dataDirs = /var/log/flume/a1/data

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
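
    Before installing the agent as a service, the same configuration can be test-run in the foreground to verify it (logging to the console; the agent name a1 matches the configuration above):

    flume-ng agent --conf /etc/flume/conf.d --conf-file /etc/flume/conf.d/flume.conf --name a1 -Dflume.root.logger=INFO,console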


    To collect Tomcat logs into Kafka, add the following:
    # Flume configuration

    # Agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # source configuration
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
    a1.sources.r1.filegroups = f1 f2
    a1.sources.r1.filegroups.f1 = /opt/tomcat-7.0.55/logs/emm.log
    a1.sources.r1.headers.f1.topic = itp_emm_app
    a1.sources.r1.filegroups.f2 = /opt/tomcat-7.0.55/logs/catalina.out
    a1.sources.r1.headers.f2.topic = itp_emm_out
    a1.sources.r1.fileHeader = false
    a1.sources.r1.deserializer=LINE
    a1.sources.r1.deserializer.maxLineLength=65535
    a1.sources.r1.bufferMaxLineLength=65535
    a1.sources.r1.decodeErrorPolicy=IGNORE

    # channel configuration
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
    a1.channels.c1.dataDirs = /var/log/flume/a1/data

    # sink configuration
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = itp_emm_app
    a1.sinks.k1.kafka.bootstrap.servers = 10.5.218.13:6667,10.5.218.12:6667,10.5.218.11:6667
    a1.sinks.k1.kafka.flumeBatchSize = 100
    a1.sinks.k1.kafka.producer.acks = 1

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
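
    To confirm that events actually arrive in Kafka, a console consumer can be run against one of the brokers (a sketch; the path of the Kafka CLI tools and the consumer options depend on the Kafka version installed on the cluster):

    # run on a Kafka broker; older Kafka versions may require --zookeeper instead of --bootstrap-server
    kafka-console-consumer.sh --bootstrap-server 10.5.218.13:6667 --topic itp_emm_app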


    To forward events to an Avro server, add the following:
    # flume.conf: A Flume configuration

    # Agent a1
    a1.sources = r1
    a1.sinks = k1 k2
    a1.channels = c1

    # source configuration
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.channels = c1
    a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
    a1.sources.r1.filegroups = f1 f2
    a1.sources.r1.filegroups.f1 = /var/log/nginx/access.log
    a1.sources.r1.headers.f1.headerKey1 = value1
    a1.sources.r1.filegroups.f2 = /var/log/nginx/.*error.log
    a1.sources.r1.headers.f2.headerKey1 = value2
    a1.sources.r1.headers.f2.headerKey2 = value2-2
    a1.sources.r1.fileHeader = false
    a1.sources.r1.deserializer.maxLineLength=65535

    # sink configuration
    a1.sinks.k1.type=avro
    a1.sinks.k1.hostname=192.168.0.101
    a1.sinks.k1.port=4545

    a1.sinks.k2.type=avro
    a1.sinks.k2.hostname=192.168.0.102
    a1.sinks.k2.port=4545

    # channel configuration
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir=/var/log/flume/a1/checkpoint
    a1.channels.c1.dataDirs = /var/log/flume/a1/data

    # Bind the source and sinks to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    a1.sinks.k2.channel = c1

    # sink group: active/standby failover
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    a1.sinkgroups.g1.processor.priority.k1 = 10
    a1.sinkgroups.g1.processor.priority.k2 = 5
    a1.sinkgroups.g1.processor.maxpenalty = 30000
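
    The two Avro sinks above require a matching Avro source listening on port 4545 on the receiving hosts (192.168.0.101/102). A minimal sketch of such a downstream agent (the agent name a2, the memory channel and the logger sink are illustrative only):

    # Downstream agent a2: receives Avro events from the agents above
    a2.sources = r1
    a2.channels = c1
    a2.sinks = k1

    a2.sources.r1.type = avro
    a2.sources.r1.bind = 0.0.0.0
    a2.sources.r1.port = 4545
    a2.sources.r1.channels = c1

    a2.channels.c1.type = memory
    a2.channels.c1.capacity = 10000

    # replace the logger sink with whatever the downstream tier actually writes to (e.g. HDFS or Kafka)
    a2.sinks.k1.type = logger
    a2.sinks.k1.channel = c1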



    vi flume-env.sh

    Add the following:
    JAVA_OPTS="-Xmx512m"
    #FLUME_JAVA_OPTS=
    FLUME_CLASSPATH=/usr/lib/flume/*
    #FLUME_JAVA_LIBRARY_PATH=
    #FLUME_APPLICATION_CLASS=


    vi log4j.properties

    Add the following:
    #flume.root.logger=DEBUG,console
    flume.root.logger=INFO,LOGFILE
    flume.log.dir=/var/log/flume/logs
    flume.log.file=flume.log

    log4j.logger.org.apache.flume.lifecycle = INFO
    log4j.logger.org.jboss = WARN
    log4j.logger.org.mortbay = INFO
    log4j.logger.org.apache.avro.ipc.NettyTransceiver = WARN
    log4j.logger.org.apache.hadoop = INFO
    log4j.logger.org.apache.hadoop.hive = ERROR

    # Define the root logger to the system property "flume.root.logger".
    log4j.rootLogger=${flume.root.logger}

    # Stock log4j rolling file appender
    # Default log rotation configuration
    log4j.appender.LOGFILE=org.apache.log4j.RollingFileAppender
    log4j.appender.LOGFILE.MaxFileSize=100MB
    log4j.appender.LOGFILE.MaxBackupIndex=10
    log4j.appender.LOGFILE.File=${flume.log.dir}/${flume.log.file}
    log4j.appender.LOGFILE.layout=org.apache.log4j.PatternLayout
    log4j.appender.LOGFILE.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%t] (%C.%M:%L) %x - %m%n

    # Warning: If you enable the following appender it will fill up your disk if you don't have a cleanup job!
    # This uses the updated rolling file appender from log4j-extras that supports a reliable time-based rolling policy.
    # See http://logging.apache.org/log4j/companions/extras/apidocs/org/apache/log4j/rolling/TimeBasedRollingPolicy.html
    # Add "DAILY" to flume.root.logger above if you want to use this
    log4j.appender.DAILY=org.apache.log4j.rolling.RollingFileAppender
    log4j.appender.DAILY.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
    log4j.appender.DAILY.rollingPolicy.ActiveFileName=${flume.log.dir}/${flume.log.file}
    log4j.appender.DAILY.rollingPolicy.FileNamePattern=${flume.log.dir}/${flume.log.file}.%d{yyyy-MM-dd}
    log4j.appender.DAILY.layout=org.apache.log4j.PatternLayout
    log4j.appender.DAILY.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%t] (%C.%M:%L) %x - %m%n

    # console
    # Add "console" to flume.root.logger above if you want to use this
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d (%t) [%p - %l] %m%n




    5. Start the service:
    service flume start
    tail -f /var/log/flume.log
    tail -f /var/log/flume/logs/flume.log
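
    A quick check that the agent process is actually running (the exact command line depends on how the init script launches flume-ng):

    ps -ef | grep flume-ng | grep -v grep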



    6. Enable the service at boot:
    chkconfig --add flume
    chkconfig flume on
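
    To confirm which runlevels the service is enabled for:

    chkconfig --list flume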



    II. Configure log rotation:

    1. Nginx log rotation:
    If the official nginx RPM is installed, a logrotate configuration for Nginx is created automatically under /etc/logrotate.d:
    cd /etc/logrotate.d
    ll
    cat nginx
    Check its contents. If it is an old configuration file, back it up first, then copy the official template over it and edit it, mainly adjusting how long logs are kept:
    mv nginx nginx.bak
    cp nginx.rpmnew nginx
    cat nginx
    vi nginx
    Reference configuration:
    /var/log/nginx/*.log {
        daily
        missingok
        rotate 7
        compress
        delaycompress
        notifempty
        create 640 nginx adm
        sharedscripts
        postrotate
            if [ -f /var/run/nginx.pid ]; then
                kill -USR1 `cat /var/run/nginx.pid`
            fi
        endscript
    }

    Run the rotation manually and check the result:
    /usr/sbin/logrotate -f /etc/logrotate.d/nginx


    2. Tomcat log rotation:
    Check any log-rotation cron jobs configured previously:
    crontab -l
    Create the new rotation configuration file:
    vi /etc/logrotate.d/tomcat
    Reference configuration:
    /opt/tomcat-7.0.55/logs/catalina.out
    /opt/tomcat-7.0.55/logs/emm.log
    {
        copytruncate
        daily
        rotate 2
        dateext
        nocompress
        missingok
    }

    Run the rotation manually and check the result:
    /usr/sbin/logrotate -f /etc/logrotate.d/tomcat


    3. Remember to comment out the previous rsyslog-based scheduled jobs:
    crontab -e
    Edit the cron jobs and comment out the old entries, then
    reload the configuration:
    service crond reload
    restart the service:
    service crond restart
    and check the cron daemon status:
    service crond status
    ps aux | grep crond


    Stop the rsyslog service and disable it at boot:
    service rsyslog stop
    chkconfig --del rsyslog
    chkconfig rsyslog off


    4. Configure /etc/hosts to fix Kafka client errors in Flume:
    Add the host names of the big-data cluster. By default the Kafka brokers return their hostnames to the client; if the client cannot resolve those names,
    it throws ChannelClosedException and Batch Expired errors:
    vi /etc/hosts

    10.88.42.157 node07.bigdata.com
    10.88.42.158 node08.bigdata.com
    10.88.42.159 node09.bigdata.com

    10.5.218.11 itnode01.bigdata
    10.5.218.12 itnode02.bigdata
    10.5.218.13 itnode03.bigdata
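
    A quick check that the broker host names now resolve locally (using the entries above):

    getent hosts node07.bigdata.com
    ping -c 1 itnode01.bigdata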
