Flume Example: Local File to HDFS

1) Requirement: monitor the Hive log in real time and upload it to HDFS

2) Requirements analysis:

3) Implementation steps:

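To write data to HDFS, Flume needs the relevant Hadoop jars on its classpath.

Copy commons-configuration-1.6.jar, hadoop-auth-2.7.2.jar, hadoop-common-2.7.2.jar, hadoop-hdfs-2.7.2.jar, commons-io-2.4.jar, and htrace-core-3.1.0-incubating.jar into the /opt/module/flume/lib directory.

Note: the jars marked in red in the original post are required by Flume version 1.99 (presumably a 1.9.x release); other versions may not need them.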
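The copy step can be scripted. The following is a minimal sketch, not the author's procedure: it assumes the jars sit somewhere under `share/hadoop` of the Hadoop install (the layout of the stock binary distribution), and the function name `copy_hdfs_jars` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy the jars the HDFS sink needs from a Hadoop install into Flume's lib directory.
# Assumption: the jars live under <hadoop_home>/share/hadoop, as in the stock binary distribution.

copy_hdfs_jars() {
  local hadoop_home="$1" flume_lib="$2"
  local jar src missing=0
  for jar in \
      commons-configuration-1.6.jar \
      hadoop-auth-2.7.2.jar \
      hadoop-common-2.7.2.jar \
      hadoop-hdfs-2.7.2.jar \
      commons-io-2.4.jar \
      htrace-core-3.1.0-incubating.jar; do
    # Take the first match; the same jar can appear in several subdirectories.
    src=$(find "$hadoop_home/share/hadoop" -type f -name "$jar" | head -n 1)
    if [ -n "$src" ]; then
      cp "$src" "$flume_lib/" && echo "copied $jar"
    else
      echo "missing $jar" >&2
      missing=1
    fi
  done
  # Non-zero exit status when at least one jar was not found.
  return "$missing"
}

# Run only when both directories are given on the command line.
if [ "$#" -eq 2 ]; then
  copy_hdfs_jars "$1" "$2"
fi
```

Saved as a script, it would be called with the Hadoop home and the Flume lib directory as arguments, for example /opt/module/hadoop-2.7.2 (an assumed location) and /opt/module/flume/lib.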

Create the flume-file-hdfs.conf file

[jason@hadoop102 job]$ vim flume-file-hdfs.conf

Add the following content

# Name the components on this agent

# Define the source
a2.sources = r2

# Define the sink
a2.sinks = k2

# Define the channel
a2.channels = c2

 

# Describe/configure the source

# Source type exec: read data by running a command
a2.sources.r2.type = exec

a2.sources.r2.command = tail -F /opt/module/hive/logs/hive.log

# Absolute path of the shell used to run the command
a2.sources.r2.shell = /bin/bash -c

 

# Describe the sink

a2.sinks.k2.type = hdfs

a2.sinks.k2.hdfs.path = hdfs://hadoop102:9000/flume/%Y%m%d/%H

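# Prefix of the uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-

# Whether to round the timestamp down, which controls how often a new directory is created
a2.sinks.k2.hdfs.round = true

# Number of time units per new directory
a2.sinks.k2.hdfs.roundValue = 1

# Time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour

# Whether to use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true

# Number of Events to accumulate before flushing to HDFS; must not exceed the channel's transactionCapacity
a2.sinks.k2.hdfs.batchSize = 100

# File type (DataStream writes plain, uncompressed data; CompressedStream enables compression)
a2.sinks.k2.hdfs.fileType = DataStream

# How often to roll to a new file, in seconds
a2.sinks.k2.hdfs.rollInterval = 600

# File size that triggers a roll, in bytes (just under 128 MB)
a2.sinks.k2.hdfs.rollSize = 134217700

# 0 means rolling does not depend on the number of Events
a2.sinks.k2.hdfs.rollCount = 0

# Minimum number of block replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1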

 

# Use a channel which buffers events in memory

a2.channels.c2.type = memory

a2.channels.c2.capacity = 1000

a2.channels.c2.transactionCapacity = 100

 

# Bind the source and sink to the channel

a2.sources.r2.channels = c2

a2.sinks.k2.channel = c2

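Note: the exec source ("exec" is short for "execute") reads data by running a Linux command, so the file has to be read the way a shell command would read it. Because the Hive log is a file on the Linux host, the source type is set to exec and the log is followed with tail -F.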
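Two details of a configuration like this are easy to get wrong. A properties file treats `#` as a comment only at the start of a line, so a `#` placed after a value becomes part of that value. In addition, the HDFS sink takes up to `hdfs.batchSize` Events per channel transaction, so that number should not exceed the memory channel's `transactionCapacity`. The sketch below checks both; the function names `conf_value` and `check_flume_conf` are hypothetical, and the inline-comment test is only a heuristic (it flags any value containing whitespace followed by `#`).

```shell
#!/usr/bin/env bash
# Sketch: two sanity checks on the a2 agent's properties file.

# Print the value of an exact property key (last occurrence wins), trimmed of spaces and tabs.
conf_value() {
  local file="$1" key="$2"
  awk -F'=' -v k="$key" '
    /^[ \t]*#/ { next }                        # skip full-line comments
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (name == k) {
        val = substr($0, index($0, "=") + 1)   # everything after the first "="
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        found = val
      }
    }
    END { print found }' "$file"
}

check_flume_conf() {
  local file="$1" status=0 batch txn
  # 1) Heuristic: a "#" after a value is not a comment; it ends up inside the value.
  if grep -nE '^[[:space:]]*[^#[:space:]][^=]*=.*[[:space:]]#' "$file"; then
    echo "inline '#' found on the lines above" >&2
    status=1
  fi
  # 2) The sink batch has to fit into one channel transaction.
  batch=$(conf_value "$file" a2.sinks.k2.hdfs.batchSize)
  txn=$(conf_value "$file" a2.channels.c2.transactionCapacity)
  case "$batch:$txn" in
    *[!0-9:]*|:*|*:) ;;   # skip when either value is missing or not a plain integer
    *)
      if [ "$batch" -gt "$txn" ]; then
        echo "hdfs.batchSize ($batch) exceeds transactionCapacity ($txn)" >&2
        status=1
      fi
      ;;
  esac
  return "$status"
}

# Run only when a configuration file is given on the command line.
if [ "$#" -eq 1 ]; then
  check_flume_conf "$1"
fi
```

Run against job/flume-file-hdfs.conf, the script exits with status 0 when neither problem is found and prints the offending lines otherwise.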

4) Start the agent with the monitoring configuration

[jason@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf

5) Start Hadoop and Hive, then run some Hive operations to generate log entries

[jason@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh

[jason@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

 

[jason@hadoop102 hive]$ bin/hive

hive (default)>

6) View the files on HDFS
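One way to check the result from the command line is sketched below. The sink path /flume/%Y%m%d/%H uses strftime-style escapes, so `date` can build the name of the directory for the current hour. This assumes the command runs on a host whose clock and time zone match the agent's (the sink uses the local timestamp); the function name `current_flume_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list what the agent has written for the current hour.

# Build the directory name the sink path /flume/%Y%m%d/%H resolves to right now.
current_flume_dir() {
  date +"/flume/%Y%m%d/%H"
}

dir=$(current_flume_dir)
echo "Current sink directory: $dir"

# Only call HDFS when the client is on the PATH.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls "$dir"
  # The quoted glob is expanded by HDFS, not by the local shell.
  # Files still being written carry a .tmp suffix until the sink rolls them.
  hdfs dfs -cat "$dir/logs-*" | tail -n 20
fi
```

The file names start with the configured prefix logs-, so the listing should show entries of that form once Hive has written something to its log.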

Original post: https://www.cnblogs.com/LIAOBO/p/13632307.html