• 【flume】5. Collecting logs into HBase


    Set up our Flume configuration:

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  You may obtain a copy of the License at
    #
    #  http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing,
    # software distributed under the License is distributed on an
    # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    # KIND, either express or implied.  See the License for the
    # specific language governing permissions and limitations
    # under the License.
    
    
    # The configuration file needs to define the sources, 
    # the channels and the sinks.
    # Sources, channels and sinks are defined per agent, 
    # in this case called 'agent1'
    
    agent1.sources = r1
    agent1.channels = c1
    agent1.sinks = s1
    
    # For each one of the sources, the type is defined
    agent1.sources.r1.type = exec
    #tail -F /home/oss/cloud_iom/ktpt/iom-cloud-service/logs/iom-app-debug.log 
    agent1.sources.r1.command = tail -F /home/oss/cloud_iom/ktpt/iom-cloud-service/logs/iom-app-debug.log
    
    # The channel can be defined as follows.
    #agent.sources.seqGenSrc.channels = memoryChannel
    agent1.sources.r1.channels = c1
    
    # Each sink's type must be defined
    agent1.sinks.s1.type = hbase2
    
    agent1.sinks.s1.table = iom_app_debug
    agent1.sinks.s1.columnFamily = log
    agent1.sinks.s1.serializer = org.apache.flume.sink.hbase2.RegexHBase2EventSerializer
    #agent1.sinks.s1.serializer.regex = \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]\ \[(.*?)\]
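    # (Assumption, not part of the original config) If the regex above is
    # enabled, RegexHBase2EventSerializer also needs one column name per
    # capture group, e.g.:
    #agent1.sinks.s1.serializer.colNames = time,level,thread,message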
    
    #Specify the channel the sink should use
    agent1.sinks.s1.channel = c1
    
    # Each channel's type is defined.
    agent1.channels.c1.type = memory
    
    # Other config values specific to each type of channel (sink or source)
    # can be defined as well
    # In this case, it specifies the capacity of the memory channel
    agent1.channels.c1.capacity = 100
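
    Before starting the agent, note that the hbase2 sink writes into an existing table; it will not create one. A minimal sketch of creating the table and column family named in the config above, run from the HBase shell:

    # run inside the HBase shell; names taken from the sink config above
    create 'iom_app_debug', 'log'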

    With this configuration in place, set up the start command. nohup is used so the collector keeps running on its own in the background afterwards (with the console logger enabled, the agent's output ends up in nohup.out):

    nohup flume-ng agent --conf hadoop/flume/conf -f hadoop/flume/conf/flume-conf.properties -n agent1 -Dflume.root.logger=DEBUG,console &
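
    To confirm events are actually flowing, one option (an assumed check, not part of the original post) is to watch the agent's console output and scan the target table:

    tail -f nohup.out                                        # agent log under nohup
    echo "scan 'iom_app_debug', {LIMIT => 5}" | hbase shell  # sample the sink table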

    My Flume directory: (screenshot)

    Collection in action: (screenshot)

    The log file: (screenshot)

    Of course, collection here happens line by line (via tail -F), but you can define the shell command yourself; as long as the source type is exec, the command can be anything, and the sink can likewise be configured to taste. A more robust alternative source is sketched below.
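
    For instance, Flume's TAILDIR source (available since Flume 1.7) is generally more reliable than exec with tail -F, because it checkpoints file offsets and so survives agent restarts without re-reading or losing lines; a minimal sketch, where the positionFile path is an assumption:

    agent1.sources.r1.type = TAILDIR
    agent1.sources.r1.positionFile = /home/oss/flume/taildir_position.json
    agent1.sources.r1.filegroups = f1
    agent1.sources.r1.filegroups.f1 = /home/oss/cloud_iom/ktpt/iom-cloud-service/logs/iom-app-debug.log
    agent1.sources.r1.channels = c1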

    Step one is data collection; step two should be to think about how to analyze the data. Data collected this way is unlikely to be directly analyzable, though: it is messy and unordered, so we first need to define logic to clean it, and only then load it for analysis.

    Collecting straight into HBase like this is actually problematic; the correct approach would be:

    1. Flume collects the data into HDFS (see the sink sketch after this list)

    2. MapReduce cleans the ingested data and puts it in order

    3. MapReduce analyzes the data and loads the results into HBase, or saves them directly to HDFS

    4. Sqoop migrates the data into the target database (MySQL, Oracle) (see the export sketch after this list)

    5. From the parsed data, query Oracle to build report charts, analyze trends, make predictions, or trace problem relationships
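
    For step 1, the hbase2 sink above would simply be swapped for an HDFS sink; a minimal sketch, where the target path is an assumption:

    agent1.sinks.s1.type = hdfs
    # one directory per day; the escape sequences need a timestamp, supplied locally below
    agent1.sinks.s1.hdfs.path = /flume/iom-app-debug/%Y-%m-%d
    # write plain text rather than the default SequenceFile
    agent1.sinks.s1.hdfs.fileType = DataStream
    agent1.sinks.s1.hdfs.useLocalTimeStamp = true
    agent1.sinks.s1.channel = c1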
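
    For step 4, a Sqoop export from HDFS into MySQL might look like the following (host, credentials, table name, export directory, and field delimiter are all assumptions for illustration):

    sqoop export \
      --connect jdbc:mysql://dbhost:3306/iom \
      --username iom \
      --password '***' \
      --table iom_app_debug_stats \
      --export-dir /cleaned/iom-app-debug \
      --input-fields-terminated-by '\t'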
