1. Architecture diagram
2. Commonly used Flume component types (summarized from the official documentation)
2.1 Source:
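Name | Description | Notes |
---|---|---|
avro | Data source using the Avro protocol | |
exec | Runs a Unix command | Can monitor a file with a command such as `tail -F` |
spooldir | Monitors a directory | The directory must not contain subdirectories; Windows directories are not monitored; a file must not be written to again once it is in the directory; file names must not be reused |
TAILDIR | Can monitor individual files as well as directories (file names matched by a regex) | Supports resuming from the last recorded position; the recommended choice |
netcat | Listens on a port | |
kafka | Consumes data from Kafka | |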
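Because the table recommends TAILDIR, here is a minimal sketch of a TAILDIR source definition. It uses the same properties format as the agent configs later in this post.

- The agent name `a1`, the component names `r1`/`c1`, and every path are hypothetical placeholders.
- The property names (`positionFile`, `filegroups`, `filegroups.<name>`) follow the TAILDIR section of the Flume user guide. Check them against your Flume version.

```properties
# Hypothetical agent a1: TAILDIR source r1 feeding memory channel c1 (sink omitted)
a1.sources = r1
a1.channels = c1
a1.channels.c1.type = memory

a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
# JSON file in which TAILDIR records the last read offset of every tailed file;
# this is what lets the source resume where it stopped after a restart
a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
# Two file groups: one fixed file, plus every file in a directory whose name
# matches a regular expression (the regex applies to the file name only)
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/httpd/access_log
a1.sources.r1.filegroups.f2 = /var/log/app/.*log.*
```

An `exec` source running `tail` keeps no such record. The position file is what provides the resume behavior mentioned in the table.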
2.2 Sink:
Name | Description | Notes |
---|---|---|
kafka | Writes to Kafka | |
HDFS | Writes data to HDFS | |
logger | Prints to the console | |
avro | Avro protocol | Used together with an avro source |
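For the HDFS sink, a minimal sketch of time-partitioned plain-text output might look like the following.

- The NameNode address `namenode:8020`, the target path, and the roll thresholds are hypothetical.
- `hdfs.path`, `hdfs.useLocalTimeStamp`, `hdfs.fileType`, and the `hdfs.roll*` settings are documented HDFS sink properties.

```properties
# Hypothetical agent a1: HDFS sink k1 draining channel c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Target directory; the %Y-%m-%d escapes need a timestamp, taken here from the local clock
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.filePrefix = events
# Write plain text instead of the default SequenceFile format
a1.sinks.k1.hdfs.fileType = DataStream
# Roll to a new file every 300 s or at 128 MB, whichever comes first; 0 disables count-based rolling
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
```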
2.3 Channel:
Name | Description | Notes |
---|---|---|
memory | Stored in memory | |
kafka | Stored in Kafka | |
file | Stored in files on the local disk | |
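The practical difference between `memory` and `file` is durability. A memory channel is fast but loses any buffered events if the agent process dies. A file channel persists events to disk, so they survive a restart.

Here is a minimal file channel sketch. The agent/channel names and both directories are hypothetical placeholders; `checkpointDir`, `dataDirs`, and `capacity` are documented file channel properties.

```properties
# Hypothetical agent a1: durable file channel c1
a1.channels = c1
a1.channels.c1.type = file
# Directory where the channel stores its checkpoint
a1.channels.c1.checkpointDir = /var/flume/checkpoint
# Comma-separated list of directories for the event data files
a1.channels.c1.dataDirs = /var/flume/data
# Maximum number of events the channel can hold
a1.channels.c1.capacity = 1000000
```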
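3. Setting up two Flume agents (A and B) according to the architecture diagram in Part 1
Goal:
Agent A's source tails a file, and agent A's sink forwards every change to that file to agent B's source. Agent B then prints the new content to the console.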
3.1 Setting up agent A
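```properties
# Name the components of agent avro-a
avro-a.sources = a-source
avro-a.sinks = avro-sinke
avro-a.channels = memory-channel

# Source: run tail -f on the Apache access log and turn each new line into an event
avro-a.sources.a-source.type = exec
avro-a.sources.a-source.command = tail -f /var/log/httpd/access_log

# Sink: forward events over Avro to agent B
# (handoop01 must resolve to the address agent B binds to, 192.168.95.3 here)
avro-a.sinks.avro-sinke.type = avro
avro-a.sinks.avro-sinke.hostname = handoop01
avro-a.sinks.avro-sinke.port = 44444

# Channel: buffer events in memory
avro-a.channels.memory-channel.type = memory
avro-a.channels.memory-channel.capacity = 1000
avro-a.channels.memory-channel.transactionCapacity = 100

# Connect the source and the sink to the channel
avro-a.sources.a-source.channels = memory-channel
avro-a.sinks.avro-sinke.channel = memory-channel
```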
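Going by the Flume user guide, this exec source has two caveats:

- `tail -f` follows the original file descriptor. After the access log is rotated by renaming, it keeps reading the old file and misses new lines. The table above suggests `tail -F`, which follows the file by name.
- The exec source gives no delivery guarantee if the agent dies.

A possible hardening of agent A's source, sketched as lines that would override the corresponding settings above (the 10 s throttle is an arbitrary value):

```properties
# Follow the file by name so that log rotation does not break tailing
avro-a.sources.a-source.command = tail -F /var/log/httpd/access_log
# Restart the command if it exits, waiting 10 000 ms between attempts
avro-a.sources.a-source.restart = true
avro-a.sources.a-source.restartThrottle = 10000
```

For resumable tailing across agent restarts, the TAILDIR source from the table is the more robust option.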
3.2 Setting up agent B
```properties
# Name the components of agent avro-logger
avro-logger.sources = avro-source
avro-logger.sinks = logger-sink
avro-logger.channels = memory-channel

# Source: Avro server that agent A's avro sink connects to
avro-logger.sources.avro-source.type = avro
avro-logger.sources.avro-source.bind = 192.168.95.3
avro-logger.sources.avro-source.port = 44444

# Sink: log every event to the console
avro-logger.sinks.logger-sink.type = logger

# Channel: buffer events in memory
avro-logger.channels.memory-channel.type = memory
avro-logger.channels.memory-channel.capacity = 1000
avro-logger.channels.memory-channel.transactionCapacity = 100

# Connect the source and the sink to the channel
avro-logger.sources.avro-source.channels = memory-channel
avro-logger.sinks.logger-sink.channel = memory-channel
```
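To print more of each event body while testing, agent B's logger sink can be given a larger `maxBytesToLog`; the documented default is 16 bytes. A sketch of that optional addition, where 256 is an arbitrary choice:

```properties
# Optional: log up to 256 bytes of each event body instead of the default 16
avro-logger.sinks.logger-sink.maxBytesToLog = 256
```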
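3.3 Once both agents are configured, note that flumeB receives its data from flumeA: A's avro sink connects to B's avro source. The agents must therefore be started in order, flumeB first and then flumeA.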
```shell
flume-ng agent --name avro-logger --conf conf --conf-file conf/flume-B-avro.properties
```
```text
[root@handoop01 apache-flume-1.8.0]# flume-ng agent --name avro-logger --conf conf --conf-file conf/flume-B-avro.properties
Info: Including Hive libraries found via () for Hive access
- exec /usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.222.b10-0.el7_6.x86_64/bin/java -Xmx20m -cp '/opt/flume/apache-flume-1.8.0/conf:/opt/flume/apache-flume-1.8.0/lib/:/lib/' -Djava.library.path= org.apache.flume.node.Application --name avro-logger --conf-file conf/flume-B-avro.properties
2019-10-08 11:48:34,722 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2019-10-08 11:48:34,731 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:134)] Reloading configuration file:conf/flume-B-avro.properties
2019-10-08 11:48:34,747 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:logger-sink
2019-10-08 11:48:34,748 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: logger-sink Agent: avro-logger
2019-10-08 11:48:34,748 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:logger-sink
2019-10-08 11:48:34,810 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [avro-logger]
2019-10-08 11:48:34,810 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:147)] Creating channels
2019-10-08 11:48:34,817 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel memory-channel type memory
2019-10-08 11:48:34,864 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:201)] Created channel memory-channel
2019-10-08 11:48:34,865 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source avro-source, type avro
2019-10-08 11:48:34,905 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: logger-sink, type: logger
2019-10-08 11:48:34,907 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:116)] Channel memory-channel connected to [avro-source, logger-sink]
2019-10-08 11:48:34,946 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:137)] Starting new configuration:{ sourceRunners:{avro-source=EventDrivenSourceRunner: { source:Avro source avro-source: { bindAddress: 192.168.95.3, port: 44444 } }} sinkRunners:{logger-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@29d0ad79 counterGroup:{ name:null counters:{} } }} channels:{memory-channel=org.apache.flume.channel.MemoryChannel{name: memory-channel}} }
2019-10-08 11:48:34,952 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:144)] Starting Channel memory-channel
2019-10-08 11:48:34,964 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:159)] Waiting for channel: memory-channel to start. Sleeping for 500 ms
2019-10-08 11:48:35,300 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: memory-channel: Successfully registered new MBean.
2019-10-08 11:48:35,309 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: memory-channel started
2019-10-08 11:48:35,465 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:171)] Starting Sink logger-sink
2019-10-08 11:48:35,468 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:182)] Starting Source avro-source
2019-10-08 11:48:35,483 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:234)] Starting Avro source avro-source: { bindAddress: 192.168.95.3, port: 44444 }...
2019-10-08 11:48:36,330 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SOURCE, name: avro-source: Successfully registered new MBean.
2019-10-08 11:48:36,330 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: avro-source started
2019-10-08 11:48:36,338 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:260)] Avro source avro-source started.
2019-10-08 11:52:21,347 (New I/O server boss #3) INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171) OPEN
2019-10-08 11:52:21,348 (New I/O worker #1) INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171) BOUND: /192.168.95.3:44444
2019-10-08 11:52:21,349 (New I/O worker #1) INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171) CONNECTED: /192.168.95.3:60824
2019-10-08 11:52:27,424 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 73 6C 65 65 70 79 68 65 61 64 sleepyhead }
2019-10-08 11:53:29,440 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 73 6F 75 72 63 65 2C 73 69 6E 6B 2C 63 68 61 6E source,sink,chan }
```
```shell
flume-ng agent --name avro-a --conf conf --conf-file conf/flume-A-avro.properties
```
```text
[root@handoop01 apache-flume-1.8.0]# flume-ng agent --name avro-a --conf conf --conf-file conf/flume-A-avro.properties
Info: Including Hive libraries found via () for Hive access
- exec /usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.222.b10-0.el7_6.x86_64/bin/java -Xmx20m -cp '/opt/flume/apache-flume-1.8.0/conf:/opt/flume/apache-flume-1.8.0/lib/:/lib/' -Djava.library.path= org.apache.flume.node.Application --name avro-a --conf-file conf/flume-A-avro.properties
2019-10-08 11:52:20,312 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2019-10-08 11:52:20,320 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:134)] Reloading configuration file:conf/flume-A-avro.properties
2019-10-08 11:52:20,350 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:avro-sinke
2019-10-08 11:52:20,350 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:avro-sinke
2019-10-08 11:52:20,351 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: avro-sinke Agent: avro-a
2019-10-08 11:52:20,351 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:avro-sinke
2019-10-08 11:52:20,352 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:avro-sinke
2019-10-08 11:52:20,385 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [avro-a]
2019-10-08 11:52:20,385 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:147)] Creating channels
2019-10-08 11:52:20,394 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel memory-channel type memory
2019-10-08 11:52:20,399 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:201)] Created channel memory-channel
2019-10-08 11:52:20,414 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source a-source, type exec
2019-10-08 11:52:20,421 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: avro-sinke, type: avro
2019-10-08 11:52:20,461 (conf-file-poller-0) [INFO - org.apache.flume.sink.AbstractRpcSink.configure(AbstractRpcSink.java:183)] Connection reset is set to 0. Will not reset connection to next hop
2019-10-08 11:52:20,462 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:116)] Channel memory-channel connected to [a-source, avro-sinke]
2019-10-08 11:52:20,496 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:137)] Starting new configuration:{ sourceRunners:{a-source=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:a-source,state:IDLE} }} sinkRunners:{avro-sinke=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1362592d counterGroup:{ name:null counters:{} } }} channels:{memory-channel=org.apache.flume.channel.MemoryChannel{name: memory-channel}} }
2019-10-08 11:52:20,499 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:144)] Starting Channel memory-channel
2019-10-08 11:52:20,500 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:159)] Waiting for channel: memory-channel to start. Sleeping for 500 ms
2019-10-08 11:52:20,629 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: memory-channel: Successfully registered new MBean.
2019-10-08 11:52:20,629 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: memory-channel started
2019-10-08 11:52:21,002 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:171)] Starting Sink avro-sinke
2019-10-08 11:52:21,005 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:182)] Starting Source a-source
2019-10-08 11:52:21,008 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:168)] Exec source starting with command: tail -f /var/log/httpd/access_log
2019-10-08 11:52:21,009 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SOURCE, name: a-source: Successfully registered new MBean.
2019-10-08 11:52:21,009 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: a-source started
2019-10-08 11:52:21,025 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:287)] Starting RpcSink avro-sinke { host: handoop01, port: 44444 }...
2019-10-08 11:52:21,026 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SINK, name: avro-sinke: Successfully registered new MBean.
2019-10-08 11:52:21,027 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: avro-sinke started
2019-10-08 11:52:21,027 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:205)] Rpc sink avro-sinke: Building RpcClient with hostname: handoop01, port: 44444
2019-10-08 11:52:21,028 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AvroSink.initializeRpcClient(AvroSink.java:126)] Attempting to create Avro Rpc client.
2019-10-08 11:52:21,202 (lifecycleSupervisor-1-1) [WARN - org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:634)] Using default maxIOWorkers
2019-10-08 11:52:21,721 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:301)] Rpc sink avro-sinke started.
```
3.4 Testing: append content to /var/log/httpd/access_log
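Output printed by flumeB: each appended line arrives as a `LoggerSink` `Event:` entry in agent B's console. These are the `sleepyhead` and `source,sink,chan` events at the end of agent B's log above. The second body appears cut off at exactly 16 bytes, which matches the logger sink's default `maxBytesToLog` limit.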