• Flume+Kafka+SparkStreaming+HBase+Visualization (Part 1)


    I. Prerequisites:
    Basic Linux commands
    Either Scala or Python
    Fundamentals of Hadoop, Spark, Flume, Kafka, and HBase
     
    II. Flume: A Distributed Log Collection Framework
    Business context: servers and web services produce large volumes of logs. How do we make use of them, and how do we move them into the cluster?
    1. Shell scripts that batch the files and upload them to HDFS: poor timeliness, low fault tolerance, heavy network/disk IO, and hard to monitor.
    2. Flume:
    With Flume, the key is writing the configuration file:
    1) configure the agent
    2) configure the Source
    3) configure the Channel
    4) configure the Sink
    1-netcat-mem-logger.conf: listens for data on a port

    #example for source=netcat, channel=memory, sink=logger
    # Name the components on this agent
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    
    # configure for sources
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    
    # configure for channels
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # configure for sinks
    a1.sinks.k1.type = logger
    
    # configure 
    a1.sinks.k1.channel = c1
    a1.sources.r1.channels = c1
    Start the agent:
    flume-ng agent \
    -n a1 \
    -c conf -f ./1-netcat-mem-logger.conf \
    -Dflume.root.logger=INFO,console
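
    To verify the agent, you can push a few test lines into the netcat source from another terminal (a minimal check, assuming nc/netcat is installed); each line should appear as an event in the agent's console logger:

    # connect to the netcat source and type a test event (hypothetical payload)
    nc localhost 44444
    hello flume
    # the agent console then prints an Event whose body is "hello flume"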
     
    4-exec-mem-logger.conf: monitors a file
    # Name the components on this agent
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    
    # configure for sources
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /opt/datas/flume_data/exec_tail.log
    
    # configure for channels
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # configure for sinks
    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1
    a1.sources.r1.channels = c1
    
    flume-ng agent \
    -n a1 \
    -c conf -f ./4-exec-mem-logger.conf \
    -Dflume.root.logger=INFO,console
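
    To exercise this agent, append lines to the tailed file (a minimal check, assuming /opt/datas/flume_data/exec_tail.log exists); each appended line should show up in the console logger:

    # append a hypothetical test line to the monitored file
    echo "hello exec source $(date)" >> /opt/datas/flume_data/exec_tail.log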

    Log collection workflow (sketched below):
    1. Log server: start an agent with exec-source, memory-channel, and avro-sink (pointing at the data server) to ship the collected log data to the data server.
    2. Data server: start an agent with avro-source, memory-channel, and logger-sink/kafka-sink.
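
    A rough sketch of the two-agent topology described above (the port number comes from the configs that follow):

    log server:   exec-source -> memory-channel -> avro-sink
                                                       |
                                             avro RPC on port 44444
                                                       |
    data server:  avro-source -> memory-channel -> logger-sink (or kafka-sink)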

    conf1:exec-mem-avro.conf

    # Name the components on this agent
    a1.sources = exec-source
    a1.channels = memory-channel
    a1.sinks = avro-sink
    
    # configure for sources
    a1.sources.exec-source.type = exec
    a1.sources.exec-source.command = tail -F /opt/datas/log-collect-system/log_server.log
    
    # configure for channels
    a1.channels.memory-channel.type = memory
    a1.channels.memory-channel.capacity = 1000
    a1.channels.memory-channel.transactionCapacity = 100
    
    # configure for sinks
    a1.sinks.avro-sink.type = avro
    a1.sinks.avro-sink.hostname = localhost
    a1.sinks.avro-sink.port = 44444
    
    # configure 
    a1.sinks.avro-sink.channel = memory-channel
    a1.sources.exec-source.channels = memory-channel
    
    conf2:avro-mem-logger.conf

    # Name the components on this agent
    a1.sources = avro-source
    a1.channels = memory-channel
    a1.sinks = logger-sink
    
    # configure for sources
    a1.sources.avro-source.type = avro
    a1.sources.avro-source.bind = localhost
    a1.sources.avro-source.port = 44444
    
    # configure for channels
    a1.channels.memory-channel.type = memory
    a1.channels.memory-channel.capacity = 1000
    a1.channels.memory-channel.transactionCapacity = 100
    
    # configure for sinks
    a1.sinks.logger-sink.type = logger
    
    # configure 
    a1.sinks.logger-sink.channel = memory-channel
    a1.sources.avro-source.channels = memory-channel
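
    If the received events should go to Kafka rather than the console, the logger sink in conf2 can be swapped for Flume's Kafka sink. A minimal sketch, assuming Flume 1.7+ and a broker at localhost:9092; the topic name streaming_logs is hypothetical:

    # replace the logger sink (also change a1.sinks = logger-sink to a1.sinks = kafka-sink)
    a1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.kafka-sink.kafka.bootstrap.servers = localhost:9092
    a1.sinks.kafka-sink.kafka.topic = streaming_logs
    a1.sinks.kafka-sink.channel = memory-channel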
    (Very important!) Startup order: start avro-mem-logger.conf first, then exec-mem-avro.conf. The downstream agent must be running so its avro source is listening before the upstream avro sink tries to connect.
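
    Following the same flume-ng pattern as above, the two agents would be started in separate terminals (or on the two servers), downstream first; the config paths are assumed to be relative to the current directory:

    # 1. data server: the avro source must be listening first
    flume-ng agent \
    -n a1 \
    -c conf -f ./avro-mem-logger.conf \
    -Dflume.root.logger=INFO,console

    # 2. log server: its avro sink connects to port 44444
    flume-ng agent \
    -n a1 \
    -c conf -f ./exec-mem-avro.conf \
    -Dflume.root.logger=INFO,console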

  • Original post: https://www.cnblogs.com/mlxx9527/p/9367495.html