• 07_Flume_regex interceptor in practice


     Practice 1: regex filter interceptor

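    1. Target scenario

    What the regex filter interceptor does:

    1) Matches the content of the event body against the regular expression given in the configuration
    2) If the content matches, the event is dropped (this is the behavior with excludeEvents = true, as set in the configuration in this section; with the default excludeEvents = false, matching events are kept and non-matching events are dropped)
    3) If the content does not match, the event is passed through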
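
    The decision described above can be sketched in a few lines of Python. This is only an illustration, not Flume's code: passes_filter is a hypothetical helper, Python's re.search stands in for the Java regex search that Flume performs, and the bodies "abc" and "12ab" are made-up test values.

```python
import re


def passes_filter(body: str, regex: str, exclude_events: bool = False) -> bool:
    """Return True if the event would be passed on, False if it would be dropped.

    Hypothetical helper mirroring the documented regex_filter behavior.
    """
    matched = re.search(regex, body) is not None
    # excludeEvents = true drops matching events;
    # excludeEvents = false (the default) drops non-matching events.
    return not matched if exclude_events else matched


pattern = r"^[0-9]*$"  # same regex as in the agent configuration
for body in ["1234", "abc", "12ab"]:
    print(body, passes_filter(body, pattern, exclude_events=True))
# prints: 1234 False / abc True / 12ab True (one pair per line)
```

    Note that ^[0-9]*$ also matches an empty body, so with excludeEvents = true an empty event would be dropped as well.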

    2. Flume Agent configuration file

    # 01 define agent name, source/sink/channel 
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # 02 source,http,jsonhandler
    a1.sources.r1.type = http
    a1.sources.r1.bind = master
    a1.sources.r1.port = 6666
    a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler
    
    # 03 regex filter interceptor, match event body for filter
    a1.sources.r1.interceptors = i1  
    a1.sources.r1.interceptors.i1.type = regex_filter  
    a1.sources.r1.interceptors.i1.regex = ^[0-9]*$ 
    # excludeEvents = true: drop the events whose body matches the regex
    a1.sources.r1.interceptors.i1.excludeEvents = true  
    
    # 04 logger sink
    a1.sinks.k1.type = logger
    
    # 05 channel,memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # 06 bind source,sink to channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

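    3. Verifying the regex filter interceptor

    1) Use curl -X POST -d 'JSON data' to send HTTP requests with different bodies, one of which matches the regex

    2) Watch the events printed to the terminal: the event whose body is 1234 has been filtered out and does not appear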
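
    As an alternative to curl, the test requests can be sketched in Python. The JSONHandler expects a JSON array of events, each with a headers map and a body string. The URL is assumed from the configuration above (host master, port 6666); build_payload and post_events are hypothetical helper names, and the bodies other than 1234 are made-up examples.

```python
import json
import urllib.request

# Assumption: the HTTP source from the agent configuration (bind = master, port = 6666).
FLUME_URL = "http://master:6666"


def build_payload(bodies):
    """Build the JSON array of events that the JSONHandler accepts."""
    return json.dumps([{"headers": {}, "body": body} for body in bodies])


def post_events(bodies, url=FLUME_URL):
    """POST the events to the HTTP source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(bodies).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Only "1234" matches ^[0-9]*$, so it is the one event the logger should not show.
print(build_payload(["1234", "abc", "12ab"]))
```

    With the agent running, calling post_events(["1234", "abc", "12ab"]) sends all three events in one request; the logger sink should then print only the abc and 12ab events.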

     4. Official documentation for the regex filter interceptor

    Practice 2: regex extractor interceptor

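    1. Target scenario

    What the regex extractor interceptor does:
    1) Matches the content of the event body against the regular expression given in the configuration
    2) If the content matches, each captured group is paired with the corresponding key given in the configuration file and added to the event headers as key:value
    3) The content of the event body is left unchanged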
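
    The extraction step can be sketched in Python as follows. This is an illustration under assumptions, not Flume's implementation: extract_headers is a hypothetical helper, Python's re module stands in for Java's regex engine, and the body "shayzhang 1234" is the one used in the verification step of this section.

```python
import re


def extract_headers(body, regex, names, headers=None):
    """Return a copy of headers with one entry added per captured group.

    Hypothetical helper: the i-th capture group is stored under the i-th name,
    which is how the serializers s1, s2, ... map to groups 1, 2, ...
    """
    result = dict(headers or {})
    match = re.search(regex, body)
    if match:
        # zip stops at the shorter sequence, so extra groups or extra names are ignored
        for name, value in zip(names, match.groups()):
            if value is not None:
                result[name] = value
    return result


regex = r"(^[a-zA-Z]*)\s([0-9]*$)"  # two groups: letters, then digits
print(extract_headers("shayzhang 1234", regex, ["word", "digital"]))
# prints: {'word': 'shayzhang', 'digital': '1234'}
```

    The body string itself is never modified; only the headers dictionary gains entries, and a body that does not match leaves the headers as they were.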

    2. Flume Agent configuration file

    # 01 define agent name, source/sink/channel 
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # 02 source,http,jsonhandler
    a1.sources.r1.type = http
    a1.sources.r1.bind = master
    a1.sources.r1.port = 6666
    a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler
    
    # 03 regex extractor interceptor, match event body to extract the letters and the digits
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = regex_extractor
    # The regex has two capture groups, so a match yields two parts.
    # Note: in a properties file the backslash of \s must be escaped as \\s,
    # and a comment must sit on its own line (a trailing # would become part of the value).
    a1.sources.r1.interceptors.i1.regex = (^[a-zA-Z]*)\\s([0-9]*$)
    # specify one serializer for each of the 2 matched parts
    a1.sources.r1.interceptors.i1.serializers = s1 s2
    # header key names
    a1.sources.r1.interceptors.i1.serializers.s1.name = word
    a1.sources.r1.interceptors.i1.serializers.s2.name = digital
    
    # 04 logger sink
    a1.sinks.k1.type = logger
    
    # 05 channel,memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # 06 bind source,sink to channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

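    3. Verifying the regex extractor interceptor

    1) Use curl -X POST -d 'JSON data' to send an HTTP request whose body is "shayzhang 1234"; shayzhang and 1234 will be matched by the regular expression

    2) Watch the event that the logger prints to the terminal: two entries are added to the headers, word:shayzhang and digital:1234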

  • Original article: https://www.cnblogs.com/shay-zhangjin/p/7966452.html