• ELK: Using Logstash Filter Plugins


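    Kibana ships with a built-in grok debugging tool (Grok Debugger).

    The approach to parsing logs: first work out what format the log lines are in, then decide which filter plugin, or which combination of plugins, that format calls for.

    Official documentation: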

    https://www.elastic.co/guide/en/logstash/7.12/filter-plugins.html
    

    Online grok pattern tester:

    https://grokdebug.herokuapp.com/
    

    Path of the grok patterns file shipped with Logstash:

    [root@es-web1 ~]# ll /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns
    
    -rw-r--r-- 1 root root 5514 Apr 21 03:50 /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns
    
    

    Example: a non-JSON log line

    192.168.7.10 - - [24/May/2021:15:50:47 +0800] "GET /shijiange HTTP/1.1" 404 571 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
    

    Extracting the fields with a grok pattern:

    %{IP:clientip} - - \[(?<requesttime>[^ ]+ \+\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"
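To check the pattern without a running Logstash, it can be rendered as an ordinary regular expression. The following is only a sketch in Python: `%{IP:clientip}` is simplified to a dotted-quad IPv4 match (an assumption, since grok's IP also accepts IPv6), and the names `NGINX_ACCESS` and `parse_access_line` are made up for illustration.

```python
import re

# Python rendering of the grok pattern above. %{IP:clientip} is simplified
# to a dotted-quad IPv4 match (assumption: grok's IP also accepts IPv6).
NGINX_ACCESS = re.compile(
    r'(?P<clientip>\d{1,3}(?:\.\d{1,3}){3}) - - '
    r'\[(?P<requesttime>[^ ]+ \+\d+)\] '
    r'"(?P<requesttype>\w+) (?P<requesturl>[^ ]+) HTTP/\d\.\d" '
    r'(?P<status>\d+) (?P<size>\d+) '
    r'"[^"]+" "(?P<ua>[^"]+)"'
)

# The sample access-log line from this post.
SAMPLE = ('192.168.7.10 - - [24/May/2021:15:50:47 +0800] '
          '"GET /shijiange HTTP/1.1" 404 571 "-" '
          '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"')


def parse_access_line(line):
    """Return the captured fields as a dict, or None when the line does not match."""
    m = NGINX_ACCESS.match(line)
    return m.groupdict() if m else None


fields = parse_access_line(SAMPLE)
print(fields)
```

Every captured value is a string, just as grok produces strings unless a type conversion is requested.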
    


    Common grok patterns, with descriptions and examples

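    Most Linux users have used regular expressions to search for files or for content inside files. Grok also uses regular expressions to pick out the relevant pieces of a log line.
      There are two ways to write the match:
    
      Write the raw regular expression directly
      Use a grok pattern, which maps a name to a regular expression
      In my view, rewriting a regular expression every time is painful, so why not define a pattern once and reuse it?
      Tip: a grok pattern works much like a macro definition in C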
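The macro comparison can be made concrete with a toy expander. This is a minimal sketch of the idea in Python, not how Logstash implements it: the `PATTERNS` table is a hand-picked subset, and `expand` is a hypothetical helper that substitutes each `%{NAME}` or `%{NAME:field}` reference with its regular expression.

```python
import re

# A tiny pattern table for illustration; real grok loads its definitions
# from the patterns file.
PATTERNS = {
    "USERNAME": r"[a-zA-Z0-9_-]+",
    "USER": r"%{USERNAME}",          # a pattern may reference another pattern
    "INT": r"(?:[+-]?(?:[0-9]+))",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
}

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expr):
    """Replace each %{NAME} or %{NAME:field} with its regex, recursively."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        body = expand(PATTERNS[name])
        # A field name becomes a named group; otherwise the body is only grouped.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(substitute, expr)


regex = expand(r"%{WORD:verb} %{NOTSPACE:path} %{INT:status}")
match = re.match(regex, "GET /index.html 200")
print(match.groupdict())
```

Like a C macro, the name is pure text substitution: `%{USER}` expands to whatever `USERNAME` is defined as at that moment.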
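      To learn the default grok patterns, find the file that defines them. Its path is:
    # Windows: [your Logstash install path]\vendor\bundle\jruby\x.x\gems\logstash-patterns-core-x.x.x\patterns\grok-patterns
      The commonly used patterns are described below: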
    

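    Common patterns

      USERNAME or USER
      User name: a string of digits, upper- and lower-case letters, and the special characters (._-)
      e.g. 1234, Bob, Alex.Wong
    
      EMAILLOCALPART
      The local part of an email address (before the @): the first character is a letter, and the rest are digits, letters, or the special characters (_.+-=:). Note that all-digit QQ mailbox accounts will not match unless you modify the regex
      e.g. stone, Gary_Lu, abc-123
    
      EMAILADDRESS
      Email address
      e.g. stone@abc.com, Gary_Lu@gmail.com, abc-123@163.com
    
      HTTPDUSER
      Apache server user; either an EMAILADDRESS or a USERNAME
    
      INT
      Integer: zero, positive, or negative
      e.g. 0, -123, 43987
    
      BASE10NUM or NUMBER
      Decimal number, integer or fractional
      e.g. 0, 18, 5.23
    
      BASE16NUM
      Hexadecimal integer
      e.g. 0x0045fa2d, -0x3F8709
    
      BASE16FLOAT
      Hexadecimal number, integer or fractional
    
      WORD
      A word made of digits and upper- and lower-case letters (the underscore also matches, since the definition is \w+)
      e.g. String, 3529345, ILoveYou
    
      NOTSPACE
      A string containing no whitespace
    
      SPACE
      Whitespace (zero or more whitespace characters)
    
      QUOTEDSTRING or QS
      A quoted string
      e.g. "This is an apple", 'What is your name?'
    
      UUID
      Standard UUID
      e.g. 550E8400-E29B-11D4-A716-446655440000
    
      MAC
      MAC address, in Cisco notation or in the common or Windows notation
    
      IP
      IP address, IPv4 or IPv6
      e.g. 127.0.0.1, FE80:0000:0000:0000:AAAA:0000:00C2:0002
    
      HOSTNAME
      Host name
    
      IPORHOST
      IP address or host name
    
      HOSTPORT
      Host name (or IP) plus port
      e.g. 127.0.0.1:3306, api.stozen.NET:8000
    
      PATH
      File path, in Unix or Windows format
      e.g. /usr/local/nginx/sbin/nginx, c:\windows\system32\clr.exe
    
      URIPROTO
      URI scheme
      e.g. http, ftp
    
      URIHOST
      URI host
      e.g. www.stozen.Net, 10.0.0.1:22
    
      URIPATH
      URI path
      e.g. //www.stozen.net/abc/, /api.PHP
    
      URIPARAM
      The GET parameters (query string) of a URI
      e.g. ?a=1&b=2&c=3
    
      URIPATHPARAM
      URI path plus GET parameters
      e.g. //www.stozen.net/abc/api.php?a=1&b=2&c=3
    
      URI
      A complete URI
      e.g. http://www.stozen.net/abc/api.php?a=1&b=2&c=3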
    
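    Date and time patterns
    
      MONTH
      Month name
      e.g. Jan, January
    
      MONTHNUM
      Month number
      e.g. 03, 9, 12
    
      MONTHDAY
      Day of the month
      e.g. 03, 9, 31
    
      DAY
      Day of the week, by name
      e.g. Mon, Monday
    
      YEAR
      Year number
    
      HOUR
      Hour number
    
      MINUTE
      Minute number
    
      SECOND
      Second number
    
      TIME
      Time of day
      e.g. 00:01:23
    
      DATE_US
      US date format (month first)
      e.g. 10-15-1982, 10/15/1982
    
      DATE_EU
      European date format (day first)
      e.g. 15-10-1982, 15/10/1982, 15.10.1982
    
      ISO8601_TIMEZONE
      ISO 8601 time zone offset
      e.g. +10:23, -1023
    
      TIMESTAMP_ISO8601
      ISO 8601 timestamp
      e.g. 2016-07-03T00:34:06+08:00
    
      DATE
      A date, either US (%{DATE_US}) or European (%{DATE_EU})
    
      DATESTAMP
      Full date plus time
      e.g. 07-03-2016 00:34:06
    
      HTTPDATE
      Default date format in HTTP server access logs
      e.g. 03/Jul/2016:00:36:53 +0800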
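A date pattern such as HTTPDATE only captures the text; turning that text into a real timestamp is a separate step (in Logstash that is the job of the date filter). As a rough illustration of that second step, here is a Python sketch that parses the HTTPDATE example above. The function name `parse_httpdate` is made up, and the code assumes English month abbreviations, which holds under the default C locale.

```python
from datetime import datetime, timezone


def parse_httpdate(text):
    """Parse an HTTPDATE-style string such as '03/Jul/2016:00:36:53 +0800'."""
    # %b expects English month abbreviations (true under the default C locale).
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")


local = parse_httpdate("03/Jul/2016:00:36:53 +0800")
print(local.isoformat())                           # 2016-07-03T00:36:53+08:00
print(local.astimezone(timezone.utc).isoformat())  # 2016-07-02T16:36:53+00:00
```

The second line of output shows why the offset matters: the same instant falls on the previous calendar day in UTC.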
    
    Log patterns
    
      LOGLEVEL
      Log level
      e.g. Alert, alert, ALERT, Error
    
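    Creating your own grok patterns
      As a business grows, more and more log formats turn up, and the default grok patterns cannot cover every need (national ID numbers or mobile phone numbers, for example), so we have to add patterns of our own.
    
      Pattern        Regular expression                         Description
      DATE_CHS       %{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}    Date format commonly used in China (year first)
      ZIPCODE_CHS    [1-9]\d{5}                                 Chinese postal code
      GAME_ACCOUNT   [a-zA-Z][a-zA-Z0-9_]{4,15}                 Game account: a letter followed by 4 to 15 letters, digits, or underscores
    
      There are many more possibilities; adapt them to your own business needs.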
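Before adding custom patterns to Logstash (the grok filter accepts them through its `patterns_dir` or `pattern_definitions` options), it helps to try the regular expressions on sample values. The sketch below does that in Python under two assumptions: YEAR, MONTHNUM and MONTHDAY are inlined from the stock definitions listed in this post, and the helper `matches` uses a whole-string match, which is stricter than grok's default search behaviour.

```python
import re

# Python renderings of the three custom patterns above, with YEAR, MONTHNUM
# and MONTHDAY inlined from the stock definitions.
DATE_CHS = re.compile(
    r"[0-9]+[./-](?:0?[1-9]|1[0-2])[./-]"
    r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])"
)
ZIPCODE_CHS = re.compile(r"[1-9]\d{5}")
GAME_ACCOUNT = re.compile(r"[a-zA-Z][a-zA-Z0-9_]{4,15}")


def matches(pattern, text):
    """True when the whole string matches (stricter than grok's search)."""
    return pattern.fullmatch(text) is not None


for pattern, value in [
    (DATE_CHS, "2021-05-24"),
    (DATE_CHS, "2021.13.01"),      # month 13 is rejected
    (ZIPCODE_CHS, "100080"),
    (ZIPCODE_CHS, "012345"),       # leading zero is rejected
    (GAME_ACCOUNT, "player_01"),
    (GAME_ACCOUNT, "1player"),     # must start with a letter
]:
    print(value, matches(pattern, value))
```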
    

    Built-in grok pattern definitions

    USERNAME [a-zA-Z0-9_-]+
    USER %{USERNAME}
    INT (?:[+-]?(?:[0-9]+))
    BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
    NUMBER (?:%{BASE10NUM})
    BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
    BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
    
    POSINT \b(?:[1-9][0-9]*)\b
    NONNEGINT \b(?:[0-9]+)\b
    WORD \b\w+\b
    NOTSPACE \S+
    SPACE \s*
    DATA .*?
    GREEDYDATA .*
    #QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"])*"|(?:'(?:\\.|[^\\'])*')|(?:`(?:\\.|[^\\`])*`)))
    QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"]+)*"|(?:'(?:\\.|[^\\']+)*')|(?:`(?:\\.|[^\\`]+)*`)))
    UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
    
    # Networking
    MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
    CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
    WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
    COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
    IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
    HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
    HOST %{HOSTNAME}
    IPORHOST (?:%{HOSTNAME}|%{IP})
    HOSTPORT (?:%{IPORHOST=~/\./}:%{POSINT})
    
    # paths
    PATH (?:%{UNIXPATH}|%{WINPATH})
    UNIXPATH (?:/(?:[\w_%!$@:.,-]+|\\.)*)+
    LINUXTTY (?:/dev/pts/%{NONNEGINT})
    BSDTTY (?:/dev/tty[pq][a-z0-9])
    TTY (?:%{BSDTTY}|%{LINUXTTY})
    WINPATH (?:[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
    URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
    URIHOST %{IPORHOST}(?::%{POSINT:port})?
    # uripath comes loosely from RFC1738, but mostly from what Firefox
    # doesn't turn into %XX
    URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=#%_-]*)+
    #URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
    URIPARAM \?[A-Za-z0-9$.+!*'|(){},~#%&/=:;_-]*
    URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
    URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
    
    # Months: January, Feb, 3, 03, 12, December
    MONTH (?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
    MONTHNUM (?:0?[1-9]|1[0-2])
    MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
    
    # Days: Monday, Tue, Thu, etc...
    DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
    
    # Years?
    YEAR [0-9]+
    # Time: HH:MM:SS
    #TIME \d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?
    # I'm still on the fence about using grok to perform the time match,
    # since it's probably slower.
    # TIME %{POSINT<24}:%{POSINT<60}(?::%{POSINT<60}(?:\.%{POSINT})?)?
    HOUR (?:2[0123]|[01][0-9])
    MINUTE (?:[0-5][0-9])
    # '60' is a leap second in most time standards and thus is valid.
    SECOND (?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)
    TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
    # datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
    DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
    DATE_EU %{YEAR}[/-]%{MONTHNUM}[/-]%{MONTHDAY}
    ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
    ISO8601_SECOND (?:%{SECOND}|60)
    TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
    DATE %{DATE_US}|%{DATE_EU}
    DATESTAMP %{DATE}[- ]%{TIME}
    TZ (?:[PMCE][SD]T)
    DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
    DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
    
    # Syslog Dates: Month Day HH:MM:SS
    SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
    PROG (?:[\w._/%-]+)
    SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
    SYSLOGHOST %{IPORHOST}
    SYSLOGFACILITY <%{POSINT:facility}\.%{POSINT:priority}>
    HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT:ZONE}
    
    # Shortcuts
    QS %{QUOTEDSTRING}
    
    # Log formats
    SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
    COMBINEDAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" %{QS:agent}
    
    # Log Levels
    LOGLEVEL ([D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL)
    

    Example: a JSON-format log line

    {"@timestamp":"2021-08-28T21:17:31+08:00","host":"172.31.2.107","clientip":"172.31.0.1","size":0,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"172.31.2.107","url":"/web/index.html","domain":"172.31.2.107","xff":"-","referer":"-","status":"304"}
    

    Processing it with the json filter:

    input {
      redis {
        data_type => "list"
        key => "qq-m44-nginx-log"
        host => "172.31.2.106"
        port => "6379"
        db => "3"
        password => "123456"
        codec => json
      }
    }
    
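    # Filters
    filter {
      # Decode the JSON text held in "message" into top-level fields,
      # then drop fields that are no longer needed.
      json {
        source => "message"
        remove_field => ["message","@version","path","beat","input","log","offset","prospector","source","tags"]
      }
      # Takes effect only when the event has a "timestamp" field in HTTPDATE form;
      # the sample JSON log above carries "@timestamp" instead, so it is left unchanged.
      date {
        match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
        target => "@timestamp"
      }
    }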
    
    output {
      if [fields][app] == "nginx-errorlog" {
        elasticsearch {
          hosts => ["172.31.2.101:9200"]
          index => "qq-123test-filebeat-nginx-errorlog-%{+YYYY.MM.dd}"
        }
      }
    
      if [fields][app] == "nginx-accesslog" {
        elasticsearch {
          hosts => ["172.31.2.101:9200"]
          index => "qq-123test-filebeat-nginx-accesslog-%{+YYYY.MM.dd}"
        }
      }
    }
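To make the data flow of this pipeline easier to follow, here is a rough Python imitation of its two key steps: decoding the JSON held in `message`, and choosing an index from `fields.app`. It is only a sketch under stated assumptions. The functions `apply_json_filter` and `choose_index` are hypothetical, the sample event is trimmed, and the date suffix is computed from `@timestamp` in UTC, which is how `%{+YYYY.MM.dd}` behaves.

```python
import json
from datetime import datetime, timezone


def apply_json_filter(event, remove_fields=("message", "@version")):
    """Merge the JSON document held in event['message'] into the event itself."""
    merged = dict(event)
    merged.update(json.loads(event["message"]))
    for name in remove_fields:
        merged.pop(name, None)   # missing fields are ignored
    return merged


def choose_index(event):
    """Pick an index name from fields.app, suffixed with the event's UTC date."""
    app = event.get("fields", {}).get("app")
    if app not in ("nginx-errorlog", "nginx-accesslog"):
        return None              # no output branch matches
    ts = datetime.fromisoformat(event["@timestamp"]).astimezone(timezone.utc)
    return f"qq-123test-filebeat-{app}-{ts:%Y.%m.%d}"


# A trimmed event: the nginx JSON line travels inside "message".
event = {
    "message": '{"@timestamp":"2021-08-28T21:17:31+08:00",'
               '"clientip":"172.31.0.1","status":"304","url":"/web/index.html"}',
    "@version": "1",
    "fields": {"app": "nginx-accesslog"},
}

parsed = apply_json_filter(event)
print(parsed)
print(choose_index(parsed))
```

An event whose `fields.app` matches neither branch is simply not indexed, which is also what the two `if` blocks in the config imply.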
    

    Terminal output after requesting a page from nginx:

    {
               "agent" => {
                    "name" => "es-web1.example.local",
                    "type" => "filebeat",
            "ephemeral_id" => "2a8806fd-48de-46e0-bdde-502aa74b4c83",
                 "version" => "7.12.1",
                "hostname" => "es-web1.example.local",
                      "id" => "51f9df27-4170-4844-ba12-c719de1f4410"
        },
              "domain" => "172.31.2.107",
              "status" => "304",
        "upstreamtime" => "-",
                "size" => 0,
                 "xff" => "-",
                 "ecs" => {
            "version" => "1.8.0"
        },
          "@timestamp" => 2021-08-29T05:31:29.000Z,
            "clientip" => "172.31.0.1",
             "referer" => "-",
        "responsetime" => 0.0,
        "upstreamhost" => "-",
           "http_host" => "172.31.2.107",
                 "url" => "/web/index.html",
                "host" => "172.31.2.107",
              "fields" => {
            "group" => "n125",
              "app" => "nginx-accesslog"
        }
    }
    

  • Original post: https://www.cnblogs.com/xuanlv-0413/p/15374789.html