The following are examples.

Raw data:

{"countnum":2,"checktime":"2017-05-23 16:59:32"}

{"countnum":2,"checktime":"2017-05-23 16:59:32"}  

1. No field type conversion involved: the following logstash filter configuration is enough

if [type] == "onlinecount" {
  json {
    source => "message"
  }
}

2. Field type conversion involved

logstash filter  

if [type] == "onlinecount" {
  # Parse the whole message as JSON first. Splitting it on "," would leave
  # fragments such as {"countnum":2 that are not valid JSON on their own.
  json {
    source => "message"
    remove_field => ["message"]
  }
  # Then convert the parsed field to the required type.
  mutate {
    convert => { "countnum" => "integer" }
  }
}
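
If checktime also needs to become a real timestamp rather than a string, a date filter can be added after the JSON is parsed. This is a minimal sketch, not part of the original post: the pattern is inferred from the sample value 2017-05-23 16:59:32, and the timezone value is an assumption that should be changed to wherever the logs are produced.

```
filter {
  if [type] == "onlinecount" {
    json {
      source => "message"
    }
    date {
      # Pattern inferred from the sample value "2017-05-23 16:59:32"
      match => [ "checktime", "yyyy-MM-dd HH:mm:ss" ]
      # Assumed timezone of the log source; adjust as needed
      timezone => "Asia/Shanghai"
      # Overwrite checktime instead of the default @timestamp
      target => "checktime"
    }
  }
}
```

Without target, the date filter writes the parsed value to @timestamp and leaves checktime as a string.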


Kafka data: {
{"cluster":"qy_api_v2_pool","body_bytes_sent":"8579","http_versioncode":"Android_32"}
{"cluster":"qy_api_v2_pool","body_bytes_sent":"8579","http_versioncode":"Android_33"}
{"cluster":"qy_api_v2_pool","body_bytes_sent":"8579","http_versioncode":"Android_34"}
....
}
 

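For performance reasons, the Kafka team merges multiple raw log lines into a single message (each line separated by a newline). When I read from Kafka I therefore have to split the message back into individual events before writing to ES, otherwise the data is inaccurate. How should this requirement be handled?

Solved. I took a detour at first: the approach below still left everything in a single event.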
filter {
  mutate {
    split => ["message", "
"]
  }
}


Correct solution
filter {
  split {
    field => "message"
  }
}
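
A note on why the first attempt did not work, with a sketch of a fuller filter: mutate's split turns message into an array inside the same event, whereas the split filter emits one new event per element. Assuming each resulting line is a standalone JSON object like the rows in the Kafka sample, a json filter can follow to extract the fields. Lines that are not valid JSON would be tagged rather than parsed.

```
filter {
  # One event per line of the merged Kafka message
  # (the split filter's default terminator is a newline)
  split {
    field => "message"
  }
  # Parse each line; events that fail to parse are tagged _jsonparsefailure
  json {
    source => "message"
  }
}
```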


 
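One more small question: the default terminator in split is "\n", but why does the split fail when I write it explicitly as below? It works if I leave terminator out.
filter {
  split {
    field => "message"
    terminator => "\n"
  }
}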
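
A likely explanation, offered as a sketch rather than a confirmed diagnosis of this particular setup: by default Logstash does not process escape sequences in quoted config strings, so "\n" is read as a literal backslash followed by the letter n, which never occurs in the message. Two ways around it are shown below; use one or the other, not both. The first assumes config.support_escapes: true has been set in logstash.yml.

```
# Option 1: requires config.support_escapes: true in logstash.yml,
# so that "\n" is read as a newline character
filter {
  split {
    field => "message"
    terminator => "\n"
  }
}

# Option 2: no settings change; the quoted string contains a real line break
filter {
  split {
    field => "message"
    terminator => "
"
  }
}
```

Leaving terminator out, as in the working solution, avoids the issue because the plugin's built-in default is an actual newline.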
 


Given this JSON:

{
"name":"zhangsan",
"friends":
{
"friend1":"lisi",
"friend2":"wangwu",
"msg":["haha","yaya"]
}
}
Parse it into:

{
"name":"zhangsan",
"friend1":"lisi",
"friend2":"wangwu",
"msg":["haha","yaya"]
}
logstash.conf

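input {
  stdin {
    codec => json
  }
}

filter {
  mutate {
    # First create a new field and assign friends to it (rendered as a JSON string)
    add_field => { "@friends" => "%{friends}" }
  }
  json {
    # Then parse that string; its keys are added at the top level of the event
    source => "@friends"
    # Remove fields that are no longer needed (this statement is optional)
    remove_field => [ "@friends", "friends" ]
  }
}

output {
  stdout { }
}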
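
An alternative that skips the intermediate string field is a ruby filter that copies the nested keys up directly. This is a sketch and not the author's method; it assumes friends arrives as an already parsed object, which is the case with the json codec on stdin above.

```
filter {
  ruby {
    code => '
      friends = event.get("friends")
      if friends.is_a?(Hash)
        # Copy every nested key to the top level, then drop the parent field
        friends.each { |key, value| event.set(key, value) }
        event.remove("friends")
      end
    '
  }
}
```

Note that a nested key with the same name as an existing top-level field would overwrite it.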
---------------------
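Author: 姚贤贤
Source: CSDN
Original: https://blog.csdn.net/u011311291/article/details/86743642
Copyright notice: this is an original article by the blogger; please include a link to the original post when reposting.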


Our event-tracking logs are nested JSON, so to run statistics on all the fields we have to expand the nested JSON.

  1. The log format is as follows:
2019-01-22 19:25:58 172.17.12.177  /statistics/EventAgent appkey=yiche&enc=0&ltype=view&yc_log={"uuid":"73B333EB-EC87-4F9F-867B-A9BF38CBEBB2","mac":"02:00:00:00:00:00","uid":-1,"idfa":"2BFD67CF-ED60-4CF6-BA6E-FC0B18FDDDF8","osv":"iOS11.4.1","fac":"apple","mdl":"iPhone SE","req_id":"360C8C43-73AC-4429-9E43-2C08F4C1C425","itime":1548156351820,"os":"2","sn_id":"6B937D83-BFB2-4C22-85A8-5B3E82D9D0F1","dvid":"3676b52dc155e1eec3ca514f38736fd6","aptkn":"4fb9b2bffb808515aa0e9a5f5b17d826769e432f63d5cf87f7fb5ce4d67ef9f1","cha":"App Store","idfv":"B1EAD56F-E456-4FF2-A3C2-9A8FA0693C22","nt":4,"lg_vl":{"pfrom":"shouye","ptitle":"shouye"},"av":"10.3.3"}   218.15.255.124  200
  2. The initial Logstash configuration file was:
input {
  file {
    path => ["/data/test_logstash.log"]
    type => ["nginx_log"]
    start_position => "beginning"
  }
}
filter {
  if [type] =~ "nginx_log" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:create_time} %{IP:server_ip}  %{URIPATH:uri} %{GREEDYDATA:args}   %{IP:client_ip}  %{NUMBER:status}" }
    }
    urldecode {
      field => "args"
    }
    kv {
      source => "args"
      field_split => "&"
      remove_field => [ "args", "@timestamp", "message", "path", "@version", "host" ]
    }
    json {
        source => "yc_log"
        remove_field => [ "yc_log" ]
    }
  }
}
output {
  stdout { codec => rubydebug }
}

Running Logstash with the configuration above produces the following result:

{
      "server_ip" => "172.17.12.177",
            "cha" => "App Store",
            "mdl" => "iPhone SE",
           "type" => "nginx_log",
            "mac" => "02:00:00:00:00:00",
         "ptitle" => "shouye",
         "appkey" => "yiche",
           "idfv" => "B1EAD56F-E456-4FF2-A3C2-9A8FA0693C22",
          "sn_id" => "6B937D83-BFB2-4C22-85A8-5B3E82D9D0F1",
          "aptkn" => "4fb9b2bffb808515aa0e9a5f5b17d826769e432f63d5cf87f7fb5ce4d67ef9f1",
             "av" => "10.3.3",
             "os" => "2",
           "idfa" => "2BFD67CF-ED60-4CF6-BA6E-FC0B18FDDDF8",
            "uid" => -1,
           "uuid" => "73B333EB-EC87-4F9F-867B-A9BF38CBEBB2",
         "req_id" => "360C8C43-73AC-4429-9E43-2C08F4C1C425",
         "status" => "200",
            "uri" => "/statistics/EventAgent",
            "enc" => "0",
          "ltype" => "view",
          "lg_vl" => {
        "ptitle" => "shouye",
         "pfrom" => "shouye"
    },
             "nt" => 4,
          "pfrom" => "shouye",
          "itime" => 1548156351820,
      "client_ip" => "218.15.255.124",
    "create_time" => "2019-01-22 19:25:58",
           "dvid" => "3676b52dc155e1eec3ca514f38736fd6",
            "fac" => "apple",
       "lg_value" => "{"pfrom":"shouye","ptitle":"shouye"}",
            "osv" => "iOS11.4.1"
}

As you can see, the lg_vl field is still nested JSON and has not been expanded. If you simply add the following to the configuration file:

json { source => "lg_vl" }

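it fails with a JsonParseException, presumably because by that point lg_vl is already a parsed object rather than a JSON string.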

  3. The correct approach:
input {
  file {
    path => ["/data/test_logstash.log"]
    type => ["nginx_log"]
    start_position => "beginning"
  }
}
filter {
  if [type] =~ "nginx_log" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:create_time} %{IP:server_ip}  %{URIPATH:uri} %{GREEDYDATA:args}   %{IP:client_ip}  %{NUMBER:status}" }
    }
    urldecode {
      field => "args"
    }
    kv {
      source => "args"
      field_split => "&"
      remove_field => [ "args", "@timestamp", "message", "path", "@version", "host" ]
    }
    json {
        source => "yc_log"
        remove_field => [ "yc_log" ]
    }
    mutate {
      add_field => { "lg_value" => "%{lg_vl}" }
    }
    json {
        source => "lg_value"
        remove_field => [ "lg_vl","lg_value" ]
    }
  }
}

output {
  stdout { codec => rubydebug }
}

After the outer JSON has been parsed, add a field lg_value and assign the content of lg_vl to it; then run the json filter on lg_value alone. The result after parsing is:

{
           "type" => "nginx_log",
             "nt" => 4,
           "dvid" => "3676b52dc155e1eec3ca514f38736fd6",
             "os" => "2",
            "fac" => "apple",
          "ltype" => "view",
      "client_ip" => "218.15.255.124",
          "itime" => 1548156351820,
            "mac" => "02:00:00:00:00:00",
           "idfa" => "2BFD67CF-ED60-4CF6-BA6E-FC0B18FDDDF8",
            "uri" => "/statistics/EventAgent",
          "aptkn" => "4fb9b2bffb808515aa0e9a5f5b17d826769e432f63d5cf87f7fb5ce4d67ef9f1",
          "sn_id" => "6B937D83-BFB2-4C22-85A8-5B3E82D9D0F1",
    "create_time" => "2019-01-22 19:25:58",
            "osv" => "iOS11.4.1",
         "req_id" => "360C8C43-73AC-4429-9E43-2C08F4C1C425",
         "ptitle" => "shouye",
             "av" => "10.3.3",
      "server_ip" => "172.17.12.177",
          "pfrom" => "shouye",
            "enc" => "0",
            "mdl" => "iPhone SE",
            "cha" => "App Store",
           "idfv" => "B1EAD56F-E456-4FF2-A3C2-9A8FA0693C22",
            "uid" => -1,
           "uuid" => "73B333EB-EC87-4F9F-867B-A9BF38CBEBB2",
         "appkey" => "yiche",
         "status" => "200"
}

Perfect, it works!
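
When the nested keys are known in advance, the same flattening can be sketched with mutate alone. This is an alternative under that assumption, not the author's method: after the first json filter lg_vl is already a parsed object, so its keys can be addressed with field references and renamed to the top level.

```
filter {
  mutate {
    # Move the two known nested keys to the top level
    rename => {
      "[lg_vl][pfrom]"  => "pfrom"
      "[lg_vl][ptitle]" => "ptitle"
    }
    # Drop the now empty parent object
    remove_field => [ "lg_vl" ]
  }
}
```

The add_field plus json approach above has the advantage of not needing the key names up front.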

 


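Author: 神秘的寇先森
Link: https://www.jianshu.com/p/de06284e1484
Source: Jianshu (简书)
Copyright on Jianshu belongs to the author; for any form of reposting, contact the author for authorization and credit the source.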

Logstash: replacing strings, parsing JSON data, changing data types, and extracting the log time

 

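In some cases a log file is JSON-like but uses single quotes, in the format shown below. From these log records we need to extract the correct fields and field types.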

{'usdCnyRate': '6.728', 'futureIndex': '463.36', 'timestamp': '1532933162361'}
{'usdCnyRate': '6.728', 'futureIndex': '463.378', 'timestamp': '1532933222335'}
{'usdCnyRate': '6.728', 'futureIndex': '463.38', 'timestamp': '1532933348347'}
{'usdCnyRate': '6.728', 'futureIndex': '463.252', 'timestamp': '1532933366866'}
{'usdCnyRate': '6.728', 'futureIndex': '463.31', 'timestamp': '1532933372350'}
{'usdCnyRate': '6.728', 'futureIndex': '463.046', 'timestamp': '1532933426899'}
{'usdCnyRate': '6.728', 'futureIndex': '462.806', 'timestamp': '1532933432346'}
{'usdCnyRate': '6.728', 'futureIndex': '462.956', 'timestamp': '1532933438353'}
{'usdCnyRate': '6.728', 'futureIndex': '462.954', 'timestamp': '1532933456796'}
{'usdCnyRate': '6.728', 'futureIndex': '462.856', 'timestamp': '1532933492411'}
{'usdCnyRate': '6.728', 'futureIndex': '462.776', 'timestamp': '1532933564378'}
{'usdCnyRate': '6.728', 'futureIndex': '462.628', 'timestamp': '1532933576849'}
{'usdCnyRate': '6.728', 'futureIndex': '462.612', 'timestamp': '1532933588338'}
{'usdCnyRate': '6.728', 'futureIndex': '462.718', 'timestamp': '1532933636808'}

If we feed this straight to the logstash JSON filter plugin as if it were JSON, it reports the following error:

[WARN ] 2018-07-31 10:20:12.708 [Ruby-0-Thread-5@[main]>worker1: :1] json - Error parsing json {:source=>"message", :raw=>"{'usdCnyRate': '6.728', 'futureIndex': '462.134', 'timestamp': '1532933714371'}", :exception=>#<LogStash::Json::ParserError: Unexpected character (''' (code 39)): was expecting double-quote to start field name at [Source: (byte[])"{'usdCnyRate': '6.728', 'futureIndex': '462.134', 'timestamp': '1532933714371'}"; line: 1, column: 3]>}

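I think the simplest approach here is to replace the single quotes with double quotes, using logstash mutate gsub.
Look closely at how the gsub block is written (lines 10-12 of the config below): it performs the string replacement, and the json block (lines 14-15) then parses the JSON. We also need to convert usdCnyRate and futureIndex to float (lines 18-21), and convert timestamp to a date type stored in a newly defined field, logdate (lines 23-25). That last step uses the logstash date filter plugin.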

input{
    file {
        path => "/usr/share/logstash/wb.cond/test.log"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter{
    mutate {
        gsub =>[
            "message", "'", '"'
        ]
    }
    json {
        source => "message"
    }
    mutate {
        convert => {
            "usdCnyRate" => "float"
            "futureIndex" => "float"
        }
    }
    date {
        match => [ "timestamp", "UNIX_MS" ]
        target => "logdate"
    }
}
output{
    stdout{
        codec=>rubydebug
    }
}

With the configuration above, the fields and their types are parsed correctly from the log file:

{
        "message" => "{"usdCnyRate": "6.728", "futureIndex": "463.378", "timestamp": "1532933222335"}",
     "@timestamp" => 2018-07-31T10:48:48.600Z,
           "host" => "logstashvm0",
           "path" => "/usr/share/logstash/wb.cond/test.log",
       "@version" => "1",
        "logdate" => 2018-07-30T06:47:02.335Z,
     "usdCnyRate" => 6.728,
      "timestamp" => "1532933222335",
    "futureIndex" => 463.378
}
{
        "message" => "{"usdCnyRate": "6.728", "futureIndex": "463.252", "timestamp": "1532933366866"}",
     "@timestamp" => 2018-07-31T10:48:48.602Z,
           "host" => "logstashvm0",
           "path" => "/usr/share/logstash/wb.cond/test.log",
       "@version" => "1",
        "logdate" => 2018-07-30T06:49:26.866Z,
     "usdCnyRate" => 6.728,
      "timestamp" => "1532933366866",
    "futureIndex" => 463.252
}
{
        "message" => "{"usdCnyRate": "6.728", "futureIndex": "463.31", "timestamp": "1532933372350"}",
     "@timestamp" => 2018-07-31T10:48:48.602Z,
           "host" => "logstashvm0",
           "path" => "/usr/share/logstash/wb.cond/test.log",
       "@version" => "1",
        "logdate" => 2018-07-30T06:49:32.350Z,
     "usdCnyRate" => 6.728,
      "timestamp" => "1532933372350",
    "futureIndex" => 463.31
}
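
The output above still carries the rewritten message string and the raw timestamp string alongside the parsed fields. If they are not needed downstream, they could be dropped once parsing has succeeded. A small sketch, assuming the json and date filters keep their default failure tags (_jsonparsefailure and _dateparsefailure); it would go at the end of the filter section:

```
filter {
  # Only drop the raw fields when both parsing steps succeeded,
  # so that failed events keep their original content for debugging
  if "_jsonparsefailure" not in [tags] and "_dateparsefailure" not in [tags] {
    mutate {
      remove_field => [ "message", "timestamp" ]
    }
  }
}
```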
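Reposting is welcome; please credit the source. If you have any questions or suggestions, feel free to leave a comment, or email me at wenbya@outlook.com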