Logstash needs a configuration file that manages the input, filter, and output settings. The file is laid out as follows:
# input
input {
  ...
}
# filter
filter {
  ...
}
# output
output {
  ...
}
Let's start with a standard input / standard output example:
root@c201b7b32a32# ./logstash -e 'input { stdin{} } output { stdout{} }'
Sending Logstash's logs to /opt/logstash/logs which is now configured via log4j2.properties
[2018-04-26T06:47:20,724][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/opt/logstash/modules/fb_apache/configuration"}
……
[2018-04-26T06:47:24,124][INFO ][logstash.pipeline ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x5fec99f4 run>"}
The stdin plugin is now waiting for input:
[2018-04-26T06:47:24,253][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}
hello ==> input
2018-04-26T06:47:31.957Z c201b7b32a32 hello ==> output
this is test ==> input
2018-04-26T06:50:29.743Z c201b7b32a32 this is test ==> output
Use rubydebug to show detailed output; a codec is an encoder/decoder that controls how an event is represented:
# ./logstash -e 'input { stdin{} } output { stdout{ codec => rubydebug} }'
test2 ==> input
{
"message" => "test2",
"@timestamp" => 2018-04-26T07:00:00.652Z,
"@version" => "1",
"host" => "c201b7b32a32"
} ==> output using rubydebug
input settings
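input {
  # file is the commonly used file plugin; it has many options, choose them according to your needs
  file {
    path => "/var/log/httpd/access_log" # location of the file to import; * is allowed, e.g. /var/log/nginx/*.log
    exclude => "*.gz" # files to exclude
    start_position => "beginning" # read from the beginning of the file; the default is end
    ignore_older => 0 # skip files that have not been modified within this many seconds; 0 means no limit
    sincedb_path => "/dev/null" # where the last read position is recorded; /dev/null means every run parses the file from its first line
    add_field => { "test" => "test" } # add a field
    type => "apache-log" # type field, which identifies the kind of log being imported
  }
}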
Multiple file blocks can also be used:
input {
  file {
    path => "/var/log/messages"
    type => "syslog"
  }
  file {
    path => "/var/log/apache/access.log"
    type => "apache"
  }
}
path also accepts an array, or a * wildcard:
path => ["/var/log/messages","/var/log/*.log"]
path => ["/data/mysql/mysql.log"]
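To make the path and exclude behavior more concrete, here is a rough Python model of it. This is only a sketch: is_watched is a hypothetical helper, not Logstash's actual implementation. It assumes that a single * stays within one directory level and that exclude is compared with the file name only, not the full path.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_watched(path, patterns, exclude=()):
    """Rough model of path/exclude matching; not Logstash's actual code."""
    candidate = PurePosixPath(path)
    # exclude is compared with the file name only, not the full path
    if any(fnmatchcase(candidate.name, pattern) for pattern in exclude):
        return False
    # a single * wildcard stays within one directory level
    return any(candidate.match(pattern) for pattern in patterns)


patterns = ["/var/log/messages", "/var/log/*.log"]
print(is_watched("/var/log/messages", patterns))
print(is_watched("/var/log/nginx/access.log", patterns))
print(is_watched("/var/log/old.log.gz", ["/var/log/*"], exclude=["*.gz"]))
```

The second call shows why a pattern such as /var/log/*.log does not pick up files in subdirectories; each subdirectory needs its own entry in the array.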
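filter settings
The filter is the second of Logstash's three components. It is the most complex component of the whole tool and also the most useful one.
1. The grok plugin
The grok plugin is very powerful and can match almost any text, but its performance and resource cost are frequently criticized.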
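filter {
  grok {
    # First, all text data lives in the Logstash message field; message is what the filter operates on.
    # Only the match option is covered here: it pulls the timestamp out of message and assigns it to another field, logdate.
    # Second, grok is a very resource-hungry plugin.
    # Third, grok ships with a very large number of predefined patterns, too many to cover here; you may find the one you need in this article:
    # http://blog.csdn.net/liukuan73/article/details/52318243
    # Even so, I do not recommend relying on it, because other plugins can replace it; for timestamps, though, grok is very convenient.
    match => ['message','%{TIMESTAMP_ISO8601:logdate}']
  }
}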
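As an illustration of what this match does, the sketch below mimics it in Python. The ISO8601 regex is a simplified stand-in written for this example (the real TIMESTAMP_ISO8601 pattern accepts more variants), and extract_logdate is a hypothetical helper, not part of Logstash.

```python
import re

# Simplified stand-in for grok's TIMESTAMP_ISO8601 pattern (assumption: the
# real grok definition accepts more variants than this regex does).
ISO8601 = re.compile(
    r"(?P<logdate>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
    r"(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?)"
)


def extract_logdate(event):
    """Return a copy of the event with logdate added when a timestamp is found."""
    result = dict(event)
    found = ISO8601.search(event.get("message", ""))
    if found:
        # The named capture becomes a new field; message itself is untouched.
        result["logdate"] = found.group("logdate")
    return result


event = extract_logdate({"message": "2018-04-26T06:47:24,124 Pipeline started"})
print(event["logdate"])
```

The point to notice is that the captured text is copied into a new field while message stays intact, which is why a later step is needed if message should be dropped.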
Here is another use of match: extract the IP, request method, URL, byte count, and duration from message and assign them to the clientip, method, request, bytes, and duration fields:
filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}" }
  }
}
Resulting event:
{
  "message" => "9.9.8.6 GET /xx.hmtl 343 44",
  "@version" => "1",
  "@timestamp" => "2017-01-18T00:12:37.490Z",
  "path" => "/home/elk/0204/nginx.log",
  "host" => "db01",
  "type" => "nginx",
  "clientip" => "9.9.8.6",
  "method" => "GET",
  "request" => "/xx.hmtl",
  "bytes" => "343",
  "duration" => "44"
}
Next, change the filter so that message is removed once the fields have been extracted:
filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}" }
    remove_field => ["message"]
  }
}
Result:
{
  "@version" => "1",
  "@timestamp" => "2017-01-18T00:15:03.879Z",
  "path" => "/home/elk/0204/nginx.log",
  "host" => "db01",
  "type" => "nginx",
  "clientip" => "55.9.3.6",
  "method" => "GET",
  "request" => "/zz.xml",
  "bytes" => "3",
  "duration" => "44"
}
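The field extraction and the removal of message can be sketched in Python as follows. The regex pieces are simplified approximations chosen for this example (the real IPORHOST, WORD, URIPATHPARAM, and NUMBER patterns are broader), and grok_like is a hypothetical helper.

```python
import re

# Hypothetical simplified equivalents of the grok patterns used above; the
# real IPORHOST, WORD, URIPATHPARAM and NUMBER definitions are broader.
LINE = re.compile(
    r"(?P<clientip>\S+)\s+"
    r"(?P<method>\w+)\s+"
    r"(?P<request>/\S*)\s+"
    r"(?P<bytes>\d+(?:\.\d+)?)\s+"
    r"(?P<duration>\d+(?:\.\d+)?)"
)


def grok_like(event, remove_message=False):
    """Copy the event, add one field per named capture, optionally drop message."""
    found = LINE.search(event["message"])
    if not found:
        # Real grok would also tag the event with _grokparsefailure here.
        return dict(event)
    result = {**event, **found.groupdict()}
    if remove_message:
        # Like remove_field, this only happens when the match succeeded.
        del result["message"]
    return result


print(grok_like({"message": "9.9.8.6 GET /xx.hmtl 343 44"}))
print(grok_like({"message": "55.9.3.6 GET /zz.xml 3 44"}, remove_message=True))
```

As in the events shown above, every captured value is a string ("343", not 343); converting bytes or duration to numbers is a separate step.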
A commonly used pattern is %{COMBINEDAPACHELOG}, a matching pattern that ships with Logstash (a built-in regular expression) for parsing Apache access logs:
filter {
  grok {
    match => {
      "message" => "%{COMBINEDAPACHELOG}"
    }
    remove_field => "message"
  }
}
Result:
{
  "_index": "logstash-2018.05.03",
  "_type": "apache_logs",
  "_id": "VFHkI2MBPZdRHaSpwnN-",
  "_version": 1,
  "_score": null,
  "_source": {
    "agent": "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36 Maxthon/5.1.5.2000\"",
    "path": "/var/log/httpd/access_log",
    "referrer": "\"http://10.10.12.81/cacti/data_sources.php\"",
    "host": "cacti",
    "verb": "GET",
    "clientip": "10.0.7.99",
    "request": "/cacti/graphs.php",
    "auth": "-",
    "@version": "1",
    "ident": "-",
    "httpversion": "1.1",
    "response": "200",
    "bytes": "37138",
    "@timestamp": "2018-05-03T02:46:26.477Z",
    "timestamp": "03/May/2018:10:46:25 +0800"
  },
  "fields": {
    "@timestamp": [
      "2018-05-03T02:46:26.477Z"
    ]
  },
  "sort": [
    1525315586477
  ]
}
Other plugins are not covered for now.
output settings
Output to Elasticsearch:
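elasticsearch {
  hosts => ["10.10.10.11:9200"] # Elasticsearch address and port
  action => "index" # action to perform: index the document
  index => "indextemplate-logstash" # index name
  #document_type => "%{@type}"
  document_id => "ignore"
  template => "/opt/logstash-conf/es-template.json" # path to the template file
  template_name => "es-template.json" # name of the template inside Elasticsearch
  template_overwrite => true # overwrite an existing template with the same name
  protocol => "http" # node, http, or transport (an option of older plugin versions; later versions are HTTP-only)
}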
A few complete examples
1. Configuration file
input {
  file {
    path => ['/var/log/httpd/access_log']
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    remove_field => "message"
  }
}
output {
  elasticsearch {
    hosts => ["10.10.15.95:9200"]
    index => "12.81-cacti-%{+YYYY.MM.dd}"
    action => "index"
    document_type => "apache_logs"
  }
}
Data:
{
  "_index": "logstash-2018.05.03",
  "_type": "apache_logs",
  "_id": "U1HkI2MBPZdRHaSpMXPM",
  "_version": 1,
  "_score": 1,
  "_source": {
    "agent": "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36 Maxthon/5.1.5.2000\"",
    "path": "/var/log/httpd/access_log",
    "referrer": "\"http://10.10.12.81/cacti/include/themes/modern/jquery-ui.css\"",
    "host": "cacti",
    "verb": "GET",
    "clientip": "10.0.7.99",
    "request": "/cacti/include/themes/modern/images/ui-icons_454545_256x240.png",
    "auth": "-",
    "@version": "1",
    "ident": "-",
    "httpversion": "1.1",
    "response": "200",
    "bytes": "6992",
    "@timestamp": "2018-05-03T02:45:49.442Z",
    "timestamp": "03/May/2018:10:45:49 +0800"
  }
}
2. Shipping two kinds of logs from one machine
input {
  file {
    path => "/var/log/messages"
    type => "system"
    start_position => "beginning"
  }
  file {
    path => "/var/log/elasticsearch/chuck-cluster.log"
    type => "es-error"
    start_position => "beginning"
  }
}
output {
  if [type] == "system" {
    elasticsearch {
      hosts => ["192.168.56.11:9200"]
      index => "system-%{+YYYY.MM.dd}"
    }
  }
  if [type] == "es-error" {
    elasticsearch {
      hosts => ["192.168.56.11:9200"]
      index => "es-error-%{+YYYY.MM.dd}"
    }
  }
}
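The routing in this output section can be modeled in a few lines of Python. This is a sketch under stated assumptions: index_for is a hypothetical helper, the event is a plain dict whose @timestamp is a timezone-aware datetime, and the %{+YYYY.MM.dd} part of the index name is rendered from the event's @timestamp in UTC.

```python
from datetime import datetime, timedelta, timezone


def index_for(event):
    """Pick the index name the way the two conditionals above do."""
    # %{+YYYY.MM.dd} is rendered from the event's @timestamp, in UTC.
    day = event["@timestamp"].astimezone(timezone.utc).strftime("%Y.%m.%d")
    if event.get("type") == "system":
        return f"system-{day}"
    if event.get("type") == "es-error":
        return f"es-error-{day}"
    return None  # no branch matches, so the event is not sent anywhere


utc_event = {
    "type": "system",
    "@timestamp": datetime(2018, 5, 3, 2, 46, 26, tzinfo=timezone.utc),
}
local_event = {
    "type": "es-error",
    "@timestamp": datetime(2018, 5, 3, 7, 0, tzinfo=timezone(timedelta(hours=8))),
}
print(index_for(utc_event))
print(index_for(local_event))
```

The second event is stamped 07:00 on 3 May at +08:00, which is still 2 May in UTC, so it lands in the index for the previous day. Events whose type matches neither condition are dropped by this output section.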