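Current pain points with logs
- Ops staff frequently have to log in to servers to fetch logs for developers and testers
- Logs are only examined after something breaks, so problems cannot be anticipated from them
- For a clustered service, the logs have to be fetched from several machines
- Logs written by developers follow no standard: log directories are inconsistent and log types are not clearly separated (system, error, access, runtime, device, and debug logs)
These pain points can be addressed with ELK.
For logs to be useful, they have to pass through four stages:
- Collection
- Storage
- Search and presentation
- Analysis, for failure early warning and business growth
elasticsearch, logstash, and kibana cover the first three stages:
es: storage and search
logstash: collection
kibana: presentation
es and logstash both run on the JVM, so a JDK must be installed in the runtime environment (OpenJDK works; reportedly Android is also moving from the Sun JDK to OpenJDK to make the system lighter).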
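Installing and configuring es
The recommended way to install es is with yum. (It can also be installed from the tar archive: download it, unpack it, and run it, which makes version upgrades easy.)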
https://www.elastic.co/guide/en/elasticsearch/reference/current/rpm.html
1.Download and install the public signing key:
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2.Create a file called elasticsearch.repo in the /etc/yum.repos.d/ directory for RedHat based distributions
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
3.And your repository is ready for use. You can now install Elasticsearch with one of the following
sudo yum install elasticsearch
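Configuration:
es needs only a few settings: the cluster name (very important), the node name (very important), whether to lock memory, the data path, the log path, the IP address to listen on, and the port to listen on.
grep "^[a-z]" /etc/elasticsearch/elasticsearch.yml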
cluster.name: oldgirl
node.name: linux-node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 0.0.0.0
http.port: 9200
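bootstrap.memory_lock: true locks the process memory. With this setting the service may fail to start with an error, because /etc/security/limits.conf does not yet grant the permission to lock memory. Add the lines suggested in the log warning:
[2018-07-01T14:15:44,143][WARN ][o.e.b.JNANatives ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536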
[2018-07-01T14:15:44,144][WARN ][o.e.b.JNANatives ] These can be adjusted by modifying /etc/security/limits.conf, for example:
# allow user 'elasticsearch' mlockall
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
[2018-07-01T14:15:44,144][WARN ][o.e.b.JNANatives
At this point a single-node es is installed. Test it by visiting http://IP:9200
{
"name" : "linux-node-1",
"cluster_name" : "oldgirl",
"cluster_uuid" : "5hmMNxc5QxG6q-2t2VNqrg",
"version" : {
"number" : "6.3.0",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "424e937",
"build_date" : "2018-06-11T23:38:03.357887Z",
"build_snapshot" : false,
"lucene_version" : "7.3.1",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
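Seeing the result above means es is up and running. The next step is to store data in it.
How do we interact with es? There are two main ways:
the Java API and the RESTful API.
We use the RESTful API and exchange JSON with es.
For example, run this in a shell: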
curl -H Content-Type:application/json -i -X GET 'http://127.0.0.1:9200/_count?pretty' -d '
{
"query": {
"match_all": {}
}
}'
Response:
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 114
{
"count" : 0,
"_shards" : {
"total" : 0,
"successful" : 0,
"skipped" : 0,
"failed" : 0
}
}
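-X GET sets the request method
-i includes the response headers in the output
-H Content-Type:application/json is required here; it tells the server to parse the request body as JSON. Without it the following error is returned: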
HTTP/1.1 406 Not Acceptable
content-type: application/json; charset=UTF-8
content-length: 109
{
"error" : "Content-Type header [application/x-www-form-urlencoded] is not supported",
"status" : 406
}
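The same request can be built from a script. The following is a minimal Python sketch using only the standard library; the helper name build_count_request is made up for illustration, and the address 127.0.0.1:9200 is taken from the curl example above. It only constructs the request, so it runs without an es node; the commented lines show how it could be sent.

```python
import json
import urllib.request


def build_count_request(base_url):
    """Build (but do not send) the _count query from the curl example."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_count?pretty",
        data=body,
        # Without this header es 6.x answers 406 Not Acceptable.
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = build_count_request("http://127.0.0.1:9200")
print(req.get_method(), req.full_url)

# To send it against a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```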
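Calling the es RESTful API with curl from the shell works, but it is inconvenient. es has many plugins, so we use one that provides a web UI for interacting with the RESTful API.
The officially recommended plugin is no longer supported in elasticsearch 6.x, so we use the open-source elasticsearch-head instead. GitHub: https://github.com/mobz/elasticsearch-head
Installation: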
Running with built in server
git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start
open http://localhost:9100/
Then edit the elasticsearch configuration file
vim /etc/elasticsearch/elasticsearch.yml
and append the following two lines at the end
http.cors.enabled: true
http.cors.allow-origin: "*"
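Restart elasticsearch so the CORS settings take effect, then:
open http://localhost:9100/
connect to http://localhost:9200
Now we can interact with the elasticsearch RESTful API through a web UI.
The next step is to build an elasticsearch cluster.
Installation is the same on every node; just set cluster.name to the same value in each configuration file.
After startup, es nodes find the other members of their cluster through multicast or unicast discovery. Note that multicast discovery is not usable in 6.x, so use unicast. It is configured like this:
discovery.zen.ping.unicast.hosts: ["host1", "host2"]
IP addresses are preferable to host names here.
Not every node has to be listed; one or two are enough, because the nodes pass on what they know about the cluster to each other.
There are two ways to check whether a node has joined the cluster. One is the Overview tab of elasticsearch-head.
The other is the elasticsearch log, whose file name is the cluster name.
There was also the monitoring plugin bigdesk, which unfortunately has not been supported since 2.0, and the kopf plugin, which is not supported as of 3.0 either. es is moving toward being a platform, so it is enough to know about these; in production, prefer platform products, which saves a lot of operations cost.
Those are the three commonly used plugins, and two of them no longer work.
With the es cluster installed and configured and the basic concepts covered, we move on to logstash. There is a lot more to learn about es, but for ops the most important task is collecting logs, so the rest of these notes focuses on logstash.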
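Installing logstash
Does logstash have to be installed on every server? Not necessarily: if logs are received over the network it does not, but if it reads log files from disk then it does.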
https://www.elastic.co/guide/en/logstash/current/installing-logstash.html
YUM
Download and install the public signing key:
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
Add the following in your /etc/yum.repos.d/ directory in a file with a .repo suffix, for example logstash.repo
vim /etc/yum.repos.d/logstash.repo
[logstash-6.x]
name=Elastic repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
And your repository is ready for use. You can install it with:
sudo yum install logstash
logstash is developed in JRuby, so startup is somewhat slow
/usr/share/logstash/bin/logstash -e 'input { stdin{} } output { stdout{} }'
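-e passes the pipeline configuration as a string on the command line
The pipeline has one input and one output
stdin{} and stdout{} are two plugins
Startup can take up to about a minute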
[root@node2 elasticsearch]# /usr/share/logstash/bin/logstash -e 'input { stdin{} } output { stdout{} }'
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2018-07-01 15:03:59.682 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2018-07-01 15:04:00.629 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.3.0"}
[INFO ] 2018-07-01 15:04:03.885 [Converge PipelineAction::Create
The stdin plugin is now waiting for input:
[INFO ] 2018-07-01 15:04:04.098 [Converge PipelineAction::Create
[INFO ] 2018-07-01 15:04:04.225 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2018-07-01 15:04:04.547 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
hello world
{
"@version" => "1",
"@timestamp" => 2018-07-01T07:04:13.785Z,
"message" => "hello world",
"host" => "node2.shared"
}
hehehe
{
"@version" => "1",
"@timestamp" => 2018-07-01T07:04:20.411Z,
"message" => "hehehe",
"host" => "node2.shared"
}
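That is an example of standard input and output. The same pipeline with the rubydebug codec set explicitly on stdout: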
/usr/share/logstash/bin/logstash -e 'input { stdin{} } output { stdout{ codec => rubydebug } }'
...
hello
{
"message" => "hello",
"@version" => "1",
"@timestamp" => 2018-07-01T07:08:02.456Z,
"host" => "node2.shared"
}
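logstash calls each incoming record an event, not a line, because one event may span several lines; an error report, for example, is rarely a single line.
Writing to es
The input stays on stdin; only the output changes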
/usr/share/logstash/bin/logstash -e 'input { stdin{} } output { elasticsearch { hosts => ["10.211.55.8:9200"] } }'
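Official documentation: https://www.elastic.co/guide/en/logstash/current/index.html
Writing to es is that simple.
Can events go to es and to the terminal at the same time? Yes. This is not load balancing; every output receives every event. One input can feed several outputs.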
/usr/share/logstash/bin/logstash -e 'input { stdin{} } output { elasticsearch { hosts => ["10.211.55.8:9200"] } stdout { codec => rubydebug } }'
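What is this good for? In production, write to a text file at the same time as writing to es. Keeping the text is best, for three reasons: 1. it is the simplest format 2. it can be reprocessed later 3. it compresses best. So what should logs be kept as? Text.
Next we learn to write logstash configuration files. Typing pipelines on the command line gets unwieldy; a file is more convenient.
The simplest configuration file: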
vim /etc/logstash/conf.d/logstash-simple.conf
input { stdin { } }
output {
elasticsearch { hosts => ["10.211.55.8:9200"] }
stdout { codec => rubydebug }
}
Then start it
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash-simple.conf
The main thing to learn is the logstash configuration syntax
# This is a comment. You should use comments to describe
# parts of your configuration.
input {
...
}
filter {
...
}
output {
...
}
input{} and output{} are required; filter{} is optional. One section can hold several plugins, for example two file inputs:
input {
file {
path => "/var/log/messages"
type => "syslog"
}
file {
path => "/var/log/apache/access.log"
type => "apache"
}
}
Example 1
The most common input is a file
vim /etc/logstash/conf.d/file.conf
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
Next we collect not only the system log but also a Java log
Example 2
vim /etc/logstash/conf.d/file.conf
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
file {
path => "/var/log/elasticsearch/oldgirl.log"
type => "es-error"
start_position => "beginning"
}
}
output {
if [type] == "system" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
if [type] == "es-error" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "es-error-%{+YYYY.MM.dd}"
}
}
}
This way, if conditionals on the type field route each event to its own index.
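The routing in Example 2 can be pictured with a small Python sketch. It is not how logstash is implemented; index_for is a hypothetical helper that mimics the two if blocks and the %{+YYYY.MM.dd} date suffix, assuming the suffix is taken from the event's @timestamp in UTC.

```python
from datetime import datetime, timezone

# type value -> index prefix, mirroring the two "if [type] == ..." blocks
INDEX_PREFIXES = {"system": "system", "es-error": "es-error"}


def index_for(event):
    """Return the index an event would be written to, or None if no block matches."""
    prefix = INDEX_PREFIXES.get(event.get("type"))
    if prefix is None:
        return None  # no conditional matched, so the event is not written anywhere
    stamp = event["@timestamp"].astimezone(timezone.utc)
    return f"{prefix}-{stamp:%Y.%m.%d}"


event = {
    "type": "es-error",
    "@timestamp": datetime(2018, 7, 1, 7, 4, 13, tzinfo=timezone.utc),
    "message": "hello world",
}
print(index_for(event))
```

An event whose type matches neither block is dropped by the output section, which is why the helper returns None in that case.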
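The 6.x file plugin documentation does not list type among the plugin's own options (it is one of the common options shared by input plugins), but it works, and it cannot be swapped for an arbitrary name.
Note that we have not yet split message into fields. If an event already carries a type field, the type set in the file input does not override it, and conditionals based on it stop working as intended.
Alternatively, run several logstash processes on one server, one per service log, but that costs extra CPU and memory.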
Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
This message is printed at startup. It tells us that the type set in the file input is not the _type shown in the es data browser.
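Viewing these logs in elasticsearch reveals a problem: one error should be one event and is best displayed as one, but reading the file line by line splits it into several events, which is inconvenient. To collect it as a single event we need a codec.
Example 3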
input {
stdin {
codec => multiline {
pattern => "pattern, a regexp"
negate => "true" or "false"
what => "previous" or "next"
}
}
}
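The three parameters mean:
pattern: a regular expression that decides which lines are merged
negate: true or false; when true, the lines that do NOT match the pattern are the ones merged
what: previous or next; whether merged lines are attached to the previous event or the next one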
input {
stdin {
codec => multiline {
pattern => "^\["
negate => "true"
what => "previous"
}
}
}
output {
stdout {
codec => rubydebug
}
}
A line starting with [ begins a new event; a line that does not start with [ is merged into the previous event
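To make the merge rule concrete, here is a small Python sketch of the behaviour described above. It is an illustration only, not the codec's real implementation; merge_multiline is a made-up name and only what => "previous" is handled. The pattern is written as ^\[ because a bare [ has to be escaped in a regular expression.

```python
import re


def merge_multiline(lines, pattern=r"^\[", negate=True):
    """Group lines into events the way what => "previous" is described above."""
    regex = re.compile(pattern)
    events = []
    for line in lines:
        matched = regex.search(line) is not None
        # negate=True: lines that do NOT match are appended to the previous event
        belongs_to_previous = matched != negate
        if belongs_to_previous and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


log_lines = [
    "[2018-07-01T14:15:44,143][WARN ] something failed",
    "java.lang.IllegalStateException: example",
    "    at Example.main(Example.java:1)",
    "[2018-07-01T14:15:45,000][INFO ] started",
]
events = merge_multiline(log_lines)
for event in events:
    print(repr(event))
```

The four input lines become two events: the stack trace lines are folded into the WARN event, and the INFO line starts a new one.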
vim /etc/logstash/conf.d/all.conf
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
file {
path => "/var/log/elasticsearch/oldgirl.log"
type => "es-error"
start_position => "beginning"
codec => multiline {
pattern => "^\["
negate => "true"
what => "previous"
}
}
}
output {
if [type] == "system" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
if [type] == "es-error" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "es-error-%{+YYYY.MM.dd}"
}
}
}
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/all.conf
Browsing logs in elasticsearch-head is inconvenient, so next we bring in kibana
kibana is the visualization platform for elasticsearch
https://www.elastic.co/guide/en/kibana/current/index.html
kibana was first written in PHP, then rewritten in Ruby, then in JRuby, and is now built on Node.js
wget https://artifacts.elastic.co/downloads/kibana/kibana-6.3.0-linux-x86_64.tar.gz
shasum -a 512 kibana-6.3.0-linux-x86_64.tar.gz
tar -xzf kibana-6.3.0-linux-x86_64.tar.gz
mv kibana-6.3.0-linux-x86_64/ /usr/local/
cd /usr/local/
ln -s kibana-6.3.0-linux-x86_64/ kibana
Edit the kibana configuration file
cd /usr/local/kibana/config
vim kibana.yml
Four settings to change
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.url: "http://10.211.55.8:9200"
kibana.index: ".kibana"
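kibana.index deserves a note: kibana has no database of its own, but its data has to live somewhere. Since it depends on es anyway, it uses es: this setting has es create an index named .kibana that stores kibana's own data.
After configuring, start kibana directly (bin/kibana in the installation directory)
We have collected the system log and a Java log (the es runtime log). Next we collect the nginx log.
es has the concept of a field. An index can be thought of as a database, a _type as a table in that database, and a field as a column; in other words, the content of message is turned into key:value pairs.
nginx.conf can be configured so that nginx writes its log as JSON. logstash passes it on to es, and es can store such JSON directly as key:value fields, which makes later searches in kibana more efficient.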
To have nginx log in JSON, see the log module documentation on nginx.org:
http://nginx.org/en/docs/http/ngx_http_log_module.html
The relevant directive:
Syntax: log_format name [escape=default|json|none] string ...;
Default: log_format combined "...";
Context: http
We only need to add the following to the http block of the nginx configuration
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
log_format json '{"@timestamp":"$time_iso8601",'
'"@version":"1",'
'"url":"$uri",'
'"status":"$status",'
'"domain":"$host",'
'"host":"$server_addr",'
'"size":$body_bytes_sent,'
'"responsetime":$request_time,'
'"referer": "$http_referer",'
'"ua": "$http_user_agent"'
'}';
access_log /var/log/nginx/access_json.log json;
access_log /var/log/nginx/access.log main;
Start nginx, make some requests to generate log entries, and confirm that the entries are JSON
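One way to confirm that a log line is valid JSON is to parse it. The Python sketch below does this for a single line; the sample line is hand-written to match the log_format json definition above (it is not real nginx output), and check_access_log_line is a hypothetical helper name.

```python
import json

# Hand-written sample in the shape produced by the "json" log_format above.
SAMPLE_LINE = (
    '{"@timestamp":"2018-07-01T22:23:24+08:00",'
    '"@version":"1",'
    '"url":"/index.html",'
    '"status":"304",'
    '"domain":"10.211.55.8",'
    '"host":"10.211.55.8",'
    '"size":0,'
    '"responsetime":0.000,'
    '"referer": "-",'
    '"ua": "curl/7.29.0"}'
)

EXPECTED_KEYS = {
    "@timestamp", "@version", "url", "status", "domain",
    "host", "size", "responsetime", "referer", "ua",
}


def check_access_log_line(line):
    """Parse one log line; raise ValueError if it is not the expected JSON."""
    event = json.loads(line)  # json.JSONDecodeError is a ValueError subclass
    missing = EXPECTED_KEYS - event.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return event


event = check_access_log_line(SAMPLE_LINE)
print(event["url"], event["status"], event["size"], event["responsetime"])
```

Note that size and responsetime are unquoted in the log_format, so they arrive as numbers, while status is quoted and arrives as a string. A value containing a double quote, such as an unusual user agent, can break the JSON; that is the case the escape=json parameter in the directive syntax is meant for.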
Now write a json.conf file
vim /etc/logstash/conf.d/json.conf
input {
file {
path => "/var/log/nginx/access_json.log"
codec => json
}
}
output {
stdout {
codec => rubydebug
}
}
Run it; the result looks like this:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/json.conf
[INFO ] 2018-07-01 22:22:36.797 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2018-07-01 22:22:37.539 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
{
"domain" => "10.211.55.8",
"@version" => "1",
"host" => "10.211.55.8",
"responsetime" => 0.0,
"@timestamp" => 2018-07-01T14:23:24.000Z,
"size" => 0,
"status" => "304",
"path" => "/var/log/nginx/access_json.log",
"ua" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
"url" => "/index.html",
"referer" => "-"
}
Now it can be added to all.conf
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
file {
path => "/var/log/nginx/access_json.log"
type => "nginx-log"
start_position => "beginning"
codec => json
}
file {
path => "/var/log/elasticsearch/oldgirl.log"
type => "es-error"
start_position => "beginning"
codec => multiline {
pattern => "^\["
negate => "true"
what => "previous"
}
}
}
output {
if [type] == "system" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
if [type] == "es-error" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "es-error-%{+YYYY.MM.dd}"
}
}
if [type] == "nginx-log" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "nginx-log-%{+YYYY.MM.dd}"
}
}
}
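The new index now shows up in elasticsearch-head
Add the new index in kibana and it is ready to query
Collecting the messages log
We already collected the messages log above, but with the file plugin.
System logs are produced by syslog, and syslog can send logs to a remote host,
so we can have logstash listen on a port and let syslog write straight to it.
Ideally every service in production logs through syslog; then logstash does not need to be installed on every machine to pick up files, and a single logstash listening port is enough.
nginx can also log to syslog: open-source nginx has supported it since version 1.7.1 (older versions did not), and Taobao's open-source Tengine and nginx with Lua support it as well.
syslog is in the list of input plugins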
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-syslog.html
vim /etc/logstash/conf.d/syslog.conf
input {
syslog {
type => "system-syslog"
host => "10.211.55.8"
port => "514"
}
}
output {
stdout {
codec => "rubydebug"
}
}
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/syslog.conf
After startup, confirm that port 514 is listening
Next, edit the system's rsyslog.conf
vim /etc/rsyslog.conf
Find
#*.* @@remote-host:514
Remove the # and change it to:
*.* @@10.211.55.8:514
Then restart the rsyslog service
systemctl restart rsyslog
After the restart the logs appear immediately
{
"pid" => "20915",
"severity" => 5,
"logsource" => "node2",
"facility_label" => "security/authorization",
"timestamp" => "Jul 2 20:56:43",
"type" => "system-syslog",
"program" => "polkitd",
"@timestamp" => 2018-07-02T12:56:43.000Z,
"facility" => 10,
"host" => "10.211.55.8",
"@version" => "1",
"message" => "Unregistered Authentication Agent for unix-process:1927:9050003 (system bus name :1.1149, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8) (disconnected from bus)
",
"priority" => 85,
"severity_label" => "Notice"
}
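The priority, facility, and severity fields in this event are related by simple arithmetic: in the syslog protocol the priority is facility * 8 + severity. A short Python sketch of the decoding (decode_priority is a name chosen for illustration):

```python
# Severity names in numeric order, as defined by the syslog protocol.
SEVERITY_LABELS = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
]


def decode_priority(priority):
    """Split a syslog priority value into (facility, severity)."""
    facility, severity = divmod(priority, 8)
    return facility, severity


facility, severity = decode_priority(85)
print(facility, severity, SEVERITY_LABELS[severity])
```

For the event above, 85 = 10 * 8 + 5, which gives facility 10 and severity 5 (Notice), matching the facility, severity, and severity_label fields in the output.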
Then the syslog.conf settings can be merged into all.conf
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
file {
path => "/var/log/nginx/access_json.log"
type => "nginx-log"
start_position => "beginning"
codec => json
}
file {
path => "/var/log/elasticsearch/oldgirl.log"
type => "es-error"
start_position => "beginning"
codec => multiline {
pattern => "^\["
negate => "true"
what => "previous"
}
}
syslog {
type => "system-syslog"
host => "10.211.55.8"
port => "514"
}
}
output {
if [type] == "system" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
if [type] == "es-error" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "es-error-%{+YYYY.MM.dd}"
}
}
if [type] == "nginx-log" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "nginx-log-%{+YYYY.MM.dd}"
}
}
if [type] == "system-syslog" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-syslog-%{+YYYY.MM.dd}"
}
}
}
After starting logstash, send a few test messages with logger:
logger "hallo 1"
logger "hallo 1"
logger "hallo 1"
logger "hallo 1"
logger "hallo 1"
logger "hallo 1"
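These commands generate the test messages.
The configuration above can serve as a template for production.
Another common logstash plugin is the tcp input.
The syslog input listens for syslog traffic. If an application does not want to write its log to a file, logstash can listen on a TCP port directly,
and the program can write its log straight to that port.
It looks like this:
vim /etc/logstash/conf.d/tcp.conf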
input {
tcp {
host => "10.211.55.8"
port => "6666"
}
}
output {
stdout {
codec => "rubydebug"
}
}
Start it: /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/tcp.conf
Then test with nc
nc 10.211.55.8 6666 < /etc/resolv.conf
{
"host" => "node2.shared",
"message" => "# Generated by NetworkManager",
"@timestamp" => 2018-07-02T13:20:27.921Z,
"port" => 44257,
"@version" => "1"
}
{
"host" => "node2.shared",
"message" => "search localdomain shared",
"@timestamp" => 2018-07-02T13:20:27.943Z,
"port" => 44257,
"@version" => "1"
}
{
"host" => "node2.shared",
"message" => "nameserver 10.211.55.1",
"@timestamp" => 2018-07-02T13:20:27.944Z,
"port" => 44257,
"@version" => "1"
}
echo "hehe" | nc 10.211.55.8 6666
{
"host" => "node2.shared",
"message" => "hehe",
"@timestamp" => 2018-07-02T13:21:39.242Z,
"port" => 44259,
"@version" => "1"
}
echo "oldgirl" > /dev/tcp/10.211.55.8/6666
{
"host" => "node2.shared",
"message" => "oldgirl",
"@timestamp" => 2018-07-02T13:23:23.936Z,
"port" => 44260,
"@version" => "1"
}
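An application can write to the tcp input the same way nc does: open a connection and send newline-terminated text. The Python sketch below shows this with a hypothetical send_line helper. So that it runs without logstash, the demo sends to a throwaway listener on 127.0.0.1 instead of 10.211.55.8:6666; against the real input you would call send_line("10.211.55.8", 6666, "hehe").

```python
import socket
import threading


def send_line(host, port, text):
    """Send one newline-terminated line over TCP, like echo "..." | nc host port."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")


def demo():
    """Send a line to a local stand-in listener and return what it received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = []

    def accept_once():
        conn, _ = server.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(1024)
                if not data:  # sender closed the connection
                    break
                chunks.append(data)
            received.append(b"".join(chunks))

    worker = threading.Thread(target=accept_once)
    worker.start()
    send_line("127.0.0.1", port, "hehe")
    worker.join(timeout=5)
    server.close()
    return received[0].decode("utf-8")


result = demo()
print(repr(result))
```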