Prometheus Monitoring Installation
Architecture:
Server: 192.168.0.204
Client: 192.168.0.206
Environment preparation: install the Go language environment on all nodes
rz go1.12.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.12.linux-amd64.tar.gz
cat >> /etc/profile <<'EOF'
export PATH=$PATH:/usr/local/go/bin
EOF
source /etc/profile
go version
1. Server-side deployment
1.1 Prepare the packages
cd /usr/local/src
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz # deploy on both server and client
wget https://github.com/prometheus/prometheus/releases/download/v2.7.1/prometheus-2.7.1.linux-amd64.tar.gz # deploy on the server only
tar xf prometheus-2.7.1.linux-amd64.tar.gz
tar xf node_exporter-0.17.0.linux-amd64.tar.gz
1.2 Start node_exporter
# Verification uses Prometheus's own data as the example: query an expression in the web UI and view the result as a graph.
mv prometheus-2.7.1.linux-amd64 /usr/local
mv node_exporter-0.17.0.linux-amd64 /usr/local/
ln -s /usr/local/prometheus-2.7.1.linux-amd64/ /usr/local/prometheus
ln -s /usr/local/node_exporter-0.17.0.linux-amd64/ /usr/local/node_exporter
cd /usr/local/node_exporter
./node_exporter &
netstat -lntp|grep 9100
http://192.168.0.204:9100/metrics
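To check from the command line that the exporter is really serving data, one option is to fetch the metrics page and pull out a single sample. The sketch below is only an illustration: metric_value is a hypothetical helper name, node_load1 is used as an example metric, and the parsing assumes samples carry no trailing timestamp, so the value is the last field on the line.

```shell
#!/bin/sh
# metric_value NAME (hypothetical helper): read Prometheus text exposition
# format on stdin and print the value of the first sample named NAME.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                 # skip HELP and TYPE comment lines
        {
            metric = $1
            sub(/\{.*/, "", metric)   # drop the label set, if any
            if (metric == name) { print $NF; exit }
        }'
}

# Example use against the server node (needs network access):
# curl -s http://192.168.0.204:9100/metrics | metric_value node_load1
```

An empty result suggests that the exporter is not reachable or that the metric name is not exported by this version.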
1.3 Configure and start Prometheus
cd /usr/local/prometheus
vi prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  - 'prometheus_rules.yml' # must be created; see the alert rules that follow
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          alias: prometheus
  - job_name: 'linux1'
    static_configs:
      - targets: ['192.168.0.204:9100'] # IP address of a node running node_exporter
        labels:
          alias: linux-node1
  - job_name: 'linux2'
    static_configs:
      - targets: ['192.168.0.206:9100'] # IP address of a node running node_exporter
        labels:
          alias: linux-node2
##############################################################
# Add alert rules (the EOF delimiter is quoted so the shell does not expand $labels and $value)
cat > prometheus_rules.yml <<'EOF'
groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
EOF
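The rule annotations contain template variables such as $labels and $value. When such text is written through a here-document, the quoting of the delimiter decides whether the shell touches them. The small demonstration below is a sketch with no Prometheus involved; the variable names unquoted and quoted are only for illustration.

```shell
#!/bin/sh
# Demonstration: how the here-document delimiter affects template text.
labels=""   # usually unset in a login shell; set empty so this also runs under set -u

# Unquoted delimiter: the shell expands $labels before cat sees the text.
unquoted=$(cat <<EOF
summary: "Instance {{ $labels.instance }} down"
EOF
)

# Quoted delimiter: the body is passed through literally.
quoted=$(cat <<'EOF'
summary: "Instance {{ $labels.instance }} down"
EOF
)

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

Only the quoted form keeps {{ $labels.instance }} intact, which is what Prometheus needs to see in the rules file.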
Start Prometheus
cd /usr/local/prometheus
./prometheus
Open in a browser:
http://192.168.0.204:9090/targets
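2. Client-side deployment
2.1 Deploy node_exporter
Use the Prometheus web UI to verify that data from the client's Node Exporter is being collected: memory, CPU load, disk, and other performance metrics.
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz # deploy on the client; provides hardware-level monitoring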
tar xf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/
ln -s /usr/local/node_exporter-0.17.0.linux-amd64/ /usr/local/node_exporter
cd /usr/local/node_exporter
./node_exporter &
netstat -lntp|grep 9100
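http://192.168.0.206:9100/metrics
# Custom metrics
Interceptor/filter: used to collect statistics on all application requests
Custom Collector: can be used to collect metrics related to the application's business functions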
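Starting node_exporter with "./node_exporter &" does not survive a reboot. On a systemd host, one alternative is a small service unit. The sketch below is an assumption rather than part of the original setup: node_exporter_unit is a hypothetical helper, and the unit name and the /etc/systemd/system location are illustrative choices.

```shell
#!/bin/sh
# node_exporter_unit [BINARY] (hypothetical helper): print a minimal systemd
# unit for node_exporter. BINARY defaults to the symlinked install path.
node_exporter_unit() {
    bin=${1:-/usr/local/node_exporter/node_exporter}
    cat <<EOF
[Unit]
Description=Prometheus node_exporter
After=network.target

[Service]
ExecStart=$bin
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example use (as root):
# node_exporter_unit > /etc/systemd/system/node_exporter.service
# systemctl daemon-reload
# systemctl enable node_exporter
# systemctl start node_exporter
```

The same pattern could be applied to Prometheus itself by pointing ExecStart at its binary and configuration file.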
2.3 Monitoring MySQL (not done)
https://www.hi-linux.com/posts/27014.html # for reference
cd /usr/local/src/
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.10.0/mysqld_exporter-0.10.0.linux-amd64.tar.gz # deploy on the MySQL server; also deploy node_exporter there (see above)
tar xf mysqld_exporter-0.10.0.linux-amd64.tar.gz
mv mysqld_exporter-0.10.0.linux-amd64 /usr/local/
ln -s /usr/local/mysqld_exporter-0.10.0.linux-amd64/ /usr/local/mysqld_exporter
Configure mysqld_exporter by adding a configuration file (requires an authorized MySQL user)
mysqld_exporter connects to MySQL, so grant it the required privileges first:
mysql> grant replication client, process on *.* to prometheus@"localhost" identified by "123456";
mysql> grant select on performance_schema.* to prometheus@"localhost";
cd /usr/local/mysqld_exporter/
vim .my.cnf
[client]
user=prometheus
password=123456
nohup ./mysqld_exporter --config.my-cnf=.my.cnf & # start the exporter
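Because .my.cnf stores the MySQL password in clear text, it may be worth creating the file so that only its owner can read it. A minimal sketch, where write_my_cnf is a hypothetical helper and the argument order is an arbitrary choice:

```shell
#!/bin/sh
# write_my_cnf DIR USER PASSWORD (hypothetical helper): create DIR/.my.cnf
# readable and writable by the owner only.
write_my_cnf() {
    (
        umask 077                     # new files get mode 0600
        printf '[client]\nuser=%s\npassword=%s\n' "$2" "$3" > "$1/.my.cnf"
    )
    chmod 600 "$1/.my.cnf"            # also tighten a file that already existed
}

# Example use:
# write_my_cnf /usr/local/mysqld_exporter prometheus 123456
```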
2.4 Monitoring nginx (not done)
cd /usr/local
git clone https://github.com/vozlt/nginx-module-vts.git # run on the nginx host
./configure --prefix=/usr/local/nginx-1.12.2 --user=nginx --group=nginx --with-http_stub_status_module --with-http_ssl_module --add-module=/usr/local/nginx-module-vts
make
nginx -s stop
cp ./objs/nginx /usr/local/nginx/sbin/
vim nginx.conf
http {
    .....
    ### Prometheus configuration ###
    vhost_traffic_status_zone;
    vhost_traffic_status_filter_by_host on; # enable per-vhost filtering
    ### Prometheus configuration ###
    .....
    server {
        location /status {
            #vhost_traffic_status off;
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
        }
    }
}
########################################################################################################################
wget -c https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.9.1/nginx-vts-exporter-0.9.1.linux-amd64.tar.gz
tar xf nginx-vts-exporter-0.9.1.linux-amd64.tar.gz
./nginx-vts-exporter -nginx.scrape_timeout 10 -nginx.scrape_uri http://10.10.16.107/status/format/json & # start the nginx vhost traffic exporter
http://10.10.16.107/status # view the status of each node on the nginx host
3. Alerting with Alertmanager (installed on the server)
3.1 Download the Alertmanager package
cd /usr/local
wget https://github.com/prometheus/alertmanager/releases/download/v0.16.1/alertmanager-0.16.1.linux-amd64.tar.gz
tar -axvf alertmanager-0.16.1.linux-amd64.tar.gz
3.2 Create the YAML configuration file that Alertmanager starts with
mkdir -p /usr/local/alertmanager-0.16.1.linux-amd64/template/
cd /usr/local/alertmanager-0.16.1.linux-amd64/
cat > /usr/local/alertmanager-0.16.1.linux-amd64/simple.yml <<'EOF'
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '15613691030@163.com'
  smtp_auth_username: '15613691030'
  smtp_auth_password: 'Shaochuan@5tgb'
  smtp_require_tls: false
templates:
  - '/usr/local/alertmanager-0.16.1.linux-amd64/template/*.html'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
- name: 'default-receiver'
  email_configs:
  - to: '15613691030@163.com'
    html: '{{ template "alert.html" . }}'
    headers: { Subject: "[WARN] Alert email test" }
EOF
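3.3 Create the template for the alert email
# The template directory must already exist (created with mkdir above). The EOF delimiter is quoted so the shell does not expand $alert.
cat > /usr/local/alertmanager-0.16.1.linux-amd64/template/alert.html <<'EOF'
{{ define "alert.html" }}
<table>
  <tr><td>Alert name</td><td>Start time</td></tr>
  {{ range $i, $alert := .Alerts }}
  <tr><td>{{ index $alert.Labels "alertname" }}</td><td>{{ $alert.StartsAt }}</td></tr>
  {{ end }}
</table>
{{ end }}
EOF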
3.4 Create alert.html
cat > /usr/local/alertmanager-0.16.1.linux-amd64/alert.html <<'EOF'
{{ define "alert.html" }}
<table>
  <tr><td>Alert name</td><td>Start time</td></tr>
  {{ range $i, $alert := .Alerts }}
  <tr><td>{{ index $alert.Labels "alertname" }}</td><td>{{ $alert.StartsAt }}</td></tr>
  {{ end }}
</table>
{{ end }}
EOF
3.5 Start the Alertmanager service
./alertmanager --config.file=simple.yml # start Alertmanager
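Once Alertmanager is running, the mail path can be exercised without waiting for a real outage by posting a hand-made alert to its HTTP API. The sketch below assumes the default listen port 9093 and the v1 API shipped with the 0.16.x series; alert_payload is a hypothetical helper, and the label values TestAlert and page are examples only.

```shell
#!/bin/sh
# alert_payload ALERTNAME SEVERITY (hypothetical helper): print a JSON array
# holding one alert with the given labels.
alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"}}]' "$1" "$2"
}

# Example use against the server node (needs network access):
# curl -s -XPOST -H 'Content-Type: application/json' \
#      -d "$(alert_payload TestAlert page)" \
#      http://192.168.0.204:9093/api/v1/alerts
```

With the route settings shown earlier, the notification should go out after the group_wait interval of 30s.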
4. Install and start Grafana (on the server)
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.3-1.x86_64.rpm
yum install -y urw-fonts
rpm -i grafana-5.2.3-1.x86_64.rpm
/sbin/chkconfig --add grafana-server
systemctl start grafana-server.service
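Open in a browser:
http://192.168.0.204:3000 (default username/password: admin/admin)
After logging in you are asked to change the password. Then click "Add data source", add Prometheus as the data source, and select the "Prometheus 2.0 Stats" dashboard to display the monitoring panels.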
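As an alternative to clicking through the UI, Grafana 5.x can load data sources from provisioning files at startup. The sketch below assumes the RPM's default provisioning directory /etc/grafana/provisioning; grafana_datasource_yaml is a hypothetical helper and the file name prometheus.yaml is an arbitrary choice.

```shell
#!/bin/sh
# grafana_datasource_yaml [URL] (hypothetical helper): print a provisioning
# file that registers Prometheus as the default data source.
grafana_datasource_yaml() {
    url=${1:-http://192.168.0.204:9090}
    cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Example use (as root):
# grafana_datasource_yaml > /etc/grafana/provisioning/datasources/prometheus.yaml
# systemctl restart grafana-server.service
```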
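6. Prometheus monitoring summary
6.1 Keep clocks synchronized with NTP
Prometheus depends heavily on accurate system time; the server's clock must stay synchronized with the clocks of the monitored hosts:
References: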
https://blog.csdn.net/csolo/article/details/82460539
http://www.cnblogs.com/qianjingchen/articles/9578341.html