1. Download the installation package
https://prometheus.io/download/
2. Upload the package to the server and extract it
-rwxr-xr-x. 1 3434 3434 26971621 Dec 11 22:13 alertmanager
-rw-r--r--. 1 3434 3434      380 Dec 11 22:51 alertmanager.yml
-rwxr-xr-x. 1 3434 3434 22458246 Dec 11 22:14 amtool
-rw-r--r--. 1 3434 3434    11357 Dec 11 22:51 LICENSE
-rw-r--r--. 1 3434 3434      457 Dec 11 22:51 NOTICE
3. Edit the configuration file alertmanager.yml
vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '13551031535@163.com'
  smtp_auth_password: 'xxx'
  smtp_require_tls: false
  smtp_auth_username: '13551081535@163.com'   # normally the same account as smtp_from
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 2m   # interval between repeated notifications for the same alert
  receiver: 'email'     # receiver
receivers:
- name: 'email'         # must match the receiver value in route
  email_configs:        # this option is documented on the official site
  - to: 'zhengqinfeng09@163.com'   # email recipient
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['alertname', 'dev', 'instance']
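Before starting the service, it can help to validate the edited file. The following is a minimal sketch, assuming the amtool binary from the extracted package sits in the current directory. The function name check_am_config and the AMTOOL variable are hypothetical helpers introduced here; check-config is amtool's own subcommand.

```shell
#!/bin/sh
# Path to amtool; assumed to sit next to alertmanager.yml (override via AMTOOL).
AMTOOL="${AMTOOL:-./amtool}"

# Hypothetical helper: validate an Alertmanager configuration file.
# Returns amtool's exit status (0 when the file is accepted, non-zero otherwise).
check_am_config() {
    "$AMTOOL" check-config "${1:-alertmanager.yml}"
}
```

Calling check_am_config alertmanager.yml prints amtool's report; a non-zero exit status usually points at an indentation or key-name mistake in the YAML.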
4. Start the alertmanager service
./alertmanager --config.file=alertmanager.yml
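Once the process is running, a quick health probe confirms that it is listening. This sketch assumes Alertmanager is on its default port 9093 on the local machine and that curl is installed. The names am_healthy, AM_URL and CURL are hypothetical; /-/healthy is Alertmanager's health endpoint.

```shell
#!/bin/sh
# Base URL of Alertmanager; 127.0.0.1:9093 is assumed (override via AM_URL).
AM_URL="${AM_URL:-http://127.0.0.1:9093}"
CURL="${CURL:-curl}"

# Hypothetical helper: succeed only if the health endpoint answers with HTTP 2xx.
# -f: fail on HTTP errors, -s: no progress output, -S: still print errors.
am_healthy() {
    "$CURL" -fsS "$AM_URL/-/healthy"
}
```

A typical use is am_healthy && echo "alertmanager is up".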
5. Edit prometheus.yml to configure communication with alertmanager
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093   # address used to communicate with alertmanager

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/node_rules.yml"   # alert rules file
6. Configure the alert rules
vim rules/node_rules.yml
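groups:
- name: General instance monitoring
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m   # the alert fires only if the instance stays at up == 0 for the full 1m
    labels:
      severity: error
    annotations:
      description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
      summary: 'Instance {{ $labels.instance }} is down, please investigate...'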
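The rules file and the main configuration can be checked before Prometheus is asked to reload them. This is a sketch under the assumption that promtool (shipped in the Prometheus package) is in the current directory and that the rules file uses the name referenced in rule_files. The helper name check_prom_files and the PROMTOOL variable are hypothetical; check rules and check config are promtool subcommands.

```shell
#!/bin/sh
# Path to promtool; assumed to be in the Prometheus directory (override via PROMTOOL).
PROMTOOL="${PROMTOOL:-./promtool}"

# Hypothetical helper: validate the rules file first, then the main config.
# Stops at the first failing check and returns its exit status.
check_prom_files() {
    "$PROMTOOL" check rules "${1:-rules/node_rules.yml}" &&
    "$PROMTOOL" check config "${2:-prometheus.yml}"
}
```

Running check_prom_files with no arguments checks rules/node_rules.yml and prometheus.yml; a file name mismatch between rule_files and the file on disk tends to surface in the second check.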
7. Make the prometheus configuration take effect
kill -HUP PID   # replace PID with the process ID of the running prometheus
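Looking up the process ID and sending the signal can be combined. The sketch below assumes the process is named exactly prometheus and that pgrep is available. The helper name reload_prometheus and the PGREP and KILL variables are hypothetical.

```shell
#!/bin/sh
PGREP="${PGREP:-pgrep}"
KILL="${KILL:-kill}"

# Hypothetical helper: send SIGHUP to the running prometheus process so that
# it re-reads prometheus.yml and the rule files.
reload_prometheus() {
    # -x matches the process name exactly; pgrep exits non-zero if nothing matches.
    pid=$("$PGREP" -x prometheus) || {
        echo "prometheus is not running" >&2
        return 1
    }
    # $pid is left unquoted on purpose so several PIDs are passed as separate arguments.
    "$KILL" -HUP $pid
}
```

If Prometheus was started with --web.enable-lifecycle, an HTTP POST to its /-/reload endpoint is another way to trigger the same reload.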
8. Verify
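One possible way to verify the mail path without taking a real instance down is to post a hand-made alert straight to Alertmanager. This is only a sketch: it assumes Alertmanager listens on 127.0.0.1:9093 and that curl is installed. The alert name TestAlert, the summary text, and the helper names build_test_alert and send_test_alert are made up for illustration; /api/v2/alerts is the Alertmanager v2 API endpoint for posting alerts.

```shell
#!/bin/sh
AM_URL="${AM_URL:-http://127.0.0.1:9093}"
CURL="${CURL:-curl}"

# Hypothetical helper: build a one-element JSON array in the v2 alert format.
# $1 = alertname, $2 = severity, $3 = summary (plain text without quotes).
build_test_alert() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"%s"}}]' \
        "$1" "$2" "$3"
}

# Hypothetical helper: post the test alert to Alertmanager.
send_test_alert() {
    "$CURL" -fsS -H 'Content-Type: application/json' \
        -d "$(build_test_alert "${1:-TestAlert}" error "manual test alert")" \
        "$AM_URL/api/v2/alerts"
}
```

After send_test_alert, the alert should appear in the Alertmanager web UI, and with group_wait set to 10s the notification mail would be expected shortly afterwards. To test the rule itself, stop a monitored instance and wait for the 1m in the for clause to elapse.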