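I. Alertmanager configuration parameters:
global and route parameters:
    global:                          # global settings
      resolve_timeout: 1m            # if an alert receives no update for this long, Alertmanager marks it as resolved
    route:                           # the parameters below belong under route, not global
      group_by: ['alertname']        # labels used as the grouping key for alerts
      group_wait: 10s                # after the first alert of a new group arrives, wait 10s so that other alerts in the same group are sent together
      group_interval: 10s            # how long to wait before notifying about new alerts added to a group that has already been notified
      repeat_interval: 1h            # re-send a notification if the alert is still unresolved after 1h (Alertmanager's built-in default is 4h)
      receiver: 'default-receiver'   # default receiver, used when an alert matches no child route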
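To make the effect of group_by concrete, here is a minimal Python sketch of the grouping step. It is not Alertmanager's code; the function name group_alerts and the sample alerts are assumptions made up for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        # A label that is missing on an alert is treated as an empty string.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts, not taken from a real cluster.
alerts = [
    {"labels": {"alertname": "HighCPU", "instance": "node1"}},
    {"labels": {"alertname": "HighCPU", "instance": "node2"}},
    {"labels": {"alertname": "DiskFull", "instance": "node1"}},
]

groups = group_alerts(alerts, ["alertname"])
for key, members in groups.items():
    print(key, len(members))
```

With group_by: ['alertname'], the two HighCPU alerts land in one group and would go out in a single notification once group_wait has elapsed.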
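Alert routing (route) and label matching (match_re)
An Alertmanager configuration defines a routing tree based on label-matching rules, which determines how each incoming alert is handled. The route section defines the matching rules and the receiver that a matched alert is sent to. If the configuration file defines only a single route, every alert that Prometheus sends to Alertmanager is delivered to the receiver named default-receiver, which is defined here as an email address.
In production, alerts of different severity often need entirely different handling, so route can also contain child routes. Each child route uses label matching to decide how the alerts it matches are handled.
Further reading: https://yunlzheng.gitbook.io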
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: alertmanager
      namespace: monitor-sa
    data:
      alertmanager.yml: |-
        global:
          resolve_timeout: 1m
          smtp_smarthost: 'smtp.163.com:25'
          smtp_from: 'xxx@163.com'
          smtp_auth_username: 'xxx'
          smtp_auth_password: '1989317li'
          smtp_require_tls: false
        route:
          group_by: [alertname]
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 10m
          receiver: 'default-receiver'
          routes:                    # child routes
          - receiver: cluster1
            group_wait: 10s
            match_re:                # regular-expression match
              severity: critical     # alerts with severity critical go to the cluster1 receiver
        receivers:
        - name: 'default-receiver'
          email_configs:
          - to: '1980570647@qq.com'
            send_resolved: true
        - name: 'cluster1'
          webhook_configs:
          - url: 'http://192.168.124.16:8060/dingtalk/cluster1/send'
            send_resolved: true
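The way a routing tree like this selects a receiver can be sketched in a few lines of Python. This is a simplified illustration, not Alertmanager's implementation: the helper names matches and pick_receiver are made up, it ignores the continue option, it assumes every child route names its own receiver, and it uses Python's re module where Alertmanager uses Go regular expressions. Patterns are matched against the whole label value, as Alertmanager anchors match_re expressions.

```python
import re


def matches(route, labels):
    """True if every match_re pattern fully matches the alert's label value."""
    for name, pattern in route.get("match_re", {}).items():
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    return True


def pick_receiver(route, labels):
    """Return the receiver of the first matching child route, else the parent's."""
    for child in route.get("routes", []):
        if matches(child, labels):
            return pick_receiver(child, labels)
    return route["receiver"]


# The routing tree from the ConfigMap, written as a Python dict.
route = {
    "receiver": "default-receiver",
    "routes": [
        {"receiver": "cluster1", "match_re": {"severity": "critical"}},
    ],
}

print(pick_receiver(route, {"alertname": "NodeDown", "severity": "critical"}))
print(pick_receiver(route, {"alertname": "NodeDown", "severity": "warning"}))
```

A critical alert is handed to cluster1, while a warning alert matches no child route and falls back to default-receiver.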
Alert inhibition (inhibit_rules): when both a warning and a critical alert are firing, only the critical one is sent. This is alert inhibition.
    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      equal: ['alertname']
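The rule can be read as: a warning alert is muted while a critical alert with the same alertname is firing. The Python sketch below models that check under simplifying assumptions. Alerts are plain label dicts, the function name is_inhibited is made up, and the real implementation has more edge cases.

```python
def is_inhibited(target, firing, rule):
    """Return True if some firing alert inhibits target under the given rule."""

    def has_labels(labels, wanted):
        return all(labels.get(key) == value for key, value in wanted.items())

    # Only alerts matching target_match can be muted at all.
    if not has_labels(target, rule["target_match"]):
        return False
    for source in firing:
        if source is target:
            continue
        same = all(source.get(key) == target.get(key) for key in rule["equal"])
        if has_labels(source, rule["source_match"]) and same:
            return True
    return False


rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname"],
}

# Hypothetical alerts, each represented by its label set.
critical = {"alertname": "HighCPU", "severity": "critical"}
warning = {"alertname": "HighCPU", "severity": "warning"}
other = {"alertname": "DiskFull", "severity": "warning"}
firing = [critical, warning, other]

print(is_inhibited(warning, firing, rule))
print(is_inhibited(other, firing, rule))
```

The HighCPU warning is muted because a HighCPU critical alert is firing. The DiskFull warning has a different alertname, so it is still sent.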
II. Configuring Alertmanager notifications
Email notifications with Alertmanager:
1) ConfigMap configuration
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: alertmanager
      namespace: monitor-sa
    data:
      alertmanager.yml: |-
        global:
          resolve_timeout: 1m
          smtp_smarthost: 'smtp.163.com:25'
          smtp_from: 'xxx@163.com'
          smtp_auth_username: 'xx'
          smtp_auth_password: 'GRJGVYPOPMMWXJNX'
          smtp_require_tls: false
        route:
          group_by: [alertname]
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 10m
          receiver: default-receiver
        receivers:
        - name: 'default-receiver'
          email_configs:
          - to: 'wushaoyu95@163.com'
            send_resolved: true
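Before putting SMTP settings into the ConfigMap, it can help to check them outside Alertmanager. The Python sketch below is one way to do that with the standard library. The function names and the recipient you@example.com are assumptions for illustration, and send_test_mail is only defined here, not called, because it needs real credentials and network access.

```python
import smtplib
from email.message import EmailMessage


def build_test_mail(sender, recipient):
    """Build a small plain-text test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Alertmanager SMTP test"
    msg.set_content("If this message arrives, the SMTP settings work.")
    return msg


def send_test_mail(host, port, username, password, msg):
    """Send msg over plain SMTP without STARTTLS, mirroring smtp_require_tls: false."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.login(username, password)
        smtp.send_message(msg)


# The sender matches smtp_from above; the recipient is a placeholder.
msg = build_test_mail("xxx@163.com", "you@example.com")
print(msg["Subject"])
```

To run the check, call send_test_mail("smtp.163.com", 25, username, password, msg) with the same values used for smtp_auth_username and smtp_auth_password.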
DingTalk notifications with Alertmanager:
1) Create a DingTalk robot
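Open the DingTalk desktop client, create a group, and add a custom robot by following https://ding-doc.dingtalk.com/doc#/serverapi2/qf2nxq
The robot I created:
Group settings --> Smart group assistant --> Add robot --> Custom --> Add
Robot name: kube-event
Receiving group: 钉钉报警测试
Security settings:
Custom keyword: cluster1
After filling this in, click Finish. This creates an alert robot named kube-event.
To view the robot's webhook after it has been created:
Click Smart group assistant, find the kube-event robot you just created, and click it to open its settings page, which shows:
Robot name: kube-event
Receiving group: 钉钉报警测试
Message push: enabled
webhook: https://oapi.dingtalk.com/robot/send?access_token=9c03ff1f47b1d15a10d852398cafb84f8e81ceeb1ba557eddd8a79e5a5e5548e
Security settings:
Custom keyword: cluster1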
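A quick way to confirm the robot works is to post a text message to its webhook directly. The Python sketch below assumes the standard custom-robot text payload (msgtype set to text) and that DingTalk answers with a JSON body. The function names are made up for illustration, and send_message is only defined here, not called, because it needs your own webhook URL. Because the robot uses a custom keyword, the message must contain cluster1 or DingTalk will reject it.

```python
import json
import urllib.request


def build_text_message(content, keyword):
    """Build a DingTalk text payload that is guaranteed to contain the keyword."""
    if keyword not in content:
        content = f"[{keyword}] {content}"
    return {"msgtype": "text", "text": {"content": content}}


def send_message(webhook_url, payload):
    """POST the payload as JSON to the robot webhook and return the parsed reply."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_text_message("test alert from kube-event", "cluster1")
print(json.dumps(payload, ensure_ascii=False))
```

Calling send_message with the webhook URL shown in the robot settings and this payload should make the message appear in the group.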
2) Install the DingTalk webhook plugin
    tar zxvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
    cd prometheus-webhook-dingtalk-0.3.0.linux-amd64
    nohup ./prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" --ding.profile="cluster1=https://oapi.dingtalk.com/robot/send?access_token=9c03ff1f47b1d15a10d852398cafb84f8e81ceeb1ba557eddd8a79e5a5e5548e" &
3) ConfigMap configuration
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: alertmanager
      namespace: monitor-sa
    data:
      alertmanager.yml: |-
        global:
          resolve_timeout: 1m
          smtp_smarthost: 'smtp.163.com:25'
          smtp_from: 'xxx@163.com'
          smtp_auth_username: 'xxx'
          smtp_auth_password: '1989317li'
          smtp_require_tls: false
        route:
          group_by: [alertname]
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 10m
          receiver: cluster1
        receivers:
        - name: cluster1
          webhook_configs:
          - url: 'http://192.168.124.16:8060/dingtalk/cluster1/send'
            send_resolved: true
Email and DingTalk notifications together:
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: alertmanager
      namespace: monitor-sa
    data:
      alertmanager.yml: |-
        global:
          resolve_timeout: 1m
          smtp_smarthost: 'smtp.163.com:25'
          smtp_from: 'xxx@163.com'
          smtp_auth_username: 'xxx'
          smtp_auth_password: '1989317li'
          smtp_require_tls: false
        route:
          group_by: [alertname]
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 10m
          receiver: 'default-receiver'
          routes:                    # child routes
          - receiver: cluster1
            group_wait: 10s
            match_re:                # regular-expression match
              severity: critical     # alerts with severity critical go to the cluster1 receiver
        receivers:
        - name: 'default-receiver'
          email_configs:
          - to: '1980570647@qq.com'
            send_resolved: true
        - name: 'cluster1'
          webhook_configs:
          - url: 'http://192.168.124.16:8060/dingtalk/cluster1/send'
            send_resolved: true
        inhibit_rules:
        - source_match:
            severity: 'critical'
          target_match:
            severity: 'warning'
          equal: ['alertname']
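To exercise a configuration like this end to end, a hand-made alert can be posted to Alertmanager's v2 HTTP API (POST /api/v2/alerts). The Python sketch below is one way to do that. The function names, the alert name TestAlert, and the address http://localhost:9093 are assumptions, so substitute the address at which your Alertmanager is reachable. post_alerts is only defined here, not called, because it needs a running Alertmanager.

```python
import json
import urllib.request


def build_alert(alertname, severity, summary):
    """Build one alert object in the shape the v2 alerts endpoint accepts."""
    return {
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": summary},
    }


def post_alerts(base_url, alerts):
    """POST a list of alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# One critical and one warning alert that share the same alertname.
alerts = [
    build_alert("TestAlert", "critical", "manual test, critical"),
    build_alert("TestAlert", "warning", "manual test, warning"),
]
print(json.dumps(alerts, indent=2))
```

After post_alerts("http://localhost:9093", alerts), the critical alert should reach DingTalk through cluster1, and the warning alert with the same alertname should be inhibited rather than emailed.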
Prometheus book: https://yunlzheng.gitbook.io