• Alertmanager email alerts


    Installing and configuring Alertmanager

    wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
    tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local
    cd /usr/local
    mv alertmanager-0.21.0.linux-amd64/ alertmanager

    Create the systemd unit file

    vim /usr/lib/systemd/system/alertmanager.service 
    
    [Unit]
    Description=alertmanager
    Documentation=https://github.com/prometheus/alertmanager
    After=network.target
    
    [Service]
    Type=simple
    User=prometheus
    ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alert-test.yml --storage.path=/usr/local/alertmanager/data
    Restart=on-failure
    
    [Install]
    WantedBy=multi-user.target

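    The Alertmanager install directory ships with a default alertmanager.yml. You can also create a separate configuration file and pass it with --config.file at startup, as the unit file above does with alert-test.yml: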

    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: 'aa@qq.com'
      smtp_auth_username: 'aa@qq.com'
      smtp_auth_password: 'aa'
      smtp_require_tls: false
    
    templates:
      - '/usr/local/alertmanager/template/*.tmpl'   # email alert templates
    
    # route: how alerts are grouped and dispatched to receivers
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1m
      receiver: 'mail'
    
    receivers:
    - name: 'mail'
      email_configs:
        - to: 'dd5@qq.com'
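          send_resolved: true   # also send a mail when the alert is resolved
          html: '{{ template "default-monitor.html" }}'   # which template renders the mail body
          # Mail subject. If headers is omitted, the subject comes from the
          # email.default.subject template, which you can redefine in your own template file.
          headers: {Subject: "[WARN] Alert email test"}
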
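    • smtp_smarthost: the SMTP server address and port of the mailbox used to send mail;
    • smtp_auth_password: the authorization code issued for the sending mailbox, not its login password;
    • smtp_require_tls: defaults to true when unset. With true, this setup fails with a starttls error, so it is set to false here for simplicity (port 465 typically expects implicit TLS rather than STARTTLS);
    • templates: the path pattern of the mail template files;
    • html (under receivers): the name of the template used for the mail body, here "default-monitor.html", which must be defined in one of the files matched by templates;
    • headers: extra mail headers; here it sets the subject line.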
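
To check the mail settings above without waiting for a real outage, one option is to push a hand-made alert into Alertmanager. The sketch below assumes Alertmanager listens on localhost:9093 and exposes its v2 HTTP API; the alert name `manual-test` and the helper names are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(instance, summary, minutes=5):
    """Build a one-alert payload in the shape the v2 alerts endpoint accepts."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {
            "alertname": "manual-test",  # hypothetical name, used by group_by
            "instance": instance,
            "severity": "page",
        },
        "annotations": {"summary": summary},
        "startsAt": now.isoformat(),
        # Once endsAt has passed, Alertmanager treats the alert as resolved.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alert(payload, base_url="http://localhost:9093"):
    """POST the payload to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    alert = build_test_alert("test", "manual test alert")
    print(json.dumps(alert, indent=2))
    # Uncomment once Alertmanager is running on localhost:9093:
    # print(send_alert(alert))
```

If the configuration is correct, a mail should arrive shortly after group_wait has elapsed, and a second "resolved" mail after endsAt because send_resolved is true.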

    Configure alerting rules

    Create rule.yml

    groups:
    - name: node_alerts
      rules:
      - alert: node-up-alert
        expr: up==0
        for: 10s
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} has been down for more than 10s"
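
The `for` clause means the expression has to stay true across evaluations for that long before the alert fires; until then it is only pending. The following is a simplified model of that behaviour, not Prometheus code: the function name and the sample timestamps are assumptions, and the 15s spacing mirrors an evaluation_interval of 15s.

```python
def alert_states(samples, hold_seconds):
    """Return the alert state at each rule evaluation.

    samples: list of (timestamp_seconds, expr_is_true) in time order.
    hold_seconds: the rule's `for` duration in seconds.
    """
    states = []
    active_since = None  # when the expression first became true in this run
    for ts, is_true in samples:
        if not is_true:
            # Any false evaluation resets the pending timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= hold_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# up == 0 becomes true at t=15 and stays true; evaluated every 15s, for: 10s
evals = [(0, False), (15, True), (30, True), (45, True)]
print(alert_states(evals, hold_seconds=10))
# → ['inactive', 'pending', 'firing', 'firing']
```

Under this model, with a 15s evaluation interval the alert reaches Alertmanager one evaluation after the target first goes down, even though `for` is only 10s.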

    Edit prometheus.yml so that it points to Alertmanager and to the rule file

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
           - localhost:9093    # added: Alertmanager address
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
       #- "/usr/local/prometheus/rules/*_alerts.yml"
       - "rules/*_alerts.yml"   # added
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
    
        - targets: ['xxxxxxxxxxx:9090']
      - job_name: 'xxxxxxxxxx'
        static_configs:
        - targets: ['xxxxxxxxxxxxx:9100']
          labels:
            instance: test
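
A rule file is only loaded if its path matches one of the rule_files patterns, and relative patterns are resolved against the directory that holds prometheus.yml. The sketch below approximates that matching in Python as a quick sanity check. It assumes prometheus.yml lives in /usr/local/prometheus, and `node_alerts.yml` is a hypothetical file name used only for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matches_rule_glob(rule_path, pattern, config_dir):
    """Check whether rule_path would be picked up by a rule_files pattern.

    Relative patterns are resolved against config_dir. Only `*` inside a
    single path component is handled, which covers the pattern used here.
    """
    full_pattern = PurePosixPath(config_dir) / pattern
    path = PurePosixPath(rule_path)
    if len(path.parts) != len(full_pattern.parts):
        return False
    return all(fnmatchcase(p, q) for p, q in zip(path.parts, full_pattern.parts))


config_dir = "/usr/local/prometheus"  # assumed location of prometheus.yml
pattern = "rules/*_alerts.yml"

print(matches_rule_glob("/usr/local/prometheus/rule.yml", pattern, config_dir))
# → False
print(matches_rule_glob("/usr/local/prometheus/rules/node_alerts.yml", pattern, config_dir))
# → True
```

In other words, a file saved as rule.yml next to prometheus.yml would not be matched by this pattern; it would need to sit under rules/ with a name ending in _alerts.yml, or the pattern would need to change.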

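    Restart the Prometheus service. The rule file must be readable by the prometheus user, and its path must match the rule_files pattern above (rules/*_alerts.yml, resolved relative to the directory containing prometheus.yml); adjust the path below if you saved the file elsewhere:

    chown -R prometheus:prometheus /usr/local/prometheus/rule.yml
    systemctl restart prometheus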

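    Write the email template

    Note: the file extension must be tmpl so that it matches the templates pattern in the Alertmanager config.

    Alert template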

    vi /usr/local/alertmanager/template/mail.tmpl
    {{ define "default-monitor.html" }}
    {{ range .Alerts }}
    <pre>
    =============start===========
    Alert source: prometheus_alert
    Severity: {{ .Labels.severity }}
    Alert name: {{ .Labels.alertname }}
    Host: {{ .Labels.instance }}
    Summary: {{ .Annotations.summary }}
    Details: {{ .Annotations.description }}
    Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
    ==============end============
    </pre>
    {{ end }}
    {{ end }}

    Alert and recovery template (a version of the same file that also formats resolved notifications)

    vi /usr/local/alertmanager/template/mail.tmpl
    {{ define "default-monitor.html" }}
    {{- if gt (len .Alerts.Firing) 0 -}}{{ range .Alerts.Firing }}
    @Alert
    <pre>
    Type: {{ .Labels.alertname }}
    Instance: {{ .Labels.instance }}
    Summary: {{ .Annotations.summary }}
    Details: {{ .Annotations.description }}
    Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
    </pre>
    {{ end }}{{ end -}}
    {{- if gt (len .Alerts.Resolved) 0 -}}{{ range .Alerts.Resolved }}
    @Resolved
    <pre>
    Type: {{ .Labels.alertname }}
    Instance: {{ .Labels.instance }}
    Summary: {{ .Annotations.summary }}
    Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
    Resolved at: {{ .EndsAt.Format "2006-01-02 15:04:05" }}
    </pre>
    {{ end }}{{ end -}}
    {{- end }}
     
  • Original article: https://www.cnblogs.com/fat-girl-spring/p/13554989.html