• K8s Series: Email Alerting with Prometheus


    Thanks to the author for sharing: http://bjbsair.com/2020-04-07/tech-info/30650.html

    1. Specify the alerting service and the rule files

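    Tell Prometheus which alert management service to send alerts to and which alerting rule files to load. Here the alerting service (Alertmanager) is deployed in Kubernetes and exposed as a Service named alertmanager on port 9093. The rule files are all `.yml` files in the "/etc/prometheus/rules/" directory.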

    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager to send alerts to
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - alertmanager:9093

    # Alerting rule files to load
    rule_files:
      - "/etc/prometheus/rules/*.yml"
      # - "second_rules.yml"

    # Scrape configurations: Prometheus itself plus the exporters and nodes to monitor.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'redis'
        static_configs:
          - targets: ['redis-exporter-np:9121']
      - job_name: 'node'
        static_configs:
          - targets: ['prometheus-prometheus-node-exporter:9100']
      - job_name: 'windows-node-001'
        static_configs:
          - targets: ['10.0.32.148:9182']
      - job_name: 'windows-node-002'
        static_configs:
          - targets: ['10.0.34.4:9182']
      - job_name: 'rabbit'
        static_configs:
          - targets: ['prom-rabbit-prometheus-rabbitmq-exporter:9419']
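    Each entry under `static_configs` becomes one scrape target, and the alert rule in the next step refers to its `job` and `instance` labels. A minimal Python sketch of how those two labels are derived, assuming no `relabel_configs` are set: `job` comes from `job_name` and `instance` defaults to the target's `host:port`. The dict only mirrors part of the `scrape_configs` section above, and the function name `target_labels` is hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for part of the parsed scrape_configs section above.
SCRAPE_CONFIGS: List[dict] = [
    {"job_name": "prometheus", "static_configs": [{"targets": ["localhost:9090"]}]},
    {"job_name": "redis", "static_configs": [{"targets": ["redis-exporter-np:9121"]}]},
    {"job_name": "windows-node-002", "static_configs": [{"targets": ["10.0.34.4:9182"]}]},
]


def target_labels(scrape_configs: List[dict]) -> List[Dict[str, str]]:
    """Return the job/instance label set for every static target.

    Assumes no relabeling: `job` is the job_name and `instance` is the
    target address (host:port).
    """
    result = []
    for job in scrape_configs:
        # static_configs is read per job, so it must be nested under the job entry.
        for static in job.get("static_configs", []):
            for address in static.get("targets", []):
                result.append({"job": job["job_name"], "instance": address})
    return result


if __name__ == "__main__":
    for labels in target_labels(SCRAPE_CONFIGS):
        print(labels)
```

    This is also why the indentation matters: a `static_configs` key that is not nested under its `- job_name` entry no longer belongs to that job.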
    

    2. Define the alerting rules

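    Define the alerting rules that Prometheus evaluates; when a rule fires, Prometheus sends the alert to the alerting service. The rule below reports every instance whose scrape target is down (`up == 0`), telling the recipient which instances are not running normally.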

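    # rules
    groups:
      - name: node-rules
        rules:
          - alert: InstanceDown # alert name
            expr: up == 0 # condition that triggers the alert
            for: 3s # how long the condition must hold before firing (checked once per evaluation_interval)
            labels: # extra labels attached to the alert
              team: k8s
            annotations: # human-readable alert details
              summary: "{{$labels.instance}}: has been down"
              description: "{{$labels.instance}}: job {{$labels.job}} has been down"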
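    The `for` field keeps a matching alert in the *pending* state until the expression has held for at least that long, and the check only happens at each rule evaluation. A simplified Python model of that state machine, assuming the 15s `evaluation_interval` from step 1 and ignoring details such as restarts and resolved-alert handling, shows that `for: 3s` effectively fires on the second evaluation, about 15s after the target went down. The function `simulate_alert` is illustrative, not Prometheus code.

```python
from typing import List, Optional, Tuple


def simulate_alert(
    samples: List[Tuple[int, bool]], hold_seconds: int
) -> List[Tuple[int, str]]:
    """Simplified model of one alert's state across rule evaluations.

    samples: (evaluation time in seconds, whether `up == 0` matched).
    hold_seconds: the rule's `for` duration.
    Returns (time, state) pairs with state in inactive/pending/firing.
    """
    active_at: Optional[int] = None
    states: List[Tuple[int, str]] = []
    for ts, matched in samples:
        if not matched:
            # Condition cleared: the alert is dropped and the timer resets.
            active_at = None
            states.append((ts, "inactive"))
            continue
        if active_at is None:
            # First evaluation where the condition holds: start the timer.
            active_at = ts
        # Fire once the condition has held for at least `for`.
        state = "firing" if ts - active_at >= hold_seconds else "pending"
        states.append((ts, state))
    return states


if __name__ == "__main__":
    # Target down from t=0 onward, rules evaluated every 15s, for: 3s.
    evaluations = [(t, True) for t in range(0, 46, 15)]
    print(simulate_alert(evaluations, hold_seconds=3))
```

    In other words, any `for` value shorter than the evaluation interval behaves like "wait one more evaluation"; a longer value such as `1m` is a more common way to ride out brief scrape failures.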
    

    3. Configure alert routing and receivers

    Here alerts are delivered by email: when the alerting service (Alertmanager) receives an alert, it emails it to the configured recipient.

    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.163.com:25' # SMTP server (host:port) of the sending mailbox
      smtp_from: 'xxx@163.com' # sender address
      smtp_auth_username: 'xxx' # mailbox username
      smtp_auth_password: 'xxx' # mailbox password or SMTP authorization code

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'email'

    receivers:
      - name: 'email'
        email_configs:
          - to: 'xxxxxx@aliyun.com' # recipient mailbox
            headers: { Subject: "[WARN] Alert email" } # subject line of the alert email

    # Only applies to alerts that carry a severity label.
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
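    With `group_by: ['alertname']`, Alertmanager batches alerts that share the same `alertname` into one group, so the two `InstanceDown` alerts expected in the verification step land in the same group and are typically delivered together in one email. A minimal Python sketch of just the grouping key, ignoring routing, timing (`group_wait`, `group_interval`, `repeat_interval`) and inhibition; the function name `group_alerts` and the sample label values (taken from the targets in step 1) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_alerts(
    alerts: List[Dict[str, str]], group_by: List[str]
) -> Dict[Tuple[str, ...], List[Dict[str, str]]]:
    """Bucket alerts by the values of the group_by labels.

    Sketch only: real Alertmanager also applies routing, timing and
    inhibition before a notification is sent.
    """
    groups: Dict[Tuple[str, ...], List[Dict[str, str]]] = defaultdict(list)
    for labels in alerts:
        # In this sketch a missing label contributes an empty value to the key.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


if __name__ == "__main__":
    firing = [
        {"alertname": "InstanceDown", "job": "redis",
         "instance": "redis-exporter-np:9121", "team": "k8s"},
        {"alertname": "InstanceDown", "job": "windows-node-002",
         "instance": "10.0.34.4:9182", "team": "k8s"},
    ]
    for key, members in group_alerts(firing, ["alertname"]).items():
        # One notification (email) per group.
        print(key, [a["instance"] for a in members])
```

    Grouping by `alertname` keeps the mailbox quiet during a wide outage; adding `instance` to `group_by` would instead send one email per failed instance. Note that the `inhibit_rules` entry has no effect here, because the `InstanceDown` rule does not set a `severity` label.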
    

    4. Verification

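    Among the instances monitored by Prometheus in this setup, redis and windows-node-002 are not running, so according to the rule above, alerts for them should be sent to the recipient's mailbox.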
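    Before waiting for the email, the same condition can be checked directly against the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`). A minimal Python sketch, assuming Prometheus is reachable at a hypothetical `http://localhost:9090`; the helpers `build_query_url` and `down_instances` are illustrative names. It only builds the URL and parses the JSON body, so it can be tried offline; the sample response is made-up data in the API's instant-vector shape, not output from this setup.

```python
import json
from typing import List
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def down_instances(response_body: str) -> List[str]:
    """Extract job/instance pairs from an instant-vector response to `up == 0`."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    return [
        f"{r['metric'].get('job', '')}/{r['metric'].get('instance', '')}"
        for r in payload["data"]["result"]
    ]


if __name__ == "__main__":
    print(build_query_url("http://localhost:9090", "up == 0"))
    # Illustrative response body (made-up values, real API shape).
    sample = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"job": "redis", "instance": "redis-exporter-np:9121"},
                 "value": [1586000000.0, "0"]},
            ],
        },
    })
    print(down_instances(sample))
```

    Against a live server, the URL can be fetched with `urllib.request.urlopen` and the body passed to `down_instances`; the result should list the same instances the alert email reports.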

    [Figure: K8s Series: Prometheus email alerting]

    The alert email received in the recipient's mailbox is shown below.

    [Figure: K8s Series: Prometheus email alerting]
  • Original article: https://www.cnblogs.com/lihanlin/p/12657690.html