• prometheus+grafana+alertmanager监控系统搭建 若


    1. 说明

    本文描述如何搭建prometheus+grafana+alertmanager监控系统,目标是检测到cpu使用率过高之后能发送短信通知相关人员

    1. prometueus搭建

    开源的系统监控和报警框架,灵感源自Google的Borgmon监控系统

    1.1. docker-compose配置

    记得把./prometheus/data的权限设置为777

        prometheus:
            image: prom/prometheus:v2.36.2
            container_name: prometheus
            restart: always
            network_mode: default_network
            logging:
                driver: "json-file"
                options:
                    max-size: "200m"
                    max-file: "10"
            volumes:
                - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
                - ./prometheus/rules:/etc/prometheus/rules
                - ./prometheus/data:/prometheus
            ports:
                - "0.0.0.0:9090:9090"
    

    1.2. 配置

    需要配置检测rules规则文件,node-exporter和alertmanager的地址

    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              - 10.0.22.120:9093
              # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - 'rules/*_rules.yml'
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
          - targets: ["10.0.22.120:9100"]
    

    1.2. rules--cpu检测配置

    这里为了方便测试,直接超过1%使用率就报警

    groups:
      - name: cpu-monitor
        rules:
        - alert: HostHighCpuLoad
          expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Host high CPU load (instance {{ $labels.instance }})
            # description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
            description: "服务器5分钟内CPU使用率超过1%!(当前值: {{ $value }}%)"

    2. grafana搭建

    监控可视化

    2.1. docker-compose配置

    记得把./grafana/data的权限设置为777

    grafana:
    image: grafana/grafana:6.5.0
    container_name: grafana
    restart: always
    network_mode: default_network
    logging:
    driver: "json-file"
    options:
    max-size: "200m"
    max-file: "10"
    volumes:
    - ./grafana/data:/var/lib/grafana
    ports:
    - "0.0.0.0:3000:3000"

    2.2. 配置

    使用默认配置

    3. node-exporter搭建

    用来监控各节点的资源信息的exporter,应部署到prometheus监控的所有节点

    3.1. docker-compose配置

    node-exporter:
    image: prom/node-exporter:v1.3.1
    container_name: node-exporter
    restart: always
    network_mode: default_network
    logging:
    driver: "json-file"
    options:
    max-size: "200m"
    max-file: "10"
    ports:
    - "0.0.0.0:9100:9100"

    3.2. 配置

    使用默认配置

     

    4. alertmanager搭建

    处理由客户端应用程序(如Prometheus server)发送的警报。它负责将重复数据删除,分组和路由到正确的接收者集成,还负责沉默和抑制警报

    4.1. docker-compose配置

    记得把./prometheus/data的权限设置为777

    alertmanager:
    image: prom/alertmanager:v0.24.0
    container_name: alertmanager
    restart: always
    network_mode: default_network
    logging:
    driver: "json-file"
    options:
    max-size: "200m"
    max-file: "10"
    volumes:
    - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    - ./alertmanager/data:/alertmanager
    ports:
    - "0.0.0.0:9093:9093"

    4.2. 配置

    配置相关邮件地址

    global:
    # The smarthost and SMTP sender used for mail notifications.
    smtp_smarthost: 'smtp.xxxxxxx:25'
    smtp_from: 'xxxxx@163.com'
    smtp_auth_username: 'xxxx@163.com'
    smtp_auth_password: 'xxxxx'
    smtp_require_tls: true
    route:
    group_by: ['alertname']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 1h
    receiver: 'email'
    receivers:
    - name: 'email'
    email_configs:
    - to: 'xxxxxx@163.com'
    inhibit_rules:
    - source_match:
    severity: 'critical'
    target_match:
    severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

    5. docker-compose部署

    sudo docker-compose -f docker-compose.yml up -d

    查看container:

    291a37d8cbb8   prom/alertmanager:v0.24.0         "/bin/alertmanager -…"   19 minutes ago   Up 19 minutes          0.0.0.0:9093->9093/tcp                                                                                                                                                                                                                                                alertmanager
    220657de4da1   prom/node-exporter:v1.3.1         "/bin/node_exporter"     19 minutes ago   Up 19 minutes          0.0.0.0:9100->9100/tcp                                                                                                                                                                                                                                                node-exporter
    7dba7db4b98a   prom/prometheus:v2.36.2           "/bin/prometheus --c…"   19 minutes ago   Up 19 minutes          0.0.0.0:9090->9090/tcp                                                                                                                                                                                                                                                prometheus
    d3f91f7e8453   grafana/grafana:6.5.0             "/run.sh"                19 minutes ago   Up 19 minutes          0.0.0.0:3000->3000/tcp                                                                                                                                                                                                                                                grafana

    6. 验证

    这样就完成了,测试一下,邮件是否有收到

  • 相关阅读:
    成熟失去快乐
    明天出发
    不知其期
    不是自己的机子就是不好
    都收拾完了
    文盲
    郁闷
    好像感冒了
    又都疯了
    流水账,从我开始接触计算机时写起
  • 原文地址:https://www.cnblogs.com/zhanchenjin/p/16435522.html
Copyright © 2020-2023  润新知