• Ceph Monitoring and DingTalk Alerting with Grafana + Prometheus (reposted from the Yunqi Community)


    Getting the packages

    The latest packages are available at:

    https://prometheus.io/download/

    Prometheus

    1. Download Prometheus

    $ wget https://github.com/prometheus/prometheus/releases/download/v2.6.0/prometheus-2.6.0.linux-amd64.tar.gz

    2. Extract the archive

    $ tar xf prometheus-2.6.0.linux-amd64.tar.gz
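
    The download URL in step 1 follows a fixed pattern, so the version can be factored out when a different release is needed. A minimal sketch, assuming the linux-amd64 build; release_url is a hypothetical helper name, not part of any Prometheus tooling:

```shell
# Build the GitHub release URL for a Prometheus-family tarball.
# $1 = project (e.g. prometheus, alertmanager)
# $2 = version without the leading "v" (e.g. 2.6.0)
# Assumption: the linux-amd64 naming scheme used in this article.
release_url() {
    printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' \
        "$1" "$2" "$1" "$2"
}

# Example (not run here):
#   wget "$(release_url prometheus 2.6.0)"
```

    The same helper yields the Alertmanager URL used later in this article when called as release_url alertmanager 0.16.0.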

    3. Configure Prometheus as a service

    Move the extracted directory to /usr/local/ and rename it to prometheus:

    $ mv prometheus-2.6.0.linux-amd64 /usr/local/prometheus

    Create the systemd unit file:

    $ vim /usr/lib/systemd/system/prometheus.service
    [Unit]
    Description=Prometheus: the monitoring system
    Documentation=http://prometheus.io/docs/
    
    [Service]
    ExecStart=/usr/local/prometheus/prometheus \
            --config.file=/usr/local/prometheus/prometheus.yml \
            --storage.tsdb.path=/var/lib/prometheus \
            --web.console.templates=/usr/local/prometheus/consoles \
            --web.console.libraries=/usr/local/prometheus/console_libraries \
            --web.listen-address=0.0.0.0:9090 --web.external-url=
    Restart=always
    StartLimitInterval=0
    RestartSec=10
    
    [Install]
    WantedBy=multi-user.target 

    Create the directory that stores the monitoring data:

    $ mkdir /var/lib/prometheus

    4. Start Prometheus

    $ systemctl daemon-reload
    $ systemctl enable prometheus
    $ systemctl start prometheus

    5. Check the listening port

    Prometheus listens on port 9090. Once it has started, use netstat to confirm that the port is listening:

    $ netstat -antpu | grep 9090
    tcp        0      0 127.0.0.1:33270         127.0.0.1:9090          ESTABLISHED 6426/prometheus    
    tcp6       0      0 :::9090                 :::*                    LISTEN      6426/prometheus    
    tcp6       0      0 ::1:9090                ::1:51821               ESTABLISHED 6426/prometheus    
    tcp6       0      0 ::1:51821               ::1:9090                ESTABLISHED 6426/prometheus    
    tcp6       0      0 127.0.0.1:9090          127.0.0.1:33270         ESTABLISHED 6426/prometheus
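
    As the output above shows, grep 9090 also matches established connections, not only the listening socket. A minimal sketch of a stricter check that reads netstat output on stdin and looks only at LISTEN rows; port_is_listening is a hypothetical helper name, and the column layout is assumed to match the netstat -antpu output shown in this article:

```shell
# Succeed (exit 0) if the netstat output on stdin contains a socket in
# LISTEN state whose local port equals $1; fail (exit 1) otherwise.
# Assumed columns: Proto Recv-Q Send-Q Local Foreign State PID/Program.
port_is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # Local address looks like ":::9090" or "0.0.0.0:9090";
            # the port is whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (not run here):
#   netstat -antpu | port_is_listening 9090 && echo "Prometheus is listening"
```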

    6. Open Prometheus in a browser

    Once Prometheus is running, its status and configuration can be viewed in a browser.

    Ceph_exporter

    Ceph_exporter has to be compiled with Go. Alternatively, download a prebuilt Ceph_exporter binary and use it directly:

    Link: https://pan.baidu.com/s/1AEF_pdDvSJ5gMPapaBuBrA

    Extraction code: jkuh

    1. Install the Go environment

    $ yum -y install golang

    2. Inspect the Go environment variables

    $ go env
    GOARCH="amd64"
    GOBIN=""
    GOCACHE="/root/.cache/go-build"
    GOEXE=""
    GOFLAGS=""
    GOHOSTARCH="amd64"
    GOHOSTOS="linux"
    GOOS="linux"
    GOPATH="/root/go"
    GOPROXY=""
    GORACE=""
    GOROOT="/usr/lib/golang"
    GOTMPDIR=""
    GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
    GCCGO="gccgo"
    CC="gcc"
    CXX="g++"
    CGO_ENABLED="1"
    GOMOD=""
    CGO_CFLAGS="-g -O2"
    CGO_CPPFLAGS=""
    CGO_CXXFLAGS="-g -O2"
    CGO_FFLAGS="-g -O2"
    CGO_LDFLAGS="-g -O2"
    PKG_CONFIG="pkg-config"
    GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build359765015=/tmp/go-build -gno-record-gcc-switches"

    3. Set the Go environment variables

    $ vim /etc/profile.d/go.sh
    export GOROOT=/usr/lib/golang
    export GOBIN=$GOROOT/bin
    export GOPATH=/root/go
    export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
    
    $ source /etc/profile.d/go.sh

    4. Download and build Ceph_exporter

    $ mkdir -p ~/go/src/github.com/digitalocean/
    $ cd ~/go/src/github.com/digitalocean/
    $ git clone https://github.com/digitalocean/ceph_exporter
    $ cd ceph_exporter
    $ go build

    5. Create the Ceph_exporter service

    $ mkdir ~/go/bin/
    $ cp ~/go/src/github.com/digitalocean/ceph_exporter/ceph_exporter ~/go/bin/
    $ vim /usr/lib/systemd/system/ceph_exporter.service
    [Unit]
    Description=Prometheus's ceph metrics exporter
     
    [Service]
    User=root
    Group=root
    ExecStart=/root/go/bin/ceph_exporter
     
    [Install]
    WantedBy=multi-user.target
    Alias=ceph_exporter.service

    6. Start Ceph_exporter

    $ systemctl daemon-reload
    $ systemctl enable ceph_exporter
    $ systemctl start ceph_exporter

    7. Check the listening port

    Ceph_exporter listens on port 9128; use netstat to confirm that the port is listening:

    $ netstat -antpu | grep 9128
    tcp6       0      0 :::9128                 :::*                    LISTEN      6839/ceph_exporter
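
    A listening port does not prove that the exporter can reach the cluster. A minimal sketch that counts the Ceph samples in the exporter's text output, assuming the metrics are served at /metrics on port 9128 and that Ceph metric names start with ceph_ (as in the alert rules later in this article); count_ceph_samples is a hypothetical helper name:

```shell
# Count the sample lines on stdin whose metric name starts with "ceph_".
# Comment lines ("# HELP ...", "# TYPE ...") are not counted because they
# start with "#".
count_ceph_samples() {
    grep -c '^ceph_'
}

# Example (not run here); a count of 0 suggests the exporter is up but
# is not returning cluster data:
#   curl -s http://localhost:9128/metrics | count_ceph_samples
```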

    8. Update the Prometheus configuration

    Add the Ceph_exporter endpoint to the Prometheus configuration:

    $ vim /usr/local/prometheus/prometheus.yml
    scrape_configs:
      - job_name: 'ceph'
        honor_labels: true
        static_configs:
        - targets: ['192.168.1.10:9128']
          labels:
            instance: Ceph test cluster

    9. Restart Prometheus

    $ systemctl restart prometheus
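
    Because the unit file sets Restart=always, a typo in prometheus.yml leaves the service restarting in a loop. A minimal sketch that validates the file with promtool check config before restarting, assuming the promtool binary shipped in the tarball sits in /usr/local/prometheus; safe_restart and the PROMTOOL and RESTART_CMD override variables are hypothetical names introduced for this example:

```shell
# Restart Prometheus only if the configuration passes validation.
# $1 = config file (defaults to the path used in this article).
# PROMTOOL and RESTART_CMD can be overridden, e.g. for a dry run.
safe_restart() {
    config=${1:-/usr/local/prometheus/prometheus.yml}
    if "${PROMTOOL:-/usr/local/prometheus/promtool}" check config "$config"; then
        ${RESTART_CMD:-systemctl restart prometheus}
    else
        echo "config check failed; not restarting" >&2
        return 1
    fi
}

# Example (not run here):
#   safe_restart
```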

    10. Verify in a browser

    Grafana

    1. Download the package

    The latest packages for each platform are listed on the Grafana download page: https://grafana.com/grafana/download

    $ wget https://dl.grafana.com/oss/release/grafana-5.4.3-1.x86_64.rpm

    2. Install Grafana

    $ yum -y install grafana-5.4.3-1.x86_64.rpm

    3. Start Grafana

    $ systemctl enable grafana-server
    $ systemctl start grafana-server

    4. Check the listening port

    Grafana listens on port 3000; use netstat to confirm that the port is listening:

    $ netstat -antpu | grep 3000
    tcp6       0      0 :::3000                 :::*                    LISTEN      7147/grafana-server

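    5. Log in through a browser

    The address is http://$IP:3000. The initial username and password are both admin; on first login you are prompted to set a new password.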

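    6. Configure the dashboard

    Click Add data source to add a data source.

    Select Prometheus.

    Set the URL to the Prometheus address, http://$IP:9090.

    Import the dashboard using template ID 917. If the host cannot reach the internet, download the template from the Grafana website and import it manually: https://grafana.com/dashboards/917

    Check the monitoring status.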
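
    The data source can also be created without the UI through Grafana's HTTP API (POST /api/datasources). A minimal sketch that builds the request body, assuming the default admin:admin credentials are still in place and that proxy access is wanted; datasource_json is a hypothetical helper name, and the data source name is a free choice:

```shell
# Print the JSON body for Grafana's POST /api/datasources endpoint.
# $1 = data source name (any label you like)
# $2 = Prometheus URL, e.g. http://192.168.1.10:9090
# Assumption: "proxy" access, marked as the default data source.
datasource_json() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
        "$1" "$2"
}

# Example (not run here):
#   datasource_json Prometheus "http://$IP:9090" |
#     curl -s -u admin:admin -H 'Content-Type: application/json' \
#          -X POST -d @- "http://$IP:3000/api/datasources"
```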

    AlertManager

    1. Install Alertmanager

    $ wget https://github.com/prometheus/alertmanager/releases/download/v0.16.0/alertmanager-0.16.0.linux-amd64.tar.gz
    $ tar xf alertmanager-0.16.0.linux-amd64.tar.gz
    $ cd alertmanager-0.16.0.linux-amd64
    $ cp alertmanager amtool /usr/bin/
    $ cp alertmanager.yml /usr/local/prometheus/

    2. Create the systemd unit file

    $ vim /usr/lib/systemd/system/alertmanager.service
    [Unit]
    Description=Prometheus: the alerting system
    Documentation=http://prometheus.io/docs/
    After=prometheus.service
    
    [Service]
    ExecStart=/usr/bin/alertmanager --config.file=/usr/local/prometheus/alertmanager.yml
    Restart=always
    StartLimitInterval=0
    RestartSec=10
    
    [Install]
    WantedBy=multi-user.target

    3. Start Alertmanager

    $ systemctl enable alertmanager
    $ systemctl start alertmanager

    4. Check the listening port

    Alertmanager listens on port 9093; use netstat to confirm that the port is listening:

    $ netstat -antpu | grep 9093
    tcp6       0      0 :::9093                 :::*                    LISTEN      7381/alertmanager 

    5. Configure Prometheus with the Alertmanager endpoint

    $ vim /usr/local/prometheus/prometheus.yml
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["192.168.1.10:9093"]

    6. Restart Prometheus

    $ systemctl restart prometheus

    Configuring DingTalk alerts

    1. Set up the webhook

    $ mkdir -p /usr/lib/golang/src/github.com/timonwong/
    $ cd /usr/lib/golang/src/github.com/timonwong/
    $ git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
    $ cd prometheus-webhook-dingtalk
    $ make
    $ nohup ./prometheus-webhook-dingtalk --ding.profile="webhook=https://oapi.dingtalk.com/robot/send?access_token=8fe12c1a58b0769d7fcbf6ebf3bcd2cfcba825f2c45b4b39055890fd705df543" &> /var/log/dingding.log &

    2. Add the webhook receiver

    $ vim /usr/local/prometheus/alertmanager.yml
    global:
      resolve_timeout: 5m
     
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    
    receivers:
    - name: 'web.hook'
      webhook_configs:
      - url: 'http://192.168.1.10:8060/dingtalk/webhook/send'
    
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']

    3. Add the alert rule file

    $ vim /usr/local/prometheus/prometheus.yml
    rule_files:
      - /usr/local/prometheus/ceph.yml

    4. Define the alert rules

    $ vim /usr/local/prometheus/ceph.yml
    groups:
    - name: ceph-rule
      rules:
      - alert: Ceph OSD Down
        expr: ceph_osd_down > 0
        for: 2m
        labels:
          product: Ceph test cluster
        annotations:
          Warn: "{{$labels.instance}}: {{ $value }} OSD(s) down"
          Description: "{{$labels.instance}}: {{ $labels.osd }} is currently in state {{ $labels.status }}"
    
      - alert: Ceph Cluster Space Usage
        expr: ceph_cluster_used_bytes / ceph_cluster_capacity_bytes * 100 > 80
        for: 2m
        labels:
          product: Ceph test cluster
        annotations:
          Warn: "{{$labels.instance}}: cluster is running out of space"
          Description: "{{$labels.instance}}: current space usage is {{ $value }}%"
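
    The space-usage rule fires when used / capacity * 100 stays above 80 for two minutes. A minimal sketch that reproduces the arithmetic for a single pair of values, which can help when choosing a threshold; usage_exceeds is a hypothetical helper name, and the for: 2m hold time is not modelled:

```shell
# Mirror of the rule expression: used / capacity * 100 > threshold.
# $1 = used bytes, $2 = capacity bytes, $3 = threshold in percent (default 80).
# Prints the percentage; exits 0 when the rule condition would hold, 1 otherwise.
usage_exceeds() {
    awk -v used="$1" -v cap="$2" -v limit="${3:-80}" 'BEGIN {
        pct = used / cap * 100
        printf "%.1f\n", pct
        exit ((pct > limit) ? 0 : 1)
    }'
}

# Example (not run here): 850 of 1000 bytes used is above the 80% threshold.
#   usage_exceeds 850 1000 && echo "alert condition holds"
```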

    5. Restart the services to apply the configuration

    $ systemctl restart alertmanager
    $ systemctl restart prometheus.service
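
    The notification path can be exercised without touching the cluster by pushing a synthetic alert straight to Alertmanager. A minimal sketch that builds the request body, assuming the Alertmanager 0.16.0 installed above still serves POST /api/v1/alerts (later releases expose /api/v2/alerts instead); test_alert_json is a hypothetical helper name, and how the message renders in DingTalk depends on the webhook's template:

```shell
# Print a one-element alert list for Alertmanager's alert-ingestion API.
# $1 = alert name, $2 = value for the instance label,
# $3 = text for the Warn annotation used by the rules in this article.
test_alert_json() {
    printf '[{"labels":{"alertname":"%s","instance":"%s"},"annotations":{"Warn":"%s"}}]\n' \
        "$1" "$2" "$3"
}

# Example (not run here), using the Alertmanager address from step 5:
#   test_alert_json TestAlert test-host "synthetic alert, please ignore" |
#     curl -s -H 'Content-Type: application/json' \
#          -X POST -d @- http://192.168.1.10:9093/api/v1/alerts
```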

    6. Verify in DingTalk

    After one OSD is stopped, DingTalk receives an alert notification. Once the OSD is started again, a recovery notification arrives.

  • Original article: https://www.cnblogs.com/xiaoyaojinzhazhadehangcheng/p/11025265.html