• Prometheus Monitoring in Kubernetes



    Container Monitoring and Alerting

    Monitoring containers is quite different from monitoring virtual machines or physical hosts. In a Kubernetes environment, containers can be scaled out and in at any time, so the monitoring service must automatically start monitoring newly created containers and promptly drop them once they are deleted. The traditional Zabbix approach requires installing and running an agent inside every container, and it has no good mechanism for automatic container discovery, registration, and template association.
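
    Prometheus addresses this with built-in Kubernetes service discovery. The snippet below is only a minimal sketch (not the configuration used later in this article) of how scrape targets can be discovered automatically via kubernetes_sd_configs:

    scrape_configs:
      - job_name: 'kubernetes-nodes'        # targets are discovered, not listed statically
        kubernetes_sd_configs:
          - role: node                      # one target per cluster node
        relabel_configs:
          - action: labelmap                # copy node labels onto the scraped series
            regex: __meta_kubernetes_node_label_(.+)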

    role               host                       port
    Prometheus         master2 (10.203.104.21)    9090
    node exporter      master/node                9100
    Grafana            master3 (10.203.104.22)    3000
    cadvisor           node                       8080
    alertmanager       master3                    9093
    haproxy_exporter   HA1 (10.203.104.30)        9101

    Prometheus

    Early Kubernetes versions relied on the heapster component to monitor pods and nodes. Starting with Kubernetes 1.8 monitoring moved to the metrics API, and heapster was officially replaced in version 1.11. Since then, the core metrics, such as node CPU and memory usage, are provided by metrics-server, while the rest of the monitoring is handled by a separate component, Prometheus.

    Prometheus overview

    https://prometheus.io/docs/ # official documentation

    https://github.com/prometheus # GitHub repository

    Prometheus is an open-source monitoring, alerting, and time-series database suite written in Go, originally developed at SoundCloud. It is the second project (after Kubernetes) to graduate from the CNCF (Cloud Native Computing Foundation) and is widely used in the container and microservices space. Its main characteristics are:

    Stores data in a multi-dimensional key-value format
    Uses a time-series database (its built-in TSDB) rather than a traditional database such as MySQL
    Supports third-party dashboards such as Grafana (version 2.5.0 and above) for richer graphical interfaces
    Component-based functionality
    No hard dependency on external storage; data can be kept locally or remotely
    Automatic service discovery
    A powerful query language, PromQL (Prometheus Query Language); see the example queries below
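
    The queries below are a small PromQL sketch using standard node_exporter and cAdvisor metric names (both exporters are deployed later in this article):

    # Per-node CPU utilisation (%) over the last 5 minutes (node_exporter metrics)
    100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

    # Per-container memory working set in bytes (cAdvisor metrics)
    sum by (name) (container_memory_working_set_bytes{image!=""})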
    

    Prometheus system architecture

    prometheus server: the main service; accepts HTTP requests and collects, stores, and queries data
    prometheus targets: statically configured targets to scrape
    service discovery: dynamic discovery of scrape targets
    prometheus alerting: alert notification
    pushgateway: a metrics collection proxy (similar to a Zabbix proxy)
    data visualization and export: visualization and export of data (client access)
    

    Prometheus installation methods

    https://prometheus.io/download/ # official binary download and installation; prometheus server listens on port 9090
    https://prometheus.io/docs/prometheus/latest/installation/ # run directly from the Docker image
    https://github.com/coreos/kube-prometheus # operator-based deployment
    
    Installing Prometheus as a container

    In this environment Prometheus is installed on master2 (10.203.104.21).

    Run the Prometheus container

    root@master2:~# docker run \
        -p 9090:9090 \
        prom/prometheus
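
    If a custom configuration is needed, a local prometheus.yml can be mounted into the container. The command below is only a sketch that assumes the file has been placed at /opt/prometheus/prometheus.yml on the host (a hypothetical path); /etc/prometheus/prometheus.yml is where the prom/prometheus image expects its configuration:

    root@master2:~# docker run -d \
        -p 9090:9090 \
        -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
        prom/prometheus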
    

    Test Prometheus by browsing to port 9090 on the master2 node.

    Operator-based deployment

    https://github.com/coreos/kube-prometheus

    Clone the project (the release-0.4 version is used here, hence the directory name below)
    root@master1:/usr/local/src# git clone https://github.com/coreos/kube-prometheus.git
    root@master1:/usr/local/src# cd kube-prometheus-release-0.4/
    root@master1:/usr/local/src/kube-prometheus-release-0.4# ls
    build.sh            DCO   example.jsonnet  experimental  go.sum  jsonnet           jsonnetfile.lock.json  LICENSE   manifests  OWNERS     scripts                            tests
    code-of-conduct.md  docs  examples         go.mod        hack    jsonnetfile.json  kustomization.yaml     Makefile  NOTICE     README.md  sync-to-internal-registry.jsonnet  test.sh
    
    root@master1:/usr/local/src/kube-prometheus-release-0.4# cd manifests/
    root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# ls
    

    Apply the setup manifests (namespace, CRDs, RBAC, and the operator)
    root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f setup/
    namespace/monitoring created
    customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
    clusterrole.rbac.authorization.k8s.io/prometheus-operator created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
    deployment.apps/prometheus-operator created
    service/prometheus-operator created
    serviceaccount/prometheus-operator created
    
    Create the Prometheus stack
    root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f .
    alertmanager.monitoring.coreos.com/main created
    secret/alertmanager-main created
    service/alertmanager-main created
    serviceaccount/alertmanager-main created
    servicemonitor.monitoring.coreos.com/alertmanager created
    secret/grafana-datasources created
    configmap/grafana-dashboard-apiserver created
    configmap/grafana-dashboard-cluster-total created
    configmap/grafana-dashboard-controller-manager created
    configmap/grafana-dashboard-k8s-resources-cluster created
    configmap/grafana-dashboard-k8s-resources-namespace created
    configmap/grafana-dashboard-k8s-resources-node created
    configmap/grafana-dashboard-k8s-resources-pod created
    configmap/grafana-dashboard-k8s-resources-workload created
    configmap/grafana-dashboard-k8s-resources-workloads-namespace created
    configmap/grafana-dashboard-kubelet created
    configmap/grafana-dashboard-namespace-by-pod created
    configmap/grafana-dashboard-namespace-by-workload created
    configmap/grafana-dashboard-node-cluster-rsrc-use created
    configmap/grafana-dashboard-node-rsrc-use created
    configmap/grafana-dashboard-nodes created
    configmap/grafana-dashboard-persistentvolumesusage created
    configmap/grafana-dashboard-pod-total created
    configmap/grafana-dashboard-pods created
    configmap/grafana-dashboard-prometheus-remote-write created
    configmap/grafana-dashboard-prometheus created
    configmap/grafana-dashboard-proxy created
    configmap/grafana-dashboard-scheduler created
    configmap/grafana-dashboard-statefulset created
    configmap/grafana-dashboard-workload-total created
    configmap/grafana-dashboards created
    deployment.apps/grafana created
    service/grafana created
    serviceaccount/grafana created
    servicemonitor.monitoring.coreos.com/grafana created
    clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
    clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    deployment.apps/kube-state-metrics created
    role.rbac.authorization.k8s.io/kube-state-metrics created
    rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    service/kube-state-metrics created
    serviceaccount/kube-state-metrics created
    servicemonitor.monitoring.coreos.com/kube-state-metrics created
    clusterrole.rbac.authorization.k8s.io/node-exporter created
    clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
    daemonset.apps/node-exporter created
    service/node-exporter created
    serviceaccount/node-exporter created
    servicemonitor.monitoring.coreos.com/node-exporter created
    apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
    clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
    clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
    clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
    clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
    configmap/adapter-config created
    deployment.apps/prometheus-adapter created
    rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
    service/prometheus-adapter created
    serviceaccount/prometheus-adapter created
    clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    servicemonitor.monitoring.coreos.com/prometheus-operator created
    prometheus.monitoring.coreos.com/k8s created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s-config created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
    service/prometheus-k8s created
    serviceaccount/prometheus-k8s created
    servicemonitor.monitoring.coreos.com/prometheus created
    servicemonitor.monitoring.coreos.com/kube-apiserver created
    servicemonitor.monitoring.coreos.com/coredns created
    servicemonitor.monitoring.coreos.com/kube-controller-manager created
    servicemonitor.monitoring.coreos.com/kube-scheduler created
    servicemonitor.monitoring.coreos.com/kubelet created
    
    Set up port forwarding
    $ kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/grafana 3000:3000
    $ kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-k8s 9090:9090
    

    Test by browsing to port 3000 on the master1 node (http://10.203.104.20:3000)

    Exposing the services via NodePort

    grafana

    root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# cat grafana-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: grafana
      name: grafana
      namespace: monitoring
    spec:
      type: NodePort
      ports:
      - name: http
        port: 3000
        targetPort: 3000
        nodePort: 33000
      selector:
        app: grafana
        
    root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f grafana-service.yaml
    

    Test by browsing to NodePort 33000 on the master1 node (http://10.203.104.20:33000)

    prometheus

    root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# cat prometheus-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        prometheus: k8s
      name: prometheus-k8s
      namespace: monitoring
    spec:
      type: NodePort
      ports:
      - name: web
        port: 9090
        targetPort: web
        nodePort: 39090
      selector:
        app: prometheus
        prometheus: k8s
      sessionAffinity: ClientIP
    
     root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f prometheus-service.yaml
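
    After applying both services, the assigned NodePorts can be checked (an optional verification step):

    root@master1:~# kubectl get svc -n monitoring grafana prometheus-k8s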
    
    Binary installation

    In this environment Prometheus is installed on master2.

    Extract the binary tarball
    root@master2:/usr/local/src# ls
    prometheus-2.17.1.linux-amd64.tar.gz
    
    root@master2:/usr/local/src# tar -zxvf prometheus-2.17.1.linux-amd64.tar.gz
    prometheus-2.17.1.linux-amd64/
    prometheus-2.17.1.linux-amd64/NOTICE
    prometheus-2.17.1.linux-amd64/LICENSE
    prometheus-2.17.1.linux-amd64/prometheus.yml
    prometheus-2.17.1.linux-amd64/prometheus
    prometheus-2.17.1.linux-amd64/promtool
    prometheus-2.17.1.linux-amd64/console_libraries/
    prometheus-2.17.1.linux-amd64/console_libraries/menu.lib
    prometheus-2.17.1.linux-amd64/console_libraries/prom.lib
    prometheus-2.17.1.linux-amd64/consoles/
    prometheus-2.17.1.linux-amd64/consoles/prometheus-overview.html
    prometheus-2.17.1.linux-amd64/consoles/index.html.example
    prometheus-2.17.1.linux-amd64/consoles/node-cpu.html
    prometheus-2.17.1.linux-amd64/consoles/node-overview.html
    prometheus-2.17.1.linux-amd64/consoles/node.html
    prometheus-2.17.1.linux-amd64/consoles/node-disk.html
    prometheus-2.17.1.linux-amd64/consoles/prometheus.html
    prometheus-2.17.1.linux-amd64/tsdb
    
    Create a symlink for the prometheus directory
    root@master2:/usr/local/src# ln -sv /usr/local/src/prometheus-2.17.1.linux-amd64 /usr/local/prometheus
    '/usr/local/prometheus' -> '/usr/local/src/prometheus-2.17.1.linux-amd64'
    root@master2:/usr/local/src# cd /usr/local/prometheus
    root@master2:/usr/local/prometheus# ls
    console_libraries  consoles  LICENSE  NOTICE  prometheus  prometheus.yml  promtool  tsdb
    
    Create the Prometheus systemd unit
    root@master2:/usr/local/prometheus# vim /etc/systemd/system/prometheus.service
    [Unit]
    Description=Prometheus Server
    Documentation=https://prometheus.io/docs/introduction/overview/
    After=network.target
    
    [Service]
    Restart=on-failure
    WorkingDirectory=/usr/local/prometheus/
    ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
    [Install]
    WantedBy=multi-user.target
    
    Start the Prometheus service
    root@master2:/usr/local/prometheus# systemctl start prometheus
    root@master2:/usr/local/prometheus# systemctl status prometheus
    root@master2:/usr/local/prometheus# systemctl enable prometheus
    Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /etc/systemd/system/prometheus.service.
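
    Prometheus also exposes simple health endpoints that can be used to confirm the server is up (an optional check):

    root@master2:~# curl -s http://10.203.104.21:9090/-/healthy    # returns HTTP 200 and a short message when the server is healthy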
    
    Access the Prometheus web UI

    Browse to port 9090 on the Prometheus node.

    node exporter

    Collects host-level metrics from each Kubernetes node (masters and workers); it listens on port 9100.

    Binary installation of node exporter (masters and workers)

    Extract the binary tarball

    root@node1:/usr/local/src# ls
    node_exporter-0.18.1.linux-amd64.tar.gz
    
    root@node1:/usr/local/src# tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz 
    node_exporter-0.18.1.linux-amd64/
    node_exporter-0.18.1.linux-amd64/node_exporter
    node_exporter-0.18.1.linux-amd64/NOTICE
    node_exporter-0.18.1.linux-amd64/LICENSE
    

    Create a symlink for the node_exporter directory

    root@node1:/usr/local/src# ln -sv /usr/local/src/node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
    '/usr/local/node_exporter' -> '/usr/local/src/node_exporter-0.18.1.linux-amd64'
    
    root@node1:/usr/local/src# cd /usr/local/node_exporter
    root@node1:/usr/local/node_exporter# ls
    LICENSE  node_exporter  NOTICE
    
    Create the node exporter systemd unit
    root@node1:/usr/local/node_exporter# vim /etc/systemd/system/node-exporter.service
    [Unit]
    Description=Prometheus Node Exporter
    After=network.target
    
    [Service]
    ExecStart=/usr/local/node_exporter/node_exporter
    
    [Install]
    WantedBy=multi-user.target
    
    Start the node exporter service
    root@node1:/usr/local/node_exporter# systemctl start node-exporter
    root@node1:/usr/local/node_exporter# systemctl status node-exporter
    root@node1:/usr/local/node_exporter# systemctl enable node-exporter
    Created symlink /etc/systemd/system/multi-user.target.wants/node-exporter.service → /etc/systemd/system/node-exporter.service.
    
    Access the node exporter web UI

    Test access to port 9100 on each Kubernetes master and worker node.
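
    The metrics endpoint can also be checked directly from the command line, for example against node1 (10.203.104.26):

    root@node1:~# curl -s http://10.203.104.26:9100/metrics | head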

    Prometheus scrapes node metrics

    Configure Prometheus to scrape metrics from the node exporters.

    Prometheus configuration file

    The prometheus.yml file on the Prometheus server

    root@master2:/usr/local/prometheus# cat prometheus.yml
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      #- job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
            # static_configs:
            #- targets: ['localhost:9090']
    
      # node exporter targets: IPs and ports to scrape
      - job_name: 'prometheus-node'
        static_configs:
        - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']
    
      - job_name: 'prometheus-master'
        static_configs:
        - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']
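
    Before restarting, the configuration can be validated with promtool (an optional check, not part of the original steps):

    root@master2:/usr/local/prometheus# ./promtool check config prometheus.yml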
    
    Restart the Prometheus service
    root@master2:/usr/local/prometheus# systemctl restart prometheus
    
    Verify node target status in Prometheus

    Verify node monitoring data in Prometheus

    Grafana

    https://grafana.com/docs/ # official installation documentation

    Grafana queries data from Prometheus and provides more polished visualizations.

    Install Grafana

    Install Grafana v6.7.2 on master3 (10.203.104.22).

    root@master3:/usr/local/src# apt-get install -y adduser libfontconfig1
    root@master3:/usr/local/src# wget https://dl.grafana.com/oss/release/grafana_6.7.2_amd64.deb
    root@master3:/usr/local/src# dpkg -i grafana_6.7.2_amd64.deb
    

    Configuration file

    root@master3:~# vim /etc/grafana/grafana.ini
    [server]
    # Protocol (http, https, socket)
    
    protocol = http
    
    # The ip address to bind to, empty will bind to all interfaces
    
    http_addr = 0.0.0.0
    
    # The http port to use
    
    http_port = 3000
    

    Start Grafana

    root@master3:~# systemctl start grafana-server.service
    root@master3:~# systemctl enable grafana-server.service
    Synchronizing state of grafana-server.service with SysV service script with /lib/systemd/systemd-sysv-install.
    Executing: /lib/systemd/systemd-sysv-install enable grafana-server
    Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.
    

    Grafana web UI

    Login page

    Add a Prometheus data source





    Import dashboard templates

    Dashboard download site

    https://grafana.com/grafana/dashboards

    Click the desired dashboard

    Download the dashboard

    Import by dashboard ID

    Confirm the dashboard details

    Verify the graphs

    The pie chart plugin is not installed by default and must be installed first:
    https://grafana.com/grafana/plugins/grafana-piechart-panel

    Online installation:
    # grafana-cli plugins install grafana-piechart-panel
    
    Offline installation:
    root@master3:/var/lib/grafana/plugins# pwd
    /var/lib/grafana/plugins
    
    root@master3:/var/lib/grafana/plugins# ls
    grafana-piechart-panel-v1.5.0-0-g3234d63.zip
    
    root@master3:/var/lib/grafana/plugins# unzip grafana-piechart-panel-v1.5.0-0-g3234d63.zip
    root@master3:/var/lib/grafana/plugins# mv grafana-piechart-panel-3234d63/ grafana-piechart-panel
    root@master3:/var/lib/grafana/plugins# systemctl restart grafana-server
    

    Monitoring pod resources

    cAdvisor must be installed on every node.

    cAdvisor, open-sourced by Google, not only collects information about every container running on a machine but also provides a basic query UI and an HTTP interface, which makes it easy for other components such as Prometheus to scrape its data. cAdvisor monitors a node's resources and containers in real time and collects performance data, including CPU usage, memory usage, network throughput, and filesystem usage.

    Before Kubernetes 1.12, cAdvisor was integrated into the kubelet service on each node; starting with 1.12 it was split out into a separate component, so cAdvisor has to be deployed on the nodes individually.

    https://github.com/google/cadvisor

    Prepare the cAdvisor image

    # docker load -i cadvisor_v0.36.0.tar.gz
    # docker tag gcr.io/google-containers/cadvisor:v0.36.0 harbor.linux.com/baseimages/cadvisor:v0.36.0
    # docker push harbor.linux.com/baseimages/cadvisor:v0.36.0
    

    Start the cAdvisor container

    # docker run \
        --volume=/:/rootfs:ro \
        --volume=/var/run:/var/run:rw \
        --volume=/sys:/sys:ro \
        --volume=/var/lib/docker/:/var/lib/docker:ro \
        --volume=/dev/disk/:/dev/disk:ro \
        --publish=8080:8080 \
        --detach=true \
        --name=cadvisor \
        harbor.linux.com/baseimages/cadvisor:v0.36.0
    

    Verify the cAdvisor web UI:

    Browse to the cAdvisor port on a node: http://10.203.104.26:8080/
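
    The Prometheus-readable metrics are served on the same port under /metrics and can be checked directly (an optional step):

    root@node1:~# curl -s http://10.203.104.26:8080/metrics | grep container_cpu_usage_seconds_total | head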

    Prometheus scrapes cAdvisor data

    root@master2:~# cat /usr/local/prometheus/prometheus.yml
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      #- job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
            # static_configs:
            #- targets: ['localhost:9090']
    
      - job_name: 'prometheus-node'
        static_configs:
        - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']

      - job_name: 'prometheus-master'
        static_configs:
        - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']

      - job_name: 'prometheus-pod-cadvisor'
        static_configs:
        - targets: ['10.203.104.26:8080','10.203.104.27:8080','10.203.104.28:8080']
    

    Restart Prometheus

    root@master2:~# systemctl restart prometheus
    

    Add a pod monitoring dashboard in Grafana


    Prometheus alerting setup

    The flow of a Prometheus alert:

    prometheus ---> threshold triggered ---> duration exceeded ---> alertmanager ---> grouping | inhibition | silencing ---> notification media ---> email | DingTalk | WeChat, etc.

    Grouping (group): merges alerts of a similar nature into a single notification.
    Silences: a simple mechanism for muting alerts during a specific time window, e.g. during a planned server upgrade or maintenance.
    Inhibition: once an alert has fired, stops sending the other alerts triggered by it, i.e. collapses the multiple alerts caused by a single failure and eliminates redundant notifications.
    
    • The alertmanager host IP is 10.203.104.22, hostname master3

    Download and extract the alerting component alertmanager

    root@master3:/usr/local/src# ls
    alertmanager-0.20.0.linux-amd64.tar.gz  grafana_6.7.2_amd64.deb  node_exporter-0.18.1.linux-amd64.tar.gz
    
    root@master3:/usr/local/src# tar -zxvf alertmanager-0.20.0.linux-amd64.tar.gz 
    alertmanager-0.20.0.linux-amd64/
    alertmanager-0.20.0.linux-amd64/LICENSE
    alertmanager-0.20.0.linux-amd64/alertmanager
    alertmanager-0.20.0.linux-amd64/amtool
    alertmanager-0.20.0.linux-amd64/NOTICE
    alertmanager-0.20.0.linux-amd64/alertmanager.yml
    
    root@master3:/usr/local/src# ln -sv /usr/local/src/alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager
    '/usr/local/alertmanager' -> '/usr/local/src/alertmanager-0.20.0.linux-amd64'
    
    root@master3:/usr/local/src# cd /usr/local/alertmanager
    root@master3:/usr/local/alertmanager# ls
    alertmanager  alertmanager.yml  amtool  LICENSE  NOTICE
    

    Configure alertmanager

    https://prometheus.io/docs/alerting/configuration/ # official configuration documentation

    root@master3:/usr/local/alertmanager# cat alertmanager.yml
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: '2973707860@qq.com'
      smtp_auth_username: '2973707860@qq.com'
      smtp_auth_password: 'udwthyyxtstcdhcj'
      smtp_hello: '@qq.com'
      smtp_require_tls: false
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
    - name: 'web.hook'
      #webhook_configs:
      #- url: 'http://127.0.0.1:5001/'
      email_configs:
        - to: '2973707860@qq.com'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
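
    The configuration can be validated with the bundled amtool before starting the service (an optional check):

    root@master3:/usr/local/alertmanager# ./amtool check-config alertmanager.yml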
    

    Start the alertmanager service

    Start the binary directly

    root@master3:/usr/local/alertmanager# ./alertmanager --config.file=./alertmanager.yml
    

    systemd unit file

    root@master3:/usr/local/alertmanager# cat /etc/systemd/system/alertmanager.service
    [Unit]
    Description=Prometheus Alertmanager
    Documentation=https://prometheus.io/docs/introduction/overview/
    After=network.target
    [Service]
    Restart=on-failure
    ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
    [Install]
    WantedBy=multi-user.target
    

    Start the service

    root@master3:/usr/local/alertmanager# systemctl start alertmanager.service
    root@master3:/usr/local/alertmanager# systemctl enable alertmanager.service
    Created symlink /etc/systemd/system/multi-user.target.wants/alertmanager.service → /etc/systemd/system/alertmanager.service.
    

    Test access to port 9093 in a browser

    Configure Prometheus alert rules

    root@master2:/usr/local/prometheus# cat prometheus.yml 
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - 10.203.104.22:9093   # alertmanager address
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - "/usr/local/prometheus/danran_rule.yml"   #指定规则文件
      # - "first_rules.yml"
      # - "second_rules.yml"
    

    Create the alert rule file

    root@master2:/usr/local/prometheus# cat danran_rule.yml 
    groups:
      - name: danran_pod.rules
        rules:
        - alert: Pod_all_cpu_usage
          expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 75
          for: 5m
          labels:
            severity: critical
            service: pods
          annotations:
            description: Container {{ $labels.name }} CPU usage is above 75% (current value is {{ $value }})
            summary: Dev CPU load alert
    
        - alert: Pod_all_memory_usage
          expr: sort_desc(avg by(name)(irate(container_memory_usage_bytes{name!=""}[5m]))*100) > 1024*10^3*2
          for: 10m
          labels:
            severity: critical
          annotations:
            description: Container {{ $labels.name }} memory usage is above 2G (current value is {{ $value }})
            summary: Dev memory load alert
    
        - alert: Pod_all_network_receive_usage
          expr: sum by (name)(irate(container_network_receive_bytes_total{container_name="POD"}[1m])) > 1024*1024*50
          for: 10m
          labels:
            severity: critical
          annotations:
            description: Container {{ $labels.name }} network receive rate is above 50M (current value is {{ $value }})
    

    Validate the alert rules

    root@master2:/usr/local/prometheus# ./promtool check rules danran_rule.yml
    Checking danran_rule.yml
      SUCCESS: 3 rules found
    

    Restart Prometheus

    root@master2:/usr/local/prometheus# systemctl restart prometheus
    

    Verify alert rule matching

    10.203.104.22 is the alertmanager host

    root@master3:/usr/local/alertmanager# ./amtool alert --alertmanager.url=http://10.203.104.22:9093
    

    Status on the Prometheus home page

    Verify the alert rules in the Prometheus web UI

    Monitoring HAProxy with Prometheus

    haproxy_exporter is installed on the HA1 node (10.203.104.30).

    Deploy haproxy_exporter

    root@ha1:/usr/local/src# ls
    haproxy_exporter-0.10.0.linux-amd64.tar.gz
    root@ha1:/usr/local/src# tar -zxvf haproxy_exporter-0.10.0.linux-amd64.tar.gz 
    haproxy_exporter-0.10.0.linux-amd64/
    haproxy_exporter-0.10.0.linux-amd64/LICENSE
    haproxy_exporter-0.10.0.linux-amd64/NOTICE
    haproxy_exporter-0.10.0.linux-amd64/haproxy_exporter
    
    root@ha1:/usr/local/src# ln -sv /usr/local/src/haproxy_exporter-0.10.0.linux-amd64 /usr/local/haproxy_exporter
    '/usr/local/haproxy_exporter' -> '/usr/local/src/haproxy_exporter-0.10.0.linux-amd64'
    root@ha1:/usr/local/src# cd /usr/local/haproxy_exporter
    

    Start haproxy_exporter

    root@ha1:/usr/local/haproxy_exporter# ./haproxy_exporter  --haproxy.scrape-uri=unix:/run/haproxy/admin.sock
    Or start it against the HAProxy status page:
    root@ha1:/usr/local/haproxy_exporter# ./haproxy_exporter --haproxy.scrape-uri="http://haadmin:danran@10.203.104.30:9999/haproxy-status;csv"
    
    View the HAProxy status page configuration
    root@ha1:/usr/local/src# cat /etc/haproxy/haproxy.cfg
    listen stats
        mode http
        bind 0.0.0.0:9999
        stats enable
        log global
        stats uri /haproxy-status
        stats auth haadmin:danran
    

    Edit the systemd unit

    root@ha1:~# cat /etc/systemd/system/haproxy-exporter.service
    [Unit]
    Description=Prometheus Haproxy Exporter
    After=network.target
    
    [Service]
    ExecStart=/usr/local/haproxy_exporter/haproxy_exporter  --haproxy.scrape-uri=unix:/run/haproxy/admin.sock
    
    
    [Install]
    WantedBy=multi-user.target
    
    root@ha1:~# systemctl restart haproxy-exporter.service
    

    Verify the data in the web UI
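
    The exporter listens on port 9101 by default; its metrics endpoint can also be checked from the command line (an optional step):

    root@ha1:~# curl -s http://10.203.104.30:9101/metrics | head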

    Add HAProxy scraping on the Prometheus server

    root@master2:/usr/local/prometheus# cat prometheus.yml
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      #- job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
            # static_configs:
            #- targets: ['localhost:9090']
    
    - job_name: 'prometheus-node'
      static_configs:
      - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']
    
    - job_name: 'prometheus-master'
      static_configs:
      - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']
    
    - job_name: 'prometheus-pod'
      static_configs:
      - targets: ['10.203.104.26:8080','10.203.104.27:8080','10.203.104.28:8080']
    
    - job_name: 'prometheus-haproxy'
      static_configs:
      - targets: ['10.203.104.30:9101']
    

    重启prometheus

    root@master2:~# systemctl restart prometheus
    

    Add a dashboard template in Grafana

    Get the dashboard template
    https://grafana.com/grafana/dashboards?dataSource=prometheus&direction=asc&orderBy=name&search=haproxy
    

    In Grafana, import the downloaded dashboard by template ID or JSON file

    Verify the HAProxy monitoring data
