• Adding custom monitoring and alerting to Prometheus (using etcd as an example)


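    I. Steps and notes (prerequisite: a working deployment; see the deployment article)

    1. An etcd cluster usually has HTTPS client authentication enabled, so accessing etcd requires the matching certificates
    2. Create a secret for etcd from those certificates
    3. Mount the etcd secret into Prometheus
    4. Create a ServiceMonitor object for etcd (it matches services in the kube-system namespace that carry the label k8s-app=etcd)
    5. Create a service that points at the monitored targets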

    II. Hands-on steps (default etcd certificate path: /etc/kubernetes/pki/etcd/)

    1. Create the etcd secret

    cd /etc/kubernetes/pki/etcd/
    kubectl create secret generic etcd-certs --from-file=healthcheck-client.crt --from-file=healthcheck-client.key --from-file=ca.crt -n monitoring
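
    Before relying on the secret, it can help to confirm that the same three files are accepted by etcd. The function below is a minimal sketch, not part of the original steps: the name check_etcd_metrics and its two arguments are hypothetical, and it assumes curl is available on a host that can reach the etcd client port.

```shell
# Query the etcd metrics endpoint with the client certificate that goes
# into the secret. The function name and both arguments are assumptions
# made for illustration.
check_etcd_metrics() {
  local endpoint="$1"                               # e.g. https://127.0.0.1:2379
  local cert_dir="${2:-/etc/kubernetes/pki/etcd}"   # default path from the text
  curl --silent --show-error \
    --cacert "${cert_dir}/ca.crt" \
    --cert "${cert_dir}/healthcheck-client.crt" \
    --key "${cert_dir}/healthcheck-client.key" \
    "${endpoint}/metrics"
}

# Usage (on an etcd node):
#   check_etcd_metrics https://127.0.0.1:2379 | head
```

    If this prints metric lines, the certificate, key and CA are a valid combination for scraping.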

    2. Add the secret to the Prometheus object named k8s (either run kubectl edit prometheus k8s -n monitoring, or edit the YAML file and re-apply it)

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      labels:
        prometheus: k8s
      name: k8s
      namespace: monitoring
    spec:
      alerting:
        alertmanagers:
        - name: alertmanager-main
          namespace: monitoring
          port: web
      baseImage: quay.io/prometheus/prometheus
      nodeSelector:
        kubernetes.io/os: linux
      podMonitorNamespaceSelector: {}
      podMonitorSelector: {}
      replicas: 2
      secrets:
      - etcd-certs
      resources:
        requests:
          memory: 400Mi
      ruleSelector:
        matchLabels:
          prometheus: k8s
          role: alert-rules
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccountName: prometheus-k8s
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector: {}
      version: v2.11.0
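
    Once the Prometheus pods have been recreated, the secret should appear under /etc/prometheus/secrets/etcd-certs/, the path used by the ServiceMonitor in the next step. A small sketch for checking this follows; the pod name prometheus-k8s-0 and the container name prometheus are assumptions based on the object name k8s and may differ in other deployments.

```shell
# List the mounted secret files inside a Prometheus pod.
# Assumptions: the pod is named prometheus-k8s-0 and the container is
# named prometheus; pass another pod name as the first argument if not.
list_etcd_certs_in_pod() {
  local pod="${1:-prometheus-k8s-0}"
  kubectl exec -n monitoring "$pod" -c prometheus -- \
    ls /etc/prometheus/secrets/etcd-certs/
}

# Usage:
#   list_etcd_certs_in_pod
#   list_etcd_certs_in_pod prometheus-k8s-1
```

    The listing should contain ca.crt, healthcheck-client.crt and healthcheck-client.key.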

    3. Create the ServiceMonitor object

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: etcd-k8s
      namespace: monitoring
      labels:
        k8s-app: etcd-k8s
    spec:
      jobLabel: k8s-app
      endpoints:
      - port: port
        interval: 30s
        scheme: https
        tlsConfig:
          caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
          certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
          keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
          insecureSkipVerify: true
      selector:
        matchLabels:
          k8s-app: etcd
      namespaceSelector:
        matchNames:
        - kube-system

    4. Create the service and define its endpoints by hand (etcd may be deployed outside the Kubernetes cluster, so the endpoints are listed explicitly)

    apiVersion: v1
    kind: Service
    metadata:
      name: etcd-k8s
      namespace: kube-system
      labels:
        k8s-app: etcd
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
      - name: port
        port: 2379
        protocol: TCP
    
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: etcd-k8s
      namespace: kube-system
      labels:
        k8s-app: etcd
    subsets:
    - addresses:
      - ip: 1.1.1.11
      - ip: 1.1.1.12
      - ip: 1.1.1.13
        nodeName: etcd-master
      ports:
      - name: port
        port: 2379
        protocol: TCP
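
    To check that the label selector and namespace line up, the objects can be listed with the same label the ServiceMonitor selects on. This is a sketch only; the function name show_etcd_targets is hypothetical, and the second command assumes the ServiceMonitor CRD from the Prometheus Operator is installed.

```shell
# Show the service and endpoints that the ServiceMonitor should match,
# then the ServiceMonitor itself. The function name is an assumption
# made for illustration.
show_etcd_targets() {
  kubectl get service,endpoints -n kube-system -l k8s-app=etcd -o wide
  kubectl get servicemonitor etcd-k8s -n monitoring
}

# Usage:
#   show_etcd_targets
```

    The endpoints line should list every etcd address on port 2379; an empty result usually points at a label or namespace mismatch.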

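    At this point the etcd targets and their metrics should normally be visible in the Prometheus web UI.

    If the targets report the error "connection refused", etcd is most likely listening only on an address that Prometheus cannot reach. Edit etcd.yaml under /etc/kubernetes/manifests and change the listen address in one of two ways:

    Option 1: --listen-client-urls=https://0.0.0.0:2379

    Option 2: --listen-client-urls=https://127.0.0.1:2379,https://1.1.1.11:2379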
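
    Before and after editing, the current value can be read straight from the manifest. The helper below is a minimal sketch; the name show_listen_client_urls is hypothetical and the default path is the one given above.

```shell
# Print the --listen-client-urls line from the etcd static pod manifest.
# The function name is an assumption made for illustration; pass a
# different manifest path as the first argument if needed.
show_listen_client_urls() {
  local manifest="${1:-/etc/kubernetes/manifests/etcd.yaml}"
  grep -- '--listen-client-urls' "$manifest"
}

# Usage (on the etcd node):
#   show_listen_client_urls
```

    Because etcd.yaml is a static pod manifest, the kubelet should recreate the etcd pod on its own after the file is saved.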

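    III. Create a custom alert

    1. After a PrometheusRule resource is created, the corresponding rule file is generated inside the Prometheus pod
    2. Note: the labels on the rule must match the ruleSelector of the Prometheus object (prometheus: k8s and role: alert-rules)
    3. Alert: the etcd cluster is considered available while more than half of its members are up; the rule fires once so many members are down that losing one more would break that majority
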
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: etcd-rules
      namespace: monitoring
    spec:
      groups:
      - name: etcd-exporter.rules
        rules:
        - alert: EtcdClusterUnavailable
          annotations:
            summary: etcd cluster small
            description: If one more etcd peer goes down the cluster will be unavailable
          expr: |
            count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
          for: 3m
          labels:
            severity: critical
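
    The threshold in expr is easier to follow with concrete numbers. The sketch below mirrors the comparison in integer shell arithmetic; the function name alert_fires is hypothetical, and the member counts are illustrative values, not measurements.

```shell
# Mirrors: count(up == 0) > count(up) / 2 - 1
# Both sides are multiplied by 2 so that integer arithmetic is enough:
# 2 * down > total - 2.
alert_fires() {
  local total="$1"
  local down="$2"
  # In PromQL, count() over an empty result returns no series, so with
  # zero members down the expression yields nothing and cannot fire.
  [ "$down" -gt 0 ] && [ $((2 * down)) -gt $((total - 2)) ]
}

for down in 0 1 2; do
  if alert_fires 3 "$down"; then state="fires"; else state="quiet"; fi
  echo "3 members, ${down} down: ${state}"
done
```

    With three members the alert fires as soon as one is down, which matches the description: one more failure would leave a single member and no majority. With five members it stays quiet at one down and fires at two.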
  • Original article: https://www.cnblogs.com/jayce9102/p/12073559.html