• Deploying a Prometheus + node-exporter + Grafana + AlertManager monitoring system for Kubernetes 1.20.1


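    Chapter 1: Preface

    The current situation: most Kubernetes monitoring setups today are built on Prometheus + node-exporter + Grafana + AlertManager, but nearly all of the reference material available online deploys this stack inside the Kubernetes cluster.

    The new requirement: keep the same Prometheus + node-exporter + Grafana + AlertManager stack but, for various reasons, deploy it outside the Kubernetes cluster.

    1. To deploy the monitoring stack inside the cluster, see https://bbs.huaweicloud.com/blogs/detail/303137. I previously followed that document and got the dashboards shown below working.

    2. To deploy the monitoring stack outside the cluster, follow the deployment procedure I describe below.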

    Chapter 2: Basic information

    Service                      Version  Download URL
    Kubernetes                   1.20.6   Provided by the cloud vendor
    Prometheus                   2.27.1   https://github.com/prometheus/prometheus/releases/download/v2.27.1/prometheus-2.27.1.linux-amd64.tar.gz
    node-exporter                1.1.2
    Grafana                      8.3.1    https://dl.grafana.com/oss/release/grafana-8.3.0.linux-amd64.tar.gz
    AlertManager                 0.22.2   https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
    prometheus-webhook-dingtalk  1.4.0    https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz

    Chapter 3: Grafana showcase

    1. Grafana dashboard example 1

    http://191.10.10.10:3000/d/9CWBz0bik/uk8sjian-kong-da-ping-master-nodezi-yuan-xiang-qing?orgId=1

    2. Grafana dashboard example 2

    http://191.10.10.10:3000/d/PwMJtdvnz/1-k8s-for-prometheus-dashboard-20211010?orgId=1

    3. Grafana dashboard example 3

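    Chapter 4: Prometheus

    1. Prometheus introduction

    2. Prometheus deployment

    # Download

    wget -c https://github.com/prometheus/prometheus/releases/download/v2.27.1/prometheus-2.27.1.linux-amd64.tar.gz

    # Extract (the explicit directory name avoids the glob also matching the tarball)

    tar xvfz prometheus-2.27.1.linux-amd64.tar.gz

    cd prometheus-2.27.1.linux-amd64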
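
    Because this Prometheus runs outside the cluster, the configuration file in step 3 authenticates to the API server with a bearer token read from uk8s_token. The article does not show how that file was created. The following is only a sketch of one possible way on Kubernetes 1.20, where a ServiceAccount still receives an auto-generated token Secret. The ServiceAccount name prometheus-ext, the kube-system namespace, and the function name fetch_sa_token are assumptions of mine, and the account would still need to be bound to a ClusterRole that can read nodes, pods, services, and endpoints.

```shell
# Sketch: read the auto-generated token of a ServiceAccount (Kubernetes <= 1.23).
# All names used in the example call are hypothetical; adjust them to your cluster.

# fetch_sa_token NAMESPACE SERVICEACCOUNT
# Prints the decoded bearer token on stdout.
fetch_sa_token() {
  local ns="$1" sa="$2" secret
  # Name of the first Secret attached to the ServiceAccount.
  secret="$(kubectl -n "$ns" get sa "$sa" -o jsonpath='{.secrets[0].name}')" || return 1
  [ -n "$secret" ] || { echo "no token secret found for $ns/$sa" >&2; return 1; }
  # The token is stored base64-encoded in the Secret's data.
  kubectl -n "$ns" get secret "$secret" -o jsonpath='{.data.token}' | base64 -d
}

# Example (not run here):
#   fetch_sa_token kube-system prometheus-ext > /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
#   chmod 600 /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
```

    Keeping the token in a file readable only by the Prometheus user limits who can reuse it against the API server.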

    3. Prometheus configuration file

    cat prometheus.yml

    global:
      scrape_interval:     15s # By default, scrape targets every 15 seconds.
    
      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: 'codelab-monitor'
    
    scrape_configs:
      - job_name: 'prometheus_server'
        static_configs:
        - targets: ['192.168.9.50:9090']
    
      - job_name: 'web_status'
        metrics_path: /probe
        params:
          module: [http_2xx]  # Look for a HTTP 200 response.
        static_configs:
          - targets:
            - http://prometheus.io    # Target to probe with http.
            - https://prometheus.io   # Target to probe with https.
            - http://example.com:8080 # Target to probe with http on port 8080.
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: 192.168.14.150:9115  # The blackbox exporter's real hostname:port.
    
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - api_server: https://192.168.14.150:6443/
          role: endpoints
          bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
          tls_config:
            insecure_skip_verify: true
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        scheme: https
        tls_config:
          insecure_skip_verify: true
        relabel_configs:
        # Keep only the endpoints of the kubernetes service in the default namespace
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: default;kubernetes;https
          replacement: $1
          target_label: __address__
          action: keep
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 192.168.14.150:6443
          action: replace
    
      - job_name: 'kubernetes-scheduler'
        kubernetes_sd_configs:
        - api_server: https://192.168.14.150:6443/
          role: endpoints
          bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
          tls_config:
            insecure_skip_verify: true
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        scheme: http
        tls_config:
          insecure_skip_verify: true
        relabel_configs:
        # Keep only the endpoints of the kubernetes service in the default namespace
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: default;kubernetes;https
          replacement: $1
          target_label: __address__
          action: keep
        - source_labels: [__address__]
          separator: ;
          regex: '(.*):6443'
          target_label: __address__
          replacement: '${1}:10251'
          action: replace
    
      - job_name: 'kubernetes-controller-manager'
        kubernetes_sd_configs:
        - api_server: https://192.168.14.150:6443/
          role: endpoints
          bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
          tls_config:
            insecure_skip_verify: true
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        scheme: http
        tls_config:
          insecure_skip_verify: true
        relabel_configs:
        # Keep only the endpoints of the kubernetes service in the default namespace
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: default;kubernetes;https
          replacement: $1
          target_label: __address__
          action: keep
        - source_labels: [__address__]
          separator: ;
          regex: '(.*):6443'
          target_label: __address__
          replacement: '${1}:10252'
          action: replace
    
      - job_name: 'kubernetes-node-all'
        metrics_path: /metrics
        scheme: http
        kubernetes_sd_configs:
        - api_server: https://192.168.14.150:6443/
          role: node
          bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
        relabel_configs:
        - source_labels: [__address__]
          regex: '(.*):10250'
          replacement: '${1}:9100'
          target_label: __address__
          action: replace
    
        #- action: labelmap
        #  regex: __meta_kubernetes_node_label_(.+)
    
      - job_name: kubernetes-node-kubelet
        metrics_path: /metrics
        scheme: http
        kubernetes_sd_configs:
        - api_server: https://192.168.14.150:6443/
          role: node
          bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - source_labels: [__address__]
          regex: '(.*):10250'
          replacement: '${1}:10255'
          target_label: __address__
          action: replace
    
      - job_name: kubernetes-cadvisor
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://192.168.14.150:6443/
          role: node
          bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 192.168.14.150:6443
          action: replace
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          action: replace
        metric_relabel_configs:
        - source_labels: [instance]
          separator: ;
          regex: (.+)
          target_label: node
          replacement: $1
          action: replace
    
      - job_name: 'kubernetes-pods'
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://192.168.14.150:6443/
          role: pod
          bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
          tls_config:
            insecure_skip_verify: true
        tls_config:
          insecure_skip_verify: true 
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - source_labels: [__address__]
          separator: ;
          regex: '.*:(.*)'
          target_label: __pod_port__
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_name, __pod_port__]
          separator: ;
          regex: (.*);(.*);(.*)
          target_label: __metrics_path__
          replacement: /api/v1/namespaces/$1/pods/$2:$3/proxy/metrics
          action: replace
        - source_labels: [__address__]
          separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 192.168.14.150:6443
          action: replace
    
      - job_name: 'kubernetes-service-endpoints'
        scheme: http
        kubernetes_sd_configs:
        - api_server: https://192.168.14.150:6443/
          role: endpoints
          bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
          tls_config:
            insecure_skip_verify: true
        tls_config:
          insecure_skip_verify: true 
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true  
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__,__meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_service_name
    
    
    
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['192.168.9.50:9093']
    
    rule_files:
       - "/data/monitor/prometheus-2.27.1.linux-amd64/rules/*yml"
       #- "second_rules.yml"
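
    Several jobs above rewrite __address__ with a relabel rule, for example turning the discovered kubelet address (port 10250) into the node-exporter address (port 9100). As a rough illustration of what such a rule does, the sketch below mimics it with sed. Prometheus anchors relabel regexes at both ends, which the explicit ^ and $ reproduce. The function name relabel_port and the sample IP 10.0.0.5 are made up for the example.

```shell
# relabel_port ADDRESS FROM_PORT TO_PORT
# Mimics: regex '(.*):FROM_PORT' with replacement '${1}:TO_PORT' on __address__.
# If the regex does not match, the address is printed unchanged, as with action: replace.
relabel_port() {
  printf '%s\n' "$1" | sed -E "s/^(.*):$2\$/\\1:$3/"
}

relabel_port 10.0.0.5:10250 10250 9100    # kubernetes-node-all job, prints 10.0.0.5:9100
relabel_port 10.0.0.5:10250 10250 10255   # kubernetes-node-kubelet job, prints 10.0.0.5:10255
relabel_port 10.0.0.5:6443 6443 10251     # kubernetes-scheduler job, prints 10.0.0.5:10251
```

    Only the port changes; the host part captured by (.*) is carried over, which is why node-exporter has to listen on 9100 on every node that the node role discovers.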

    4. Configure Prometheus to start at boot

    cat  /etc/systemd/system/prometheus.service

    [Unit]
    Description=prometheus
    After=network.target
    [Service]
    Type=simple
    User=root
    ExecStart=/data/monitor/prometheus-2.27.1.linux-amd64/prometheus --config.file=/data/monitor/prometheus-2.27.1.linux-amd64/prometheus.yml --web.enable-lifecycle --storage.tsdb.path=/data/monitor/prometheus-2.27.1.linux-amd64/data
    Restart=on-failure
    [Install]
    WantedBy=multi-user.target

    systemctl daemon-reload

    systemctl restart prometheus.service

    systemctl enable prometheus.service

    systemctl status  prometheus.service 
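
    The unit above starts Prometheus with --web.enable-lifecycle, so after editing prometheus.yml the configuration can be reloaded over HTTP instead of restarting the service. A minimal sketch, assuming Prometheus listens on 192.168.9.50:9090 (the address used in the prometheus_server job) and that the promtool binary from the same tarball is used for validation. The function names and the PROM_HOME and PROM_URL variables are my own.

```shell
PROM_HOME="${PROM_HOME:-/data/monitor/prometheus-2.27.1.linux-amd64}"
PROM_URL="${PROM_URL:-http://192.168.9.50:9090}"

# Validate prometheus.yml (and the rule files it references) before reloading.
prom_check() {
  "$PROM_HOME/promtool" check config "$PROM_HOME/prometheus.yml"
}

# Ask the running server to re-read its configuration.
# This only works because the service was started with --web.enable-lifecycle.
prom_reload() {
  curl -fsS -X POST "$PROM_URL/-/reload"
}

# Returns success when the server reports itself healthy.
prom_healthy() {
  curl -fsS "$PROM_URL/-/healthy" >/dev/null
}

# Typical sequence (not run here):
#   prom_check && prom_reload && prom_healthy && echo "reloaded"
```

    Checking the file first matters because a reload with an invalid configuration is rejected and the server keeps running with the old one.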

    Chapter 5: node-exporter

    1. node-exporter introduction

    2. node-exporter deployment

    3. node-exporter configuration

    4. Configure start at boot

    Chapter 6: Grafana

    1. Grafana introduction

    2. Grafana deployment

    wget -c https://dl.grafana.com/oss/release/grafana-8.0.3.linux-amd64.tar.gz
    tar -zxvf grafana-8.0.3.linux-amd64.tar.gz
    cd grafana-8.0.3/

    3. Grafana configuration

    4. Configure start at boot

    cat /etc/systemd/system/grafana.service

    [Unit]
    Description=grafana_service
    After=network.target
    [Service]
    Type=simple
    User=root
    ExecStart=/data/monitor/grafana/bin/grafana-server -homepath /data/monitor/grafana
    Restart=on-failure
    [Install]
    WantedBy=multi-user.target

    systemctl daemon-reload

    systemctl enable grafana.service

    systemctl start grafana.service

    systemctl status grafana.service
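
    systemctl status only shows that the process is running, not that Grafana is already answering requests. The sketch below polls Grafana's /api/health endpoint until it responds. It assumes Grafana is reachable on the default port 3000 on the local host; the GRAFANA_URL variable and the function name grafana_wait_ready are hypothetical.

```shell
GRAFANA_URL="${GRAFANA_URL:-http://127.0.0.1:3000}"   # 3000 is Grafana's default HTTP port

# grafana_wait_ready [ATTEMPTS] [DELAY_SECONDS]
# Polls /api/health until it answers successfully or the attempts run out.
grafana_wait_ready() {
  local attempts="${1:-10}" delay="${2:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS "$GRAFANA_URL/api/health" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not run here):
#   systemctl start grafana.service && grafana_wait_ready 15 2 && echo "Grafana is up"
```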

     

    Chapter 7: AlertManager

    1. AlertManager introduction

    2. AlertManager deployment

    wget -c https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
    tar xf alertmanager-0.22.2.linux-amd64.tar.gz
    cd alertmanager-0.22.2.linux-amd64/

    3. AlertManager configuration

    4. Configure start at boot

    vim /etc/systemd/system/alertmanager.service

    [Unit]
    Description=alertmanager
    After=network.target
    [Service]
    Type=simple
    User=root
    ExecStart=/data/monitor/alertmanager-0.22.2.linux-amd64/alertmanager --config.file=/data/monitor/alertmanager-0.22.2.linux-amd64/alertmanager.yml --storage.path="/data/monitor/alertmanager-0.22.2.linux-amd64/data/" --log.format=logfmt 
    Restart=on-failure
    [Install]
    WantedBy=multi-user.target 

    systemctl daemon-reload

    systemctl enable alertmanager.service

    systemctl restart alertmanager.service

    systemctl status  alertmanager.service
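
    Once the service is running, alert routing can be exercised by hand without waiting for a real rule to fire. The sketch below posts a single test alert to AlertManager's v2 API. It assumes AlertManager listens on 192.168.9.50:9093 (the target in the alerting section of prometheus.yml); the function names, the AM_URL variable, and the label and annotation values are made up for the example. The summary and description annotations are included because the DingTalk template in Chapter 8 prints them.

```shell
AM_URL="${AM_URL:-http://192.168.9.50:9093}"

# am_test_payload ALERTNAME INSTANCE
# Prints a one-element JSON array in the format expected by POST /api/v2/alerts.
am_test_payload() {
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},' "$1" "$2"
  printf '"annotations":{"summary":"manual test alert","description":"sent by hand to test routing"}}]\n'
}

# am_send_test ALERTNAME INSTANCE
# Sends the payload to AlertManager; this triggers real notifications.
am_send_test() {
  am_test_payload "$1" "$2" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- "$AM_URL/api/v2/alerts"
}

# Example (not run here):
#   am_send_test ManualTest 192.168.9.50:9100
```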

    Chapter 8: prometheus-webhook-dingtalk

    1. prometheus-webhook-dingtalk introduction

    2. prometheus-webhook-dingtalk deployment

    wget -c https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
    tar xf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
    cd prometheus-webhook-dingtalk-1.4.0.linux-amd64/

    3. prometheus-webhook-dingtalk configuration

    cp config.example.yml config.yml

    cat config.yml

    ## Request timeout
    # timeout: 5s
    
    ## Customizable templates path
    templates:
      - /data/monitor/prometheus-webhook-dingtalk-1.4.0/contrib/templates/legacy/default2.tmpl
    ## You can also override default template using `default_message`
    ## The following example to use the 'legacy' template from v0.3.0
    # default_message:
    #   title: '{{ template "legacy.title" . }}'
    #   text: '{{ template "legacy.content" . }}'
    
    ## Targets, previously was known as "profiles"
    targets:
      test:
        url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
      webhook_legacy:
        url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
        # Customize template content
        message:
          # Use legacy template
          title: '{{ template "legacy.title" . }}'
          text: '{{ template "legacy.content" . }}'
      webhook_mention_all:
        url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
        mention:
          all: true
      webhook_mention_users:
        url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
        mention:
          mobiles: ['156xxxx8827', '189xxxx8325']

    cat contrib/templates/legacy/template.tmpl

    {{ define "ding.link.content2" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
    {{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}
    
    {{ define "__text_alert_list" }}{{ range . }}
    **Labels**
    {{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Annotations**
    {{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
    {{ end }}{{ end }}
    
    {{/* Firing alerts */}}
    
    {{ define "default.__text_alert_list" }}{{ range . }}
    
    
    **————————————————**
    
    **Alert time:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
    
    **Alert type:** {{ .Annotations.summary }}
    
    **Host name:** {{ .Labels.instance}}
    
    **Details:** {{ .Annotations.description }}
    
    **————————————————**
    {{ end }}
    {{ end }}
    
    
    
    
    {{/* Resolved alerts */}}
    
    {{ define "default.__text_resolved_list" }}{{ range . }}
    
    
    **————————————————**
    
    **Alert time:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
    
    **Recovery time:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
    
    **Host name:** {{ .Labels.instance}}
    
    **Details:** {{ .Annotations.description }}
    
    **————————————————**
    
    
    {{ end }}
    {{ end }}
    
    
    {{/* Default */}}
    {{ define "default.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "default.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
    {{ if gt (len .Alerts.Firing) 0 -}}
    
    ![Firing-img](http://m.qpic.cn/psc?/V51kUUGn0MdYtz4DkTPa4Pbrm40LkcRa/TmEUgtj9EK6.7V8ajmQrEFTSeNBSwZrpOeKH*wfJrUH*bq5wvFpRL5ZUVtNN73JYtEhtV4He5iNFDbVZLe.S1dtnf6OeIiVqbCOthMY0Pv0!/b&bo=OgJcAAAAAAADF1Y!&rf=viewer_4)
    
    **Firing alerts**
    {{ template "default.__text_alert_list" .Alerts.Firing }}
    {{- end }}
    {{ if gt (len .Alerts.Resolved) 0 -}}
    
    ![Resolved-img](http://m.qpic.cn/psc?/V51kUUGn0MdYtz4DkTPa4Pbrm40LkcRa/TmEUgtj9EK6.7V8ajmQrEEdthRxYCYVef54h2YlrRZXxd9Y8aCW30HAv53MXawIp2uL7ClzTjC76hjfa5R6buAPPGk9X35.sPY4Z0GWE0Z4!/b&bo=OgJcAAAAAAADF1Y!&rf=viewer_4)
    
    **Resolved alerts**
    {{ template "default.__text_resolved_list" .Alerts.Resolved }}
    {{- end }}
    {{- end }}
    
    {{/* Legacy */}}
    {{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "legacy.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
    {{ template "__text_alert_list" .Alerts.Firing }}
    {{- end }}
    
    {{/* Following names for compatibility */}}
    {{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
    {{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }} 

    4. Configure start at boot

    cat /etc/systemd/system/prometheus-webhook-dingtalk.service

    [Unit]
    Description=prometheus-webhook-dingtalk
    After=network-online.target
    
    [Service]
    Restart=on-failure
    ExecStart=/data/monitor/prometheus-webhook-dingtalk-1.4.0/prometheus-webhook-dingtalk --config.file=/data/monitor/prometheus-webhook-dingtalk-1.4.0/config.yml
    
    [Install]
    WantedBy=multi-user.target

    systemctl daemon-reload

    systemctl enable prometheus-webhook-dingtalk.service

    systemctl restart prometheus-webhook-dingtalk.service

    systemctl status prometheus-webhook-dingtalk.service

    tail -200f /var/log/messages
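
    Each configured target is served by prometheus-webhook-dingtalk under its own path, /dingtalk/<target>/send, and that URL is what an AlertManager webhook receiver would point at. The sketch below builds the URL for a target and posts a saved payload to it. It assumes the service runs on the same host with the tool's default listen port 8060; the DINGTALK_ADDR variable and both function names are hypothetical, and webhook_mention_all is simply one of the target names from config.yml.

```shell
DINGTALK_ADDR="${DINGTALK_ADDR:-127.0.0.1:8060}"   # 8060 is the tool's default listen port

# dingtalk_webhook_url TARGET
# Prints the URL an AlertManager webhook receiver would use for that target.
dingtalk_webhook_url() {
  printf 'http://%s/dingtalk/%s/send\n' "$DINGTALK_ADDR" "$1"
}

# dingtalk_post TARGET JSON_FILE
# Posts a saved AlertManager webhook payload to the target.
# Note: this sends a real message to the DingTalk robot behind the target.
dingtalk_post() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data-binary "@$2" "$(dingtalk_webhook_url "$1")"
}

dingtalk_webhook_url webhook_mention_all   # prints http://127.0.0.1:8060/dingtalk/webhook_mention_all/send
```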

  • Original article: https://www.cnblogs.com/suyj/p/15876541.html