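Chapter 1: Foreword
The current need: most k8s monitoring setups today are built from Prometheus + node-exporter + Grafana + AlertManager, yet nearly all the reference material available online deploys this stack inside the k8s cluster.
Now there is a new requirement: the stack is still Prometheus + node-exporter + Grafana + AlertManager, but for various reasons it has to be deployed outside the k8s cluster.
1. Monitoring deployed inside the k8s cluster: see https://bbs.huaweicloud.com/blogs/detail/303137. I previously followed that document and got the dashboards shown below.
2. Monitoring deployed outside the k8s cluster: see the deployment plan I describe below.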
Chapter 2: Basic information
Service | Version | Download URL |
Kubernetes | 1.20.6 | Provided by the cloud vendor |
Prometheus | 2.27.1 | https://github.com/prometheus/prometheus/releases/download/v2.27.1/prometheus-2.27.1.linux-amd64.tar.gz |
node-exporter | 1.1.2 | |
Grafana | 8.3.1 | https://dl.grafana.com/oss/release/grafana-8.3.0.linux-amd64.tar.gz |
AlertManager | 0.22.2 | https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz |
prometheus-webhook-dingtalk | 1.4.0 | https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz |
Chapter 3: Grafana showcase
1. Grafana showcase 1
http://191.10.10.10:3000/d/9CWBz0bik/uk8sjian-kong-da-ping-master-nodezi-yuan-xiang-qing?orgId=1
2. Grafana showcase 2
http://191.10.10.10:3000/d/PwMJtdvnz/1-k8s-for-prometheus-dashboard-20211010?orgId=1
3. Grafana showcase 3
Chapter 4: Prometheus
1. Prometheus overview
2. Prometheus deployment
# Download
wget -c https://github.com/prometheus/prometheus/releases/download/v2.27.1/prometheus-2.27.1.linux-amd64.tar.gz
# Extract
tar xvfz prometheus-*.tar.gz
cd prometheus-*
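The archive downloaded above can be checked against the SHA-256 digest published alongside the release before it is used. The sketch below is a minimal helper, assuming GNU `sha256sum` is available; the function name `verify_sha256` is hypothetical, and the expected digest has to be copied from the release's checksum file rather than from this example.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX   (hypothetical helper, not part of Prometheus)
# Returns 0 only when FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <file>"; keep only the digest column.
  actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
  # An empty digest means the file could not be read; treat that as a failure.
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A possible call would be `verify_sha256 prometheus-2.27.1.linux-amd64.tar.gz "<digest from the release checksum file>" && tar xvfz prometheus-2.27.1.linux-amd64.tar.gz`, so that extraction only happens when the digest matches.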
3. Prometheus configuration file
cat prometheus.yml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus_server'
    static_configs:
      - targets: ['192.168.9.50:9090']

  - job_name: 'web_status'
    metrics_path: /probe
    params:
      module: [http_2xx] # Look for a HTTP 200 response.
    static_configs:
      - targets:
          - http://prometheus.io # Target to probe with http.
          - https://prometheus.io # Target to probe with https.
          - http://example.com:8080 # Target to probe with http on port 8080.
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.14.150:9115 # The blackbox exporter's real hostname:port.

  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: endpoints
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    scheme: https
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      # Keep only the endpoints of the "kubernetes" service in the "default" namespace
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: default;kubernetes;https
        replacement: $1
        target_label: __address__
        action: keep
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 192.168.14.150:6443
        action: replace

  - job_name: 'kubernetes-scheduler'
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: endpoints
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    scheme: http
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      # Keep only the endpoints of the "kubernetes" service in the "default" namespace
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: default;kubernetes;https
        replacement: $1
        target_label: __address__
        action: keep
      - source_labels: [__address__]
        separator: ;
        regex: '(.*):6443'
        target_label: __address__
        replacement: '${1}:10251'
        action: replace

  - job_name: 'kubernetes-controller-manager'
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: endpoints
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    scheme: http
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      # Keep only the endpoints of the "kubernetes" service in the "default" namespace
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: default;kubernetes;https
        replacement: $1
        target_label: __address__
        action: keep
      - source_labels: [__address__]
        separator: ;
        regex: '(.*):6443'
        target_label: __address__
        replacement: '${1}:10252'
        action: replace

  - job_name: 'kubernetes-node-all'
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: node
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      #- action: labelmap
      #  regex: __meta_kubernetes_node_label_(.+)

  - job_name: kubernetes-node-kubelet
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: node
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:10255'
        target_label: __address__
        action: replace

  - job_name: kubernetes-cadvisor
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: node
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $1
        action: labelmap
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 192.168.14.150:6443
        action: replace
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        action: replace
    metric_relabel_configs:
      - source_labels: [instance]
        separator: ;
        regex: (.+)
        target_label: node
        replacement: $1
        action: replace

  - job_name: 'kubernetes-pods'
    scheme: https
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: pod
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__address__]
        separator: ;
        regex: '.*:(.*)'
        target_label: __pod_port__
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __pod_port__]
        separator: ;
        regex: (.*);(.*);(.*)
        target_label: __metrics_path__
        replacement: /api/v1/namespaces/$1/pods/$2:$3/proxy/metrics
        action: replace
      - source_labels: [__address__]
        separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 192.168.14.150:6443
        action: replace

  - job_name: 'kubernetes-service-endpoints'
    scheme: http
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: endpoints
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      # Note: the meta labels use a single underscore after "__meta"
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service_name

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.9.50:9093']

rule_files:
  - "/data/monitor/prometheus-2.27.1.linux-amd64/rules/*yml"
  #- "second_rules.yml"
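Every Kubernetes job in this configuration authenticates with the bearer token stored in `uk8s_token`. One way to produce that file is to read the token of a ServiceAccount, as sketched below. The sketch assumes Kubernetes 1.20 behaviour, where each ServiceAccount gets an auto-generated token Secret, and a ServiceAccount that has already been granted read access to nodes, endpoints, pods and the `nodes/proxy` and `pods/proxy` subresources. The namespace `monitoring`, the account name `prometheus`, the function name `fetch_sa_token` and the overridable `KUBECTL` variable are all hypothetical.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden, e.g. to point at a wrapper that sets --kubeconfig.
KUBECTL="${KUBECTL:-kubectl}"

# fetch_sa_token NAMESPACE SERVICEACCOUNT   (hypothetical helper)
# Prints the decoded bearer token of the ServiceAccount's first token Secret.
fetch_sa_token() {
  local ns="$1" sa="$2" secret
  # Name of the auto-generated token Secret (present on Kubernetes 1.20).
  secret="$("$KUBECTL" -n "$ns" get serviceaccount "$sa" -o jsonpath='{.secrets[0].name}')" || return 1
  [ -n "$secret" ] || return 1
  # Secret data is base64-encoded; decode it before writing it to disk.
  "$KUBECTL" -n "$ns" get secret "$secret" -o jsonpath='{.data.token}' | base64 -d
}
```

For example, `fetch_sa_token monitoring prometheus > /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token` would write the token to the path the configuration expects.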
4. Start Prometheus automatically at boot
cat /etc/systemd/system/prometheus.service
[Unit]
Description=prometheus
After=network.target

[Service]
Type=simple
User=root
ExecStart=/data/monitor/prometheus-2.27.1.linux-amd64/prometheus --config.file=/data/monitor/prometheus-2.27.1.linux-amd64/prometheus.yml --web.enable-lifecycle --storage.tsdb.path=/data/monitor/prometheus-2.27.1.linux-amd64/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl restart prometheus.service
systemctl enable prometheus.service
systemctl status prometheus.service
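Because the unit starts Prometheus with `--web.enable-lifecycle`, later edits to `prometheus.yml` can be applied without a restart by sending a POST to the `/-/reload` endpoint. The following is a minimal sketch, assuming the `promtool` binary from the same tarball is used to validate the file first; the function name `reload_prometheus` and the overridable `PROMTOOL` and `CURL` variables are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Both tools can be overridden; the promtool path assumes the layout used above.
PROMTOOL="${PROMTOOL:-/data/monitor/prometheus-2.27.1.linux-amd64/promtool}"
CURL="${CURL:-curl}"

# reload_prometheus CONFIG_FILE BASE_URL   (hypothetical helper)
# Validates CONFIG_FILE and, only if it is valid, asks Prometheus to reload it.
reload_prometheus() {
  local config="$1" base_url="$2"
  if ! "$PROMTOOL" check config "$config"; then
    echo "config check failed; not reloading" >&2
    return 1
  fi
  # ${base_url%/} drops one trailing slash so the path is not doubled.
  "$CURL" -fsS -X POST "${base_url%/}/-/reload"
}
```

A possible call is `reload_prometheus /data/monitor/prometheus-2.27.1.linux-amd64/prometheus.yml http://192.168.9.50:9090`. Validating first means a typo in the file leaves the running configuration untouched.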
Chapter 5: node-exporter
1. node-exporter overview
2. node-exporter deployment
3. node-exporter configuration
4. Start automatically at boot
Chapter 6: Grafana
1. Grafana overview
2. Grafana deployment
wget -c https://dl.grafana.com/oss/release/grafana-8.0.3.linux-amd64.tar.gz
tar -zxvf grafana-8.0.3.linux-amd64.tar.gz
cd grafana-8.0.3/
3. Grafana configuration
4. Start automatically at boot
cat /etc/systemd/system/grafana.service
[Unit]
Description=grafana_service
After=network.target

[Service]
Type=simple
User=root
ExecStart=/data/monitor/grafana/bin/grafana-server -homepath /data/monitor/grafana
Restart=on-failure

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable grafana.service
systemctl start grafana.service
systemctl status grafana.service
Chapter 7: AlertManager
1. AlertManager overview
2. AlertManager deployment
wget -c https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
tar xf alertmanager-0.22.2.linux-amd64.tar.gz
cd alertmanager-0.22.2.linux-amd64/
3. AlertManager configuration
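To hand alerts over to prometheus-webhook-dingtalk, `alertmanager.yml` needs a webhook receiver. The sketch below writes a minimal configuration. It assumes prometheus-webhook-dingtalk listens on its default port 8060 on the same host and that the `webhook_mention_all` target from its `config.yml` is the one to use. The function name `write_alertmanager_config`, the receiver name `dingtalk` and the grouping intervals are illustrative assumptions, not values taken from the original setup.

```shell
#!/usr/bin/env bash
# write_alertmanager_config OUT_FILE WEBHOOK_URL   (hypothetical helper)
# Writes a single-receiver Alertmanager config that forwards every alert
# to WEBHOOK_URL, including resolved notifications.
write_alertmanager_config() {
  local out="$1" webhook_url="$2"
  # Unquoted heredoc delimiter so ${webhook_url} is expanded by the shell.
  cat > "$out" <<EOF
route:
  receiver: dingtalk
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: dingtalk
    webhook_configs:
      - url: '${webhook_url}'
        send_resolved: true
EOF
}
```

A possible call is `write_alertmanager_config /data/monitor/alertmanager-0.22.2.linux-amd64/alertmanager.yml http://127.0.0.1:8060/dingtalk/webhook_mention_all/send`. The result can then be checked with `amtool check-config`, which ships in the AlertManager tarball, before the service is restarted.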
4. Start automatically at boot
vim /etc/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
After=network.target

[Service]
Type=simple
User=root
ExecStart=/data/monitor/alertmanager-0.22.2.linux-amd64/alertmanager --config.file=/data/monitor/alertmanager-0.22.2.linux-amd64/alertmanager.yml --storage.path="/data/monitor/alertmanager-0.22.2.linux-amd64/data/" --log.format=logfmt
Restart=on-failure

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable alertmanager.service
systemctl restart alertmanager.service
systemctl status alertmanager.service
Chapter 8: prometheus-webhook-dingtalk
1. prometheus-webhook-dingtalk overview
2. prometheus-webhook-dingtalk deployment
wget -c https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
tar xf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
cd prometheus-webhook-dingtalk-1.4.0.linux-amd64/
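3. prometheus-webhook-dingtalk configuration
# Run from the directory that contains the extracted archive; skip the cd if you are already inside it.
cd prometheus-webhook-dingtalk-1.4.0.linux-amd64/
cp config.example.yml config.yml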
cat config.yml
## Request timeout
# timeout: 5s

## Customizable templates path
templates:
  - /data/monitor/prometheus-webhook-dingtalk-1.4.0/contrib/templates/legacy/default2.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
# default_message:
#   title: '{{ template "legacy.title" . }}'
#   text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  test:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']
cat contrib/templates/legacy/template.tmpl
{{ define "ding.link.content2" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
{{ end }}{{ end }}

{{/* Firing alerts */}}
{{ define "default.__text_alert_list" }}{{ range . }}
**————————————————**
**Alert time:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Alert type:** {{ .Annotations.summary }}
**Host name:** {{ .Labels.instance}}
**Alert details:** {{ .Annotations.description }}
**————————————————**
{{ end }}
{{ end }}

{{/* Resolved alerts */}}
{{ define "default.__text_resolved_list" }}{{ range . }}
**————————————————**
**Alert time:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Recovery time:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
**Host name:** {{ .Labels.instance}}
**Alert details:** {{ .Annotations.description }}
**————————————————**
{{ end }}
{{ end }}

{{/* Default */}}
{{ define "default.title" }}{{ template "__subject" . }}{{ end }}
{{ define "default.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ if gt (len .Alerts.Firing) 0 -}}
![Firing-img](http://m.qpic.cn/psc?/V51kUUGn0MdYtz4DkTPa4Pbrm40LkcRa/TmEUgtj9EK6.7V8ajmQrEFTSeNBSwZrpOeKH*wfJrUH*bq5wvFpRL5ZUVtNN73JYtEhtV4He5iNFDbVZLe.S1dtnf6OeIiVqbCOthMY0Pv0!/b&bo=OgJcAAAAAAADF1Y!&rf=viewer_4)
**Firing alerts**
{{ template "default.__text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}
![Resolved-img](http://m.qpic.cn/psc?/V51kUUGn0MdYtz4DkTPa4Pbrm40LkcRa/TmEUgtj9EK6.7V8ajmQrEEdthRxYCYVef54h2YlrRZXxd9Y8aCW30HAv53MXawIp2uL7ClzTjC76hjfa5R6buAPPGk9X35.sPY4Z0GWE0Z4!/b&bo=OgJcAAAAAAADF1Y!&rf=viewer_4)
**Alerts resolved**
{{ template "default.__text_resolved_list" .Alerts.Resolved }}
{{- end }}
{{- end }}

{{/* Legacy */}}
{{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
{{ define "legacy.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}

{{/* Following names for compatibility */}}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
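The customized template reads `.Annotations.summary`, `.Annotations.description` and the `instance` label, so alert rules need to supply them for the DingTalk message to be complete. The following is a minimal sketch of a rule file that does so, written into the `rules/` directory that `rule_files` in `prometheus.yml` points at. The file name `instance_down.yml`, the group and alert names, the `up == 0` expression and the function name `write_example_rule` are illustrative assumptions.

```shell
#!/usr/bin/env bash
# write_example_rule RULES_DIR   (hypothetical helper)
# Creates RULES_DIR/instance_down.yml with one alert carrying the
# summary and description annotations that the DingTalk template renders.
write_example_rule() {
  local rules_dir="$1"
  mkdir -p "$rules_dir" || return 1
  # Quoted heredoc delimiter: {{ $labels.* }} must reach Prometheus unexpanded.
  cat > "$rules_dir/instance_down.yml" <<'EOF'
groups:
  - name: example-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance down"
          description: "{{ $labels.instance }} (job {{ $labels.job }}) has been unreachable for more than 1 minute."
EOF
}
```

With the paths used earlier, `write_example_rule /data/monitor/prometheus-2.27.1.linux-amd64/rules` would place the file where the `rules/*yml` glob picks it up after the next configuration reload.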
4. Start automatically at boot
cat /etc/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/data/monitor/prometheus-webhook-dingtalk-1.4.0/prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxx"

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable prometheus-webhook-dingtalk.service
systemctl restart prometheus-webhook-dingtalk.service
systemctl status prometheus-webhook-dingtalk.service
tail -200f /var/log/messages
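To exercise the whole chain (AlertManager, prometheus-webhook-dingtalk, DingTalk) without waiting for a real failure, a synthetic alert can be posted to AlertManager's v2 API while the log is being tailed. The following is a minimal sketch, assuming AlertManager is reachable at 192.168.9.50:9093 as in the `alerting` section of `prometheus.yml`; the alert name, the label and annotation values and both function names are made up for illustration.

```shell
#!/usr/bin/env bash
# build_test_alert ALERTNAME INSTANCE   (hypothetical helper)
# Prints a one-element JSON array suitable for POST /api/v2/alerts.
# The arguments are inserted verbatim, so they must not contain quotes or backslashes.
build_test_alert() {
  local name="$1" instance="$2"
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},"annotations":{"summary":"Test alert","description":"Synthetic alert sent by hand"}}]\n' "$name" "$instance"
}

# send_test_alert BASE_URL ALERTNAME INSTANCE   (hypothetical helper)
# Pipes the payload to curl; -f makes curl fail on HTTP errors.
send_test_alert() {
  local base_url="$1"
  build_test_alert "$2" "$3" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- "${base_url%/}/api/v2/alerts"
}
```

A possible call is `send_test_alert http://192.168.9.50:9093 TestAlert test-host:9100`. Since no end time is set, AlertManager will treat the alert as resolved on its own once its resolve timeout passes without the alert being re-sent.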