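1. Monitoring services that expose their own metrics endpoint
Some applications expose their own metrics endpoint. With Prometheus Operator we can create a ServiceMonitor that matches the application's Service, and the application is then brought under monitoring automatically. Other applications have no Service of their own, or run outside the Kubernetes cluster; for those we first need to create a Service and an Endpoints object.
This walkthrough uses Prometheus to monitor an etcd cluster from within Kubernetes. In a Kubernetes cluster installed from binaries, the etcd cluster runs outside Kubernetes, so the first step is to create a Service and an Endpoints object for it.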
etcd-endpoint.yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-etcd-monitoring
  namespace: kube-system
  labels:
    k8s-app: kube-etcd
subsets:
- addresses:
  - ip: 192.168.10.240
  - ip: 192.168.10.241
  - ip: 192.168.10.242
  ports:
  - name: https-metrics
    port: 2379
    protocol: TCP
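Next, create the Service for etcd. The Service has no selector, so its name (and namespace) must be identical to that of the Endpoints object; that is how Kubernetes associates the two.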
etcd-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-etcd-monitoring
  namespace: kube-system
  labels:
    k8s-app: kube-etcd
spec:
  ports:
  - port: 2379
    name: https-metrics
    protocol: TCP
  type: ClusterIP
Create both objects:
kubectl create -f .
Check the result:
# kubectl get svc,ep -n kube-system -l k8s-app=kube-etcd
NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kube-etcd-monitoring   ClusterIP   10.111.191.23   <none>        2379/TCP   4m53s

NAME                             ENDPOINTS                                                     AGE
endpoints/kube-etcd-monitoring   192.168.10.240:2379,192.168.10.241:2379,192.168.10.242:2379   4m53s
Test the endpoint. Connecting to etcd requires a client certificate:
# curl --cert /etc/etcd/ssl/etcd.pem --key /etc/etcd/ssl/etcd-key.pem https://10.111.191.23:2379/metrics -k | more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP etcd_cluster_version Which version is running. 1 for 'cluster_version' label with current cluster version
# TYPE etcd_cluster_version gauge
etcd_cluster_version{cluster_version="3.4"} 1
# HELP etcd_debugging_auth_revision The current revision of auth store.
# TYPE etcd_debugging_auth_revision gauge
etcd_debugging_auth_revision 1
# HELP etcd_debugging_disk_backend_commit_rebalance_duration_seconds The latency distributions of commit.rebalance called by bboltdb backend.
# TYPE etcd_debugging_disk_backend_commit_rebalance_duration_seconds histogram
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.001"} 152605
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.002"} 152609
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.004"} 152610
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.008"} 152610
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.016"} 152612
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.032"
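The full metrics dump is long, so it can be handy to check a single metric instead of paging through everything. The following is a minimal sketch, assuming the certificate paths and the Service ClusterIP from the command above; the helper names `metric_samples` and `etcd_metric` are hypothetical and not part of any tool.

```shell
# Keep only the sample lines of one metric family (drops the # HELP / # TYPE
# comments and metrics whose names merely share the same prefix).
# $1 = metric name; Prometheus text format is read from stdin.
metric_samples() {
  grep -E "^$1(\{| )"
}

# Fetch /metrics from etcd through the Service and filter one metric.
# $1 = metric name, $2 = base URL (assumed default: the ClusterIP shown above).
etcd_metric() {
  curl -s -k \
    --cert /etc/etcd/ssl/etcd.pem \
    --key /etc/etcd/ssl/etcd-key.pem \
    "${2:-https://10.111.191.23:2379}/metrics" | metric_samples "$1"
}

# Example: etcd_metric etcd_cluster_version
```

Passing a second argument such as https://192.168.10.240:2379 would query one etcd member directly rather than going through the Service.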
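2. Modify Prometheus
Because etcd can only be accessed with certificates, the etcd certificates have to be mounted into Prometheus so that it is allowed to scrape etcd.
Create a Secret: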
# kubectl create secret generic etcd-cert -n monitoring --from-file=/etc/etcd/ssl/etcd-ca.pem --from-file=/etc/etcd/ssl/etcd.pem --from-file=/etc/etcd/ssl/etcd-key.pem
secret/etcd-cert created
Edit the Prometheus manifest:
$path/kube-prometheus/manifests/prometheus-prometheus.yaml
...
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  secrets:
  - etcd-cert
  replicas: 1
  resources:
    requests:
      memory: 300Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
...
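- New field: secrets
- Prometheus Operator mounts each listed Secret under /etc/prometheus/secrets/<secret-name>, so the etcd certificates end up in /etc/prometheus/secrets/etcd-cert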
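Once the modified manifest has been applied, the mount can be confirmed by listing the directory inside the Prometheus container. This is a sketch under assumptions: the pod name `prometheus-k8s-0` and the container name `prometheus` are what a default kube-prometheus install typically uses and are not taken from this article, and `list_etcd_certs` is a hypothetical helper name.

```shell
# List the files of the mounted etcd-cert Secret inside the Prometheus pod.
# $1 = pod name (assumed default: prometheus-k8s-0)
list_etcd_certs() {
  kubectl exec -n monitoring "${1:-prometheus-k8s-0}" -c prometheus -- \
    ls /etc/prometheus/secrets/etcd-cert
}

# Example: list_etcd_certs
```

If the Secret was created with the --from-file arguments shown earlier, the listing should contain etcd-ca.pem, etcd.pem and etcd-key.pem, since each key is named after the file it was read from.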
3. Create the ServiceMonitor
prometheus-serviceMonitorEtcd.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-etcd
  name: kube-etcd
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-cert/etcd-ca.pem
      certFile: /etc/prometheus/secrets/etcd-cert/etcd.pem
      keyFile: /etc/prometheus/secrets/etcd-cert/etcd-key.pem
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-etcd
- k8s-app: the value under selector.matchLabels must match the k8s-app label on the etcd Service
Create it:
kubectl create -f prometheus-serviceMonitorEtcd.yaml
Check the result in Prometheus
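The scrape state can also be read from the Prometheus HTTP API rather than the web UI. This is a sketch under assumptions: Prometheus is reachable on localhost:9090 (for example through a port-forward; the Service name `prometheus-k8s` used below is the usual kube-prometheus default and an assumption here), and the job is named `kube-etcd` because `jobLabel: k8s-app` takes the job name from the Service's `k8s-app` label. `etcd_up` is a hypothetical helper name.

```shell
# Ask Prometheus for the `up` series of the etcd job; a value of "1" for an
# instance means its last scrape succeeded.
# $1 = Prometheus base URL (assumed default: http://localhost:9090)
etcd_up() {
  curl -s --get "${1:-http://localhost:9090}/api/v1/query" \
    --data-urlencode 'query=up{job="kube-etcd"}'
}

# Example (run the port-forward in another terminal first):
#   kubectl -n monitoring port-forward svc/prometheus-k8s 9090
#   etcd_up
```

The response is JSON; with the three etcd addresses configured earlier, one would expect one `up` series per member once the targets have been discovered.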