• prometheus (2): Monitoring Kubernetes


    Prometheus service discovery

    • 1. Automatic discovery of services via Service annotations (endpoints role)
    • 2. Automatic discovery of services via Pod annotations
    • 3. Automatic discovery of services registered in Consul
    • 4. Manually configured (static) service discovery
    • 5. Metrics pushed manually through pushgateway

    Prometheus monitoring of Kubernetes

    In Kubernetes, all resources can be grouped into a few categories:

    • Infrastructure layer (Node): cluster nodes that provide runtime resources for the whole cluster and its applications
    • Container infrastructure (Container): provides the runtime environment for applications
    • User applications (Pod): a Pod contains a group of containers that work together to provide one function (or a set of functions)
    • Internal load balancing (Service): exposes application functionality inside the cluster and load-balances traffic between applications
    • External entry point (Ingress): provides access from outside the cluster, so external clients can reach services deployed in Kubernetes

    A complete monitoring system therefore needs to cover the following five aspects:

    • Cluster node status: obtain each node's basic running state from its kubelet service;
    • Node resource usage: deploy Node Exporter on every node as a DaemonSet to collect resource usage;
    • Containers running on each node: obtain the running state and resource usage of all containers from the cAdvisor built into each node's kubelet;
    • Applications with built-in Prometheus support: locate the corresponding Pod instances and scrape their internal metrics directly;
    • Kubernetes' own components: apiserver, scheduler, controller-manager, kubelet, kube-proxy

    1. What is node-exporter?

    node-exporter collects monitoring metrics from machines (physical, virtual, cloud hosts, etc.), including CPU, memory, disk, network, and open-file counts.

    Installing node-exporter

    [root@xianchaomaster1 ~]# kubectl create ns monitor-sa
    # Upload the node-exporter.tar.gz image archive to every k8s node and load it manually:
    [root@xianchaomaster1 ~]# docker load -i node-exporter.tar.gz
    [root@xianchaonode1 ~]# docker load -i node-exporter.tar.gz
    # Better: pull the image locally and push it to a private registry
    [root@node-1-172 tomcat]# docker tag prom/node-exporter:v0.16.0 172.17.166.217/kubenetes/node-exporter:v0.16.0
    
    docker push  172.17.166.217/kubenetes/node-exporter:v0.16.0
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: monitor-sa
      labels:
        name: node-exporter
    spec:
      selector:
        matchLabels:
         name: node-exporter
      template:
        metadata:
          labels:
            name: node-exporter
        spec:
          hostPID: true
          hostIPC: true
          hostNetwork: true
          # With hostNetwork, hostIPC and hostPID all true, every container in this Pod uses the host's network, communicates with the host via IPC, and can see all processes running on the host.
          # hostNetwork: true exposes port 9100 directly on each host, so no Service is needed.
          containers:
          - name: node-exporter
            image: 172.17.166.217/kubenetes/node-exporter:v0.16.0
            ports:
            - containerPort: 9100
            resources:
              requests:
                cpu: 0.15
            securityContext:
              privileged: true
            # run the container in privileged mode
            args:
            - --path.procfs
            - /host/proc
            - --path.sysfs
            - /host/sys
            - --collector.filesystem.ignored-mount-points
            - '^/(sys|proc|dev|host|etc)($|/)'
            # skip filesystem metrics for mount points matching this regex (note: the value must not contain literal double quotes, or the regex will never match)
            volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
          tolerations:
          - key: "node-role.kubernetes.io/master"  # tolerate the master taint so the DaemonSet also runs on master nodes
            operator: "Exists"
            effect: "NoSchedule"
          # Mount the host's /dev, /proc and /sys into the container; most node metrics are read from files under these directories.
          volumes:
            - name: proc
              hostPath:
                path: /proc
            - name: dev
              hostPath:
                path: /dev
            - name: sys
              hostPath:
                path: /sys
            - name: rootfs
              hostPath:
                path: /
    node-export.yaml

    node-exporter works by sharing the host's directories with the container: the container reads files under those mounts (for example cpuinfo) to collect host information.

    #Apply node-export.yaml with kubectl apply
    [root@xianchaomaster1]# kubectl apply -f node-export.yaml
    #Check whether node-exporter deployed successfully
    [root@xianchaomaster1]# kubectl get pods -n monitor-sa
    If both pods show STATUS Running, as below, the deployment succeeded:
    NAME                  READY   STATUS    RESTARTS   AGE
    node-exporter-9qpkd   1/1     Running   0          89s
    node-exporter-zqmnk   1/1     Running   0          89s
    
    Scraping data from node-exporter
    curl  http://<host-ip>:9100/metrics
    
    #node-exporter listens on port 9100 by default; this returns every metric currently collected from the host
    
    curl http://192.168.40.180:9100/metrics | grep node_cpu_seconds
    Shows the CPU usage of host 192.168.40.180:
    
    # HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
    # TYPE node_cpu_seconds_total counter
    node_cpu_seconds_total{cpu="0",mode="idle"} 72963.37
    node_cpu_seconds_total{cpu="0",mode="iowait"} 9.35
    node_cpu_seconds_total{cpu="0",mode="irq"} 0
    node_cpu_seconds_total{cpu="0",mode="nice"} 0
    node_cpu_seconds_total{cpu="0",mode="softirq"} 151.4
    node_cpu_seconds_total{cpu="0",mode="steal"} 0
    node_cpu_seconds_total{cpu="0",mode="system"} 656.12
    node_cpu_seconds_total{cpu="0",mode="user"} 267.1
    
    #HELP: explains the metric; here, the seconds each CPU on the node spent in each mode
    #TYPE: gives the metric's data type; here it is a counter
    node_cpu_seconds_total{cpu="0",mode="idle"} :
    total CPU time consumed in idle mode on cpu0. CPU time only ever increases, so, as the TYPE line says, node_cpu_seconds_total is a counter.
    
    counter: a metric that only ever increases
    
    
    curl http://192.168.40.180:9100/metrics | grep node_load
    # HELP node_load1 1m load average.
    # TYPE node_load1 gauge
    node_load1 0.1
    
    node_load1 is the host's load average over the last minute. Load changes as system resources are used, so the value reflects current state and can rise or fall; the TYPE line shows it is a gauge.
    gauge: a metric that can increase or decrease
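    The counter/gauge distinction matters when querying: counters are usually wrapped in PromQL's rate(), while gauges are read directly. A minimal Python sketch of the idea (sample values are invented, not from a real scrape):

    ```python
    def rate(samples):
        """Per-second increase of a counter over its sample window,
        a simplified PromQL rate() without counter-reset handling."""
        (t0, v0), (t1, v1) = samples[0], samples[-1]
        return (v1 - v0) / (t1 - t0)

    # counter: node_cpu_seconds_total only ever increases
    cpu_idle = [(0, 72963.37), (60, 73020.37)]  # (timestamp_s, value)
    print(round(rate(cpu_idle), 2))  # 0.95 idle CPU-seconds per second

    # gauge: node_load1 rises and falls; the latest sample is read directly
    load1 = [(0, 0.10), (60, 0.42), (120, 0.08)]
    print(load1[-1][1])
    ```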

    Prometheus server安装和配置

    10.1 Create a service account and grant it RBAC permissions
    Create a service account named monitor
     kubectl create serviceaccount monitor -n monitor-sa  
    #Bind the monitor service account to the cluster-admin ClusterRole with a ClusterRoleBinding
    kubectl create clusterrolebinding monitor-clusterrolebinding -n monitor-sa --clusterrole=cluster-admin  --serviceaccount=monitor-sa:monitor
    
    10.2 Create the Prometheus data directory
    
    #On the cluster node xianchaonode1, create the data directory
     mkdir /data
     chmod 777 /data/

    Create a ConfigMap to hold the Prometheus configuration

    ---
    kind: ConfigMap
    apiVersion: v1
    metadata:
      labels:
        app: prometheus
      name: prometheus-config
      namespace: monitor-sa
    data:
      prometheus.yml: |
        global: # global configuration
          scrape_interval: 15s # how often to scrape targets
          scrape_timeout: 10s # per-scrape timeout
          evaluation_interval: 1m # how often to evaluate rules
        scrape_configs: # data sources (targets)
        - job_name: 'kubernetes-node' # target name
          kubernetes_sd_configs: # Kubernetes service discovery
          - role: node  # the node role discovers nodes via the kubelet's default HTTP port
          relabel_configs: # rewrite labels before scraping
          - source_labels: [__address__] # the discovered address, e.g. <node-ip>:10250
            regex: '(.*):10250' # capture the node IP in front of port 10250
            replacement: '${1}:9100' # replace port 10250 with 9100 (node-exporter)
            target_label: __address__ # the new address becomes <node-ip>:9100
            action: replace
          - action: labelmap # keep labels matching the regex below
            regex: __meta_kubernetes_node_label_(.+) # keep all node labels
        - job_name: 'kubernetes-node-cadvisor'
          kubernetes_sd_configs:
          - role:  node
          scheme: https # scrape over HTTPS
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt # CA certificate
            #key_file: /etc/kubernetes/ssl/ca-key.pem
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # service account token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)  # keep node labels
          - target_label: __address__
            replacement: kubernetes.default.svc:443 # scrape through the apiserver instead of the node address
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+) # capture the node name
            target_label: __metrics_path__  # rewrite the metrics path
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor # proxy to each node's cAdvisor endpoint
        - job_name: 'kubernetes-apiserver'
          kubernetes_sd_configs:
          - role: endpoints # endpoints-based service discovery, used here to scrape the apiserver
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            #key_file: /etc/kubernetes/ssl/ca-key.pem
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep # keep only matching targets
            regex: default;kubernetes;https # the kubernetes service in the default namespace, https port
        - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep # keep only annotated services
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace # use the scheme from the prometheus.io/scheme annotation, if set
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
    prometheus-cfg.yaml
    kubectl apply -f prometheus-cfg.yaml
    
    kubectl get configmap -n monitor-sa

    Installing Prometheus

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus-server
      namespace: monitor-sa
      labels:
        app: prometheus
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus
          component: server
        #matchExpressions:
        #- {key: app, operator: In, values: [prometheus]}
        #- {key: component, operator: In, values: [server]}
      template:
        metadata:
          labels:
            app: prometheus
            component: server
          annotations:
            prometheus.io/scrape: 'false' # with this annotation set to 'false', the scrape jobs that keep only 'true' will skip the Prometheus pod itself
        spec:
          #nodeName: node1  # optionally pin the pod to a specific node
          serviceAccountName: monitor
          containers:
          - name: prometheus
            image: 172.17.166.217/kubenetes/prometheus:v2.2.1
            imagePullPolicy: IfNotPresent # use the local image if present, otherwise pull
            command:
              - prometheus
              - --config.file=/etc/prometheus/prometheus.yml # config path, projected from the ConfigMap
              - --storage.tsdb.path=/prometheus # data directory
              - --storage.tsdb.retention=720h # retention period
              - --web.enable-lifecycle # enable hot reload via /-/reload
            ports:
            - containerPort: 9090
              protocol: TCP
            volumeMounts:
            - mountPath: /etc/prometheus/prometheus.yml
              name: prometheus-config
              subPath: prometheus.yml
            - mountPath: /prometheus/
              name: prometheus-storage-volume
          volumes:
            - name: prometheus-config
              configMap:
                name: prometheus-config
                items:
                  - key: prometheus.yml
                    path: prometheus.yml
                    mode: 0644
            - name: prometheus-storage-volume
              hostPath:
               path: /data
               type: Directory
    prometheus-deploy.yaml
    kubectl apply -f prometheus-deploy.yaml
    
    kubectl get pods -n monitor-sa

    Create the Prometheus Service (to provide access)

    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      namespace: monitor-sa
    spec:
      ports:
      - port: 9090
        protocol: TCP
        targetPort: 9090
      selector:
        app: prometheus
        component: server
      type: ClusterIP
    
    ---
    #ingress
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: prometheus
      namespace: monitor-sa
    spec:
      rules:
      - host: csk8s.mingcloud.net
        http:
          paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: prometheus
                port:
                  number: 9090
    prometheus-service.yaml
    kubectl get svc -n monitor-sa

    The Prometheus configuration file in detail

    relabel_configs: rewriting labels

    job_name: kubernetes-node

    kind: ConfigMap
    apiVersion: v1
    metadata:
      labels:
        app: prometheus
      name: prometheus-config
      namespace: monitor-sa
    data:
      prometheus.yml: |
        rule_files:
        - /etc/prometheus/rules.yml
        alerting:
          alertmanagers:
          - static_configs:
            - targets: ["localhost:9093"]
        global:
          scrape_interval: 15s
          scrape_timeout: 10s
          evaluation_interval: 1m
        scrape_configs:
        - job_name: 'kubernetes-node'
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
        - job_name: 'kubernetes-node-cadvisor'
          kubernetes_sd_configs:
          - role:  node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        - job_name: 'kubernetes-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
        - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scrape
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name
        - job_name: 'kubernetes-schedule'
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.217:10251','172.17.166.218:10251','172.17.166.219:10251']
        - job_name: 'kubernetes-controller-manager'
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.217:10252','172.17.166.218:10252','172.17.166.219:10252']
        - job_name: 'kubernetes-kube-proxy'
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.219:10249','172.17.27.255:10249','172.17.27.248:10249','172.17.4.79:10249']
        - job_name: 'pushgateway'
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.217:9091']
          honor_labels: true
        - job_name: 'kubernetes-etcd'
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/ca.pem
            cert_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/kubernetes.pem
            key_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/kubernetes-key.pem
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.219:2379','172.17.4.79:2379','172.17.27.255:2379','172.17.27.248:2379']
    prometheus-cfg.yaml in full
    #scrape_configs defines the data sources, called targets; each target is named by job_name. Targets are either statically configured or discovered dynamically.
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs: # Kubernetes service discovery
      - role: node # the node role discovers every node in the cluster via the kubelet's default HTTP port
      relabel_configs: # rewrite labels
      - source_labels: [__address__] # the original discovered label, matching the address
        regex: '(.*):10250' # match URLs carrying port 10250
        replacement: '${1}:9100' # keep the IP captured from ip:10250
        target_label: __address__ # the new address is the captured ip plus :9100
        action: replace
      - action: labelmap # keep labels matching the regex below; without a regex, only the instance label would be shown
        regex: __meta_kubernetes_node_label_(.+)

     

    Note: "Before relabeling" lists every label discovered for the target before relabel rules are applied.
    Before relabeling:  
    __address__="192.168.40.180:10250"
    __meta_kubernetes_node_address_Hostname="xianchaomaster1"
    __meta_kubernetes_node_address_InternalIP="192.168.40.180"
    __meta_kubernetes_node_annotation_kubeadm_alpha_kubernetes_io_cri_socket="/var/run/dockershim.sock"
    __meta_kubernetes_node_annotation_node_alpha_kubernetes_io_ttl="0"
    __meta_kubernetes_node_annotation_projectcalico_org_IPv4Address="192.168.40.180/24"
    __meta_kubernetes_node_annotation_projectcalico_org_IPv4IPIPTunnelAddr="10.244.123.64"
    __meta_kubernetes_node_annotation_volumes_kubernetes_io_controller_managed_attach_detach="true"
    __meta_kubernetes_node_label_beta_kubernetes_io_arch="amd64"
    __meta_kubernetes_node_label_beta_kubernetes_io_os="linux"
    __meta_kubernetes_node_label_kubernetes_io_arch="amd64"
    __meta_kubernetes_node_label_kubernetes_io_hostname="xianchaomaster1"
    __meta_kubernetes_node_label_kubernetes_io_os="linux"
    __meta_kubernetes_node_label_node_role_kubernetes_io_control_plane=""
    __meta_kubernetes_node_label_node_role_kubernetes_io_master=""
    __meta_kubernetes_node_name="xianchaomaster1"
    __metrics_path__="/metrics"
    __scheme__="http"
    instance="xianchaomaster1"
    job="kubernetes-node"

    The node role discovers targets at <node-ip>:10250 by default. Because node-exporter listens on port 9100, the original address is split and re-joined with the new port, and the default __meta_kubernetes_node_label_* labels are kept via labelmap.
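    The address rewrite is an ordinary anchored regex capture-and-replace. A small Python sketch (outside Prometheus, purely illustrative) of what the relabel rule does to __address__:

    ```python
    import re

    # regex '(.*):10250' captures the node IP; replacement '${1}:9100'
    # re-joins it with node-exporter's port. Prometheus anchors the regex,
    # so non-matching addresses are left unchanged.
    def relabel_address(address, regex=r"(.*):10250", replacement=r"\1:9100"):
        m = re.fullmatch(regex, address)
        return m.expand(replacement) if m else address

    print(relabel_address("192.168.40.180:10250"))  # 192.168.40.180:9100
    print(relabel_address("192.168.40.180:9090"))   # unchanged: no match
    ```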

    job_name: kubernetes-node-cadvisor

    - job_name: 'kubernetes-node-cadvisor'
    # Scrapes cAdvisor data: container resource usage exposed by the kubelet at /metrics/cadvisor
          kubernetes_sd_configs:
          - role:  node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap  # keep the matched labels
            regex: __meta_kubernetes_node_label_(.+)
    # keeps all labels prefixed __meta_kubernetes_node_label_
          - target_label: __address__
    # the discovered address, e.g. __address__="192.168.40.180:10250"
            replacement: kubernetes.default.svc:443
    # replace it with the apiserver's in-cluster address, kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
    # capture the value of __meta_kubernetes_node_name
            target_label: __metrics_path__
    # rewrite __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    # e.g. /api/v1/nodes/xianchaomaster1/proxy/metrics/cadvisor, where ${1} is the captured node name
    
    The resulting URL is https://kubernetes.default.svc:443/api/v1/nodes/xianchaomaster1/proxy/metrics/cadvisor

    cAdvisor collects container resource metrics and is built into the kubelet's metrics endpoint. The regex rewriting above makes the target reach the apiserver's ClusterIP (*.*.0.1) through kubernetes.default.svc:443, which then proxies api/v1/nodes/<node-name>/proxy/metrics/cadvisor on each node to return the cAdvisor data.
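    The cadvisor job's relabel chain can be traced step by step; a Python sketch (node name and IP are examples from this document) of how the final scrape URL is derived:

    ```python
    import re

    # 1) __address__ is replaced wholesale with the apiserver address;
    # 2) __metrics_path__ is templated from the captured node name.
    labels = {
        "__address__": "192.168.40.180:10250",
        "__meta_kubernetes_node_name": "xianchaomaster1",
        "__metrics_path__": "/metrics",
    }
    labels["__address__"] = "kubernetes.default.svc:443"
    m = re.fullmatch(r"(.+)", labels["__meta_kubernetes_node_name"])
    labels["__metrics_path__"] = f"/api/v1/nodes/{m.group(1)}/proxy/metrics/cadvisor"

    url = f"https://{labels['__address__']}{labels['__metrics_path__']}"
    print(url)
    ```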

     job_name: kubernetes-apiserver

     - job_name: 'kubernetes-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
    # endpoints-based service discovery; collects the data the apiserver exposes on port 6443
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace,        # the endpoint object's namespace
                            __meta_kubernetes_service_name,     # the endpoint object's service name
                            __meta_kubernetes_endpoint_port_name] # the endpoint's port name
            action: keep  # scrape only the matching instances; drop everything else
            regex: default;kubernetes;https

    # Only endpoints whose service is named kubernetes, in the default namespace, with the https port name, are kept.

    By default the endpoints role discovers the apiserver endpoint at <ip>:6443.

    Keeping only targets that match regex: default;kubernetes;https yields the apiserver's IP and port.
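    The keep action works on the concatenation of the source labels: Prometheus joins their values with ';' (the default separator) and keeps the target only if the regex matches the whole string. A minimal Python sketch:

    ```python
    import re

    def keep(values, regex="default;kubernetes;https", sep=";"):
        """Keep the target when the joined source_labels fully match the regex."""
        return re.fullmatch(regex, sep.join(values)) is not None

    print(keep(["default", "kubernetes", "https"]))   # True  -> apiserver kept
    print(keep(["monitor-sa", "prometheus", "http"])) # False -> dropped
    ```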

    job_name: kubernetes-service-endpoints

     - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
    # Relabeling keeps only endpoints whose service carries the annotation "prometheus.io/scrape: true". Annotations are key/value pairs: the source label is the key, and regex matches the value. When the value matches, the keep action retains the target; everything else is dropped.
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
    # Reset the scheme: if the prometheus.io/scheme annotation matches http or https, its value replaces __scheme__.
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
    # For applications that expose metrics on a custom path rather than /metrics, add an annotation such as "prometheus.io/path: /mymetrics" to the Pod's Service; its value is assigned to __metrics_path__, telling Prometheus where the application's metrics really live. The annotation name must match the source label: if the Service used prometheus.io/app-metrics-path: '/metrics', the source label here would have to be __meta_kubernetes_service_annotation_prometheus_io_app_metrics_path.

          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
    # Expose a custom application port: the host from __address__ is joined with the port declared in the Service's "prometheus.io/port: <port>" annotation and written back to __address__. Prometheus then combines this address with __metrics_path__ to fetch the metrics; if __metrics_path__ is not the default /metrics, the path replacement above supplies the real path.

          - action: labelmap  # keep the labels matched below
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace  # copy __meta_kubernetes_namespace into kubernetes_namespace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name

    This job scrapes through Endpoints: the Service must be created with the matching annotations so the scrape address can be assembled, which gives automatic service discovery.

      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9121'
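    The __address__ rewrite that uses these annotations is again a regex over the ';'-joined source labels: the discovered host (with or without its original port) is joined with the annotated port. A Python sketch, with example addresses:

    ```python
    import re

    # regex ([^:]+)(?::\d+)?;(\d+) captures the host before any ':port'
    # and the annotation port after the ';' separator.
    def rewrite_address(address, annotation_port):
        joined = f"{address};{annotation_port}"
        m = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", joined)
        return f"{m.group(1)}:{m.group(2)}" if m else address

    print(rewrite_address("10.244.1.7:8080", "9121"))  # 10.244.1.7:9121
    print(rewrite_address("10.244.1.7", "9121"))       # 10.244.1.7:9121
    ```

    The optional `(?::\d+)?` group is what lets the same rule handle addresses both with and without an existing port.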

    job_name: kubernetes-pods

        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scrape # scrape only pods annotated prometheus.io/scrape: 'true'
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_path # custom metrics path from the prometheus.io/path annotation
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_port # port from the prometheus.io/port annotation, e.g. prometheus.io/port: '9121'
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)  # keep pod labels
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace # copy into kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name

    The principle mirrors the service discovery above: the pod role uses the Pods' own annotations to discover scrape targets dynamically.

    Static service discovery

        - job_name: 'kubernetes-schedule'
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.217:10251','172.17.166.218:10251','172.17.166.219:10251']
        - job_name: 'kubernetes-controller-manager'
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.217:10252','172.17.166.218:10252','172.17.166.219:10252']
        - job_name: 'kubernetes-kube-proxy'
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.219:10249','172.17.27.255:10249','172.17.27.248:10249','172.17.4.79:10249']
        - job_name: 'pushgateway'
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.217:9091']
          honor_labels: true
        - job_name: 'kubernetes-etcd'
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/ca.pem
            cert_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/kubernetes.pem
            key_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/kubernetes-key.pem
          scrape_interval: 5s
          static_configs:
          - targets: ['172.17.166.219:2379','172.17.4.79:2379','172.17.27.255:2379','172.17.27.248:2379']

    Prometheus hot reload

    # To make configuration changes take effect without stopping Prometheus, reload it in place.
    # First find the Prometheus pod's IP:
    [root@xianchaomaster1 prometheus]# kubectl get pods -n monitor-sa -o wide -l app=prometheus
    
    # The command above shows the pod IP; here it is 10.244.121.4.
    
    # Trigger the reload with:
    [root@xianchaomaster1]#  curl -X POST http://10.244.121.4:9090/-/reload
    
    # Hot reloading can be slow. Alternatively, restart Prometheus the hard way: after editing prometheus-cfg.yaml, force-delete the resources:
    kubectl delete -f prometheus-cfg.yaml
    kubectl delete -f prometheus-deploy.yaml
    then re-apply them:
    kubectl apply -f prometheus-cfg.yaml
    kubectl apply -f prometheus-deploy.yaml
    Note:
    In production, prefer hot reloading; force deletion can lose monitoring data.

    Installing the kube-state-metrics component

     What is kube-state-metrics?

    kube-state-metrics listens to the API Server and generates state metrics for resource objects such as Deployments, Nodes, and Pods. Note that it only exposes the metrics; it does not store them, so Prometheus is used to scrape and store the data. It focuses on business-level metadata such as Deployment and Pod replica state: how many replicas were scheduled? How many are currently available? How many Pods are running/stopped/terminated? How many times has a Pod restarted? How many jobs are running?

    Installing kube-state-metrics

    (1) Create a service account and grant it RBAC permissions

    On the k8s control-plane node, create a kube-state-metrics-rbac.yaml file

    Apply the resource manifest with kubectl apply

    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kube-state-metrics
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kube-state-metrics
    rules:
    - apiGroups: [""]
      resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
      verbs: ["list", "watch"]
    - apiGroups: ["extensions"]
      resources: ["daemonsets", "deployments", "replicasets"]
      verbs: ["list", "watch"]
    - apiGroups: ["apps"]
      resources: ["statefulsets"]
      verbs: ["list", "watch"]
    - apiGroups: ["batch"]
      resources: ["cronjobs", "jobs"]
      verbs: ["list", "watch"]
    - apiGroups: ["autoscaling"]
      resources: ["horizontalpodautoscalers"]
      verbs: ["list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kube-state-metrics
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-state-metrics
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: kube-system
    kube-state-metrics-rbac.yaml

    (2) Deploy the kube-state-metrics component

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kube-state-metrics
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: kube-state-metrics
      template:
        metadata:
          labels:
            app: kube-state-metrics
        spec:
          serviceAccountName: kube-state-metrics
          containers:
          - name: kube-state-metrics
            image: 172.17.166.217/kubenetes/kube-state-metrics:v1.9.0
            ports:
            - containerPort: 8080
    kube-state-metrics-deploy.yaml

    (3) Create the Service

    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
      name: kube-state-metrics
      namespace: kube-system
      labels:
        app: kube-state-metrics
    spec:
      ports:
      - name: kube-state-metrics
        port: 8080
        protocol: TCP
      selector:
        app: kube-state-metrics
    kube-state-metrics-svc.yaml

    The annotation prometheus.io/scrape: 'true' lets the kubernetes-service-endpoints job discover and scrape this Service automatically.

  • Original (Chinese): https://www.cnblogs.com/dahuige/p/15094764.html