• k8s series --- Resource Metrics API and Custom Metrics API


     https://www.linuxea.com/2112.html

    Resource metrics used to be collected and viewed through Heapster, but Heapster is being deprecated.

        Starting with k8s v1.8, a new mechanism was introduced: resource metrics are exposed through an API.

        Resource metrics: metrics-server

        Custom metrics: prometheus, k8s-prometheus-adapter

        The new-generation architecture is therefore:

        1) Core metrics pipeline: composed of the kubelet, metrics-server, and the API exposed through the API server; it covers cumulative CPU usage, real-time memory usage, pod resource usage, and container disk usage.

        2) Monitoring pipeline: collects all kinds of metrics from the system and serves them to end users, storage systems, and the HPA. It carries the core metrics plus many non-core metrics; non-core metrics cannot be interpreted by k8s itself.

        metrics-server is itself an API server that collects only CPU usage, memory usage, and the like.
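
        Once metrics-server is registered with the API aggregation layer, the Metrics API can be queried directly through the API server. A minimal sketch using the standard metrics.k8s.io endpoints (run it after the deployment steps below):

    kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
    kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods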

    [root@master ~]# kubectl api-versions
    admissionregistration.k8s.io/v1beta1
    apiextensions.k8s.io/v1beta1
    apiregistration.k8s.io/v1
    apiregistration.k8s.io/v1beta1
    apps/v1
    apps/v1beta1
    apps/v1beta2
    authentication.k8s.io/v1
    authentication.k8s.io/v1beta1
    authorization.k8s.io/v1
    

      

     Get the yaml files from https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server , but note that the yaml files there have been updated and differ from the ones used in the video.

    Below are my modified yaml files, kept here for reference.

    [root@master metrics-server]# cat auth-delegator.yaml 
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: metrics-server:system:auth-delegator
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:auth-delegator
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    [root@master metrics-server]# cat auth-reader.yaml 
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: metrics-server-auth-reader
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    [root@master metrics-server]# cat metrics-apiservice.yaml 
    apiVersion: apiregistration.k8s.io/v1beta1
    kind: APIService
    metadata:
      name: v1beta1.metrics.k8s.io
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    spec:
      service:
        name: metrics-server
        namespace: kube-system
      group: metrics.k8s.io
      version: v1beta1
      insecureSkipTLSVerify: true
      groupPriorityMinimum: 100
      versionPriority: 100

    The key file is this one:

    [root@master metrics-server]# cat metrics-server-deployment.yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: metrics-server
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metrics-server-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      NannyConfiguration: |-
        apiVersion: nannyconfig/v1alpha1
        kind: NannyConfiguration
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: metrics-server-v0.3.1
      namespace: kube-system
      labels:
        k8s-app: metrics-server
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v0.3.1
    spec:
      selector:
        matchLabels:
          k8s-app: metrics-server
          version: v0.3.1
      template:
        metadata:
          name: metrics-server
          labels:
            k8s-app: metrics-server
            version: v0.3.1
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
            seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
        spec:
          priorityClassName: system-cluster-critical
          serviceAccountName: metrics-server
          containers:
          - name: metrics-server
            image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1
            command:
            - /metrics-server
            - --metric-resolution=30s
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
            # These are needed for GKE, which doesn't support secure communication yet.
            # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
            #- --kubelet-port=10250
            #- --deprecated-kubelet-completely-insecure=true
    
            ports:
            - containerPort: 443
              name: https
              protocol: TCP
          - name: metrics-server-nanny
            image: mirrorgooglecontainers/addon-resizer:1.8.4
            resources:
              limits:
                cpu: 100m
                memory: 300Mi
              requests:
                cpu: 5m
                memory: 50Mi
            env:
              - name: MY_POD_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: MY_POD_NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
            volumeMounts:
            - name: metrics-server-config-volume
              mountPath: /etc/config
            command:
              - /pod_nanny
              - --config-dir=/etc/config
              - --cpu=100m
              - --extra-cpu=0.5m
              - --memory=100Mi
              - --extra-memory=50Mi
              - --threshold=5
              - --deployment=metrics-server-v0.3.1
              - --container=metrics-server
              - --poll-period=300000
              - --estimator=exponential
              # Specifies the smallest cluster (defined in number of nodes)
              # resources will be scaled to.
              - --minClusterSize=10
    
          volumes:
            - name: metrics-server-config-volume
              configMap:
                name: metrics-server-config
          tolerations:
            - key: "CriticalAddonsOnly"
              operator: "Exists"
    [root@master metrics-server]# cat metrics-server-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: metrics-server
      namespace: kube-system
      labels:
        addonmanager.kubernetes.io/mode: Reconcile
        kubernetes.io/cluster-service: "true"
        kubernetes.io/name: "Metrics-server"
    spec:
      selector:
        k8s-app: metrics-server
      ports:
      - port: 443
        protocol: TCP
        targetPort: https
    [root@master metrics-server]# cat resource-reader.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: system:metrics-server
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    rules:
    - apiGroups:
      - ""
      resources:
      - pods
      - nodes
      - namespaces
      - nodes/stats
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "extensions"
      resources:
      - deployments
      verbs:
      - get
      - list
      - update
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: system:metrics-server
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:metrics-server
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system

    If applying the files downloaded from GitHub fails, use the metrics-server-deployment.yaml above instead: delete the old resources and apply again.

    [root@master metrics-server]# kubectl apply -f ./
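
    A quick sanity check after the apply (both are standard kubectl commands; the APIService object v1beta1.metrics.k8s.io is created by metrics-apiservice.yaml above and should report Available=True once the pod is up):

    kubectl get apiservice v1beta1.metrics.k8s.io
    kubectl api-versions | grep metrics.k8s.io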
    

      

    [root@master ~]#  kubectl proxy --port=8080
    

      

    Make sure metrics-server-v0.3.1-76b796b-4xgvp is in the Running state. In my case it hit Error at first because of problems in the yaml; after several rounds of edits it finally reached Running with the version shown above.

    [root@master metrics-server]# kubectl get pods -n kube-system
    NAME                                    READY   STATUS    RESTARTS   AGE
    canal-mgbc2                             3/3     Running   12         3d23h
    canal-s4xgb                             3/3     Running   23         3d23h
    canal-z98bc                             3/3     Running   15         3d23h
    coredns-78d4cf999f-5shdq                1/1     Running   0          6m4s
    coredns-78d4cf999f-xj5pj                1/1     Running   0          5m53s
    etcd-master                             1/1     Running   13         17d
    kube-apiserver-master                   1/1     Running   13         17d
    kube-controller-manager-master          1/1     Running   19         17d
    kube-flannel-ds-amd64-8xkfn             1/1     Running   0          <invalid>
    kube-flannel-ds-amd64-t7jpc             1/1     Running   0          <invalid>
    kube-flannel-ds-amd64-vlbjz             1/1     Running   0          <invalid>
    kube-proxy-ggcbf                        1/1     Running   11         17d
    kube-proxy-jxksd                        1/1     Running   11         17d
    kube-proxy-nkkpc                        1/1     Running   12         17d
    kube-scheduler-master                   1/1     Running   19         17d
    kubernetes-dashboard-76479d66bb-zr4dd   1/1     Running   0          <invalid>
    metrics-server-v0.3.1-76b796b-4xgvp     2/2     Running   0          9s
    

      

    To check the error logs, use -c to specify the container name. This pod contains two containers and metrics-server is only one of them; the other can be checked the same way, just change the name.

    [root@master metrics-server]# kubectl logs metrics-server-v0.3.1-76b796b-4xgvp   -c metrics-server -n kube-system
    

      

    The errors in the logs looked roughly like the following:

    403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)
    
    E0903  1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host
    
    
    no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )
    
    
    E1109 09:54:49.509521       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:linuxea.node-2.com: unable to fetch metrics from Kubelet linuxea.node-2.com (10.10.240.203): Get https://10.10.240.203:10255/stats/summary/: dial tcp 10.10.240.203:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-3.com: unable to fetch metrics from Kubelet linuxea.node-3.com (10.10.240.143): Get https://10.10.240.143:10255/stats/summary/: dial tcp 10.10.240.143:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-4.com: unable to fetch metrics from Kubelet linuxea.node-4.com (10.10.240.142): Get https://10.10.240.142:10255/stats/summary/: dial tcp 10.10.240.142:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.master-1.com: unable to fetch metrics from Kubelet linuxea.master-1.com (10.10.240.161): Get https://10.10.240.161:10255/stats/summary/: dial tcp 10.10.240.161:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-1.com: unable to fetch metrics from Kubelet linuxea.node-1.com (10.10.240.202): Get https://10.10.240.202:10255/stats/summary/: dial tcp 10.10.240.202:10255: connect: connection refused]
    

      

    At the time I tried modifying the CoreDNS configuration following a method found online, which only made the logs report "unable" errors for every pod (as above). I then reverted the change, deleted the CoreDNS pods, and let two new CoreDNS containers be recreated automatically.

    - --kubelet-insecure-tls disables TLS verification and is generally not recommended in production. Since DNS cannot resolve the node hostnames, - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP works around the problem. Another option is to modify CoreDNS, but I do not recommend doing that.
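
    To see why InternalIP is preferred, you can list the addresses each Node object advertises; the hostnames shown there are what metrics-server would otherwise have to resolve (a hedged helper using standard kubectl jsonpath):

    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses}{"\n"}{end}'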

    See this issue for reference: https://github.com/kubernetes-incubator/metrics-server/issues/131

    metrics-server unable to fetch pod metrics for pod
    

      

    Those were the problems I ran into; using the yaml above resolves all of them. One other thing: after switching flannel to DirectRouting, it stops working every time the cluster machines are rebooted, and I have to delete flannel and recreate it. That issue was covered in an earlier post.

    At this point the following commands all succeed, and the items field contains values:

    [root@master ~]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1
    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "nodes",
          "singularName": "",
          "namespaced": false,
          "kind": "NodeMetrics",
          "verbs": [
            "get",
            "list"
          ]
        },
        {
          "name": "pods",
          "singularName": "",
          "namespaced": true,
          "kind": "PodMetrics",
          "verbs": [
            "get",
            "list"
          ]
        }
      ]
    

      

    [root@master metrics-server]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods | more
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 14868    0 14868    0     0  1521k      0 --:--:-- --:--:-- --:--:-- 1613k
    {
      "kind": "PodMetricsList",
      "apiVersion": "metrics.k8s.io/v1beta1",
      "metadata": {
        "selfLink": "/apis/metrics.k8s.io/v1beta1/pods"
      },
      "items": [
        {
          "metadata": {
            "name": "pod1",
            "namespace": "prod",
            "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/prod/pods/pod1",
            "creationTimestamp": "2019-01-29T02:39:12Z"
          },
    

      

    [root@master metrics-server]# kubectl top pods
    NAME                CPU(cores)   MEMORY(bytes)   
    filebeat-ds-4llpp   1m           2Mi             
    filebeat-ds-dv49l   1m           5Mi             
    myapp-0             0m           1Mi             
    myapp-1             0m           2Mi             
    myapp-2             0m           1Mi             
    myapp-3             0m           1Mi             
    myapp-4             0m           2Mi    
    

      

    [root@master metrics-server]# kubectl top nodes
    NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
    master   206m         5%     1377Mi          72%       
    node1    88m          8%     534Mi           28%       
    node2    78m          7%     935Mi           49% 
    

      

    Custom metrics (prometheus)

        As shown above, our metrics pipeline is now working. However, metrics-server only monitors CPU and memory; other metrics, such as user-defined ones, are out of its reach. That is where another component, prometheus, comes in.

        Deploying prometheus is fairly involved.

        node_exporter is the agent;

        PromQL is the query language, roughly the SQL of prometheus, used to query the data;

        k8s-prometheus-adapter: k8s cannot consume prometheus metrics directly, so k8s-prometheus-adapter is needed to convert them into an API;

        kube-state-metrics aggregates the cluster state data.

        Now let's start deploying.

        Visit https://github.com/ikubernetes/k8s-prom

    [root@master pro]# git clone https://github.com/iKubernetes/k8s-prom.git
    

      

    First create a namespace called prom:

    [root@master k8s-prom]# kubectl apply -f namespace.yaml 
    namespace/prom created
    

      

     Deploy node_exporter:

    [root@master k8s-prom]# cd node_exporter/
    [root@master node_exporter]# ls
    node-exporter-ds.yaml  node-exporter-svc.yaml
    [root@master node_exporter]# kubectl apply -f .
    daemonset.apps/prometheus-node-exporter created
    service/prometheus-node-exporter created
    

      

    [root@master node_exporter]# kubectl get pods -n prom
    NAME                             READY     STATUS    RESTARTS   AGE
    prometheus-node-exporter-dmmjj   1/1       Running   0          7m
    prometheus-node-exporter-ghz2l   1/1       Running   0          7m
    prometheus-node-exporter-zt2lw   1/1       Running   0          7m
    

      

        Deploy prometheus:

    [root@master k8s-prom]# cd prometheus/
    [root@master prometheus]# ls
    prometheus-cfg.yaml  prometheus-deploy.yaml  prometheus-rbac.yaml  prometheus-svc.yaml
    [root@master prometheus]# kubectl apply -f .
    configmap/prometheus-config created
    deployment.apps/prometheus-server created
    clusterrole.rbac.authorization.k8s.io/prometheus created
    serviceaccount/prometheus created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus created
    service/prometheus created
    

      

    Looking at all the resources in the prom namespace: pod/prometheus-server-76dc8df7b-hw8xc was stuck in Pending, and checking it showed insufficient memory:

     [root@master prometheus]# kubectl logs prometheus-server-556b8896d6-dfqkp -n prom  
    Warning  FailedScheduling  2m52s (x2 over 2m52s)  default-scheduler  0/3 nodes are available: 3 Insufficient memory.
    

      

    Edit prometheus-deploy.yaml and delete these three memory-limit lines:

            resources:
              limits:
                memory: 2Gi
    

      

    Apply it again:

    [root@master prometheus]# kubectl apply -f prometheus-deploy.yaml
    

      

    [root@master prometheus]# kubectl get all -n prom
    NAME                                     READY     STATUS    RESTARTS   AGE
    pod/prometheus-node-exporter-dmmjj       1/1       Running   0          10m
    pod/prometheus-node-exporter-ghz2l       1/1       Running   0          10m
    pod/prometheus-node-exporter-zt2lw       1/1       Running   0          10m
    pod/prometheus-server-65f5d59585-6l8m8   1/1       Running   0          55s
    NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   56s
    service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         10m
    NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/prometheus-node-exporter   3         3         3         3            3           <none>          10m
    NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/prometheus-server   1         1         1            1           56s
    NAME                                           DESIRED   CURRENT   READY     AGE
    replicaset.apps/prometheus-server-65f5d59585   1         1         1         56s
    

      

    As shown above, with the NodePort service the prometheus application inside the container is reachable on port 30090 of the host.
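
    As a quick test of that NodePort, Prometheus' own HTTP API can be queried with a PromQL expression. A hedged example: 172.16.1.100 is the master IP used later in this post, the up metric always exists, and node_memory_MemAvailable_bytes is a standard node_exporter metric in recent versions -- substitute any metric visible in the Prometheus UI:

    curl -s 'http://172.16.1.100:30090/api/v1/query?query=up'
    curl -s 'http://172.16.1.100:30090/api/v1/query?query=node_memory_MemAvailable_bytes'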

        It is best to mount PVC-backed storage, otherwise the monitoring data is lost before long; a sketch follows below.
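
    A minimal sketch of what that could look like, assuming the cluster has a default StorageClass and that prometheus-deploy.yaml keeps its TSDB data in a volume mounted at /prometheus (check the actual volume name in your copy before editing):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus-data
      namespace: prom
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
    # then, in prometheus-deploy.yaml, back the data volume with this claim instead of emptyDir:
    #   volumes:
    #   - name: <the existing volume name>
    #     persistentVolumeClaim:
    #       claimName: prometheus-data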

        Deploy kube-state-metrics, which aggregates the data:

    [root@master k8s-prom]# cd kube-state-metrics/
    [root@master kube-state-metrics]# ls
    kube-state-metrics-deploy.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-svc.yaml
    [root@master kube-state-metrics]# kubectl apply -f .
    deployment.apps/kube-state-metrics created
    serviceaccount/kube-state-metrics created
    clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
    clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    service/kube-state-metrics created
    

      

    [root@master kube-state-metrics]# kubectl get all -n prom
    NAME                                      READY     STATUS    RESTARTS   AGE
    pod/kube-state-metrics-58dffdf67d-v9klh   1/1       Running   0          14m
    NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/kube-state-metrics         ClusterIP   10.111.41.139   <none>        8080/TCP         14m
    

      

    Deploy k8s-prometheus-adapter, which needs a self-made certificate:

    [root@master k8s-prometheus-adapter]# cd /etc/kubernetes/pki/
    [root@master pki]# (umask 077; openssl genrsa -out serving.key 2048)
    Generating RSA private key, 2048 bit long modulus
    ...........................................................................................+++
    ...............+++
    e is 65537 (0x10001)
    

      

        Create the certificate signing request:

    [root@master pki]#  openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
    

      

        Sign the certificate:

    [root@master pki]# openssl  x509 -req -in serving.csr -CA ./ca.crt -CAkey ./ca.key -CAcreateserial -out serving.crt -days 3650
    Signature ok
    subject=/CN=serving
    Getting CA Private Key
    

      

        Create the secret holding the key and certificate:

    [root@master pki]# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key  -n prom
    secret/cm-adapter-serving-certs created
    

      

        Note: cm-adapter-serving-certs is the name referenced inside custom-metrics-apiserver-deployment.yaml.
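
    For orientation, the adapter deployment consumes that secret roughly as below. This is a sketch from memory of the upstream manifest -- verify the exact volume and mount names in your copy of custom-metrics-apiserver-deployment.yaml:

        volumeMounts:
        - mountPath: /var/run/serving-cert
          name: volume-serving-cert
          readOnly: true
      volumes:
      - name: volume-serving-cert
        secret:
          secretName: cm-adapter-serving-certs   # must match the secret created above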

    [root@master pki]# kubectl get secrets -n prom
    NAME                             TYPE                                  DATA      AGE
    cm-adapter-serving-certs         Opaque                                2         51s
    default-token-knsbg              kubernetes.io/service-account-token   3         4h
    kube-state-metrics-token-sccdf   kubernetes.io/service-account-token   3         3h
    prometheus-token-nqzbz           kubernetes.io/service-account-token   3         3h
    

      

      Deploy k8s-prometheus-adapter:

    [root@master k8s-prom]# cd k8s-prometheus-adapter/
    [root@master k8s-prometheus-adapter]# ls
    custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml   custom-metrics-apiserver-service.yaml
    custom-metrics-apiserver-auth-reader-role-binding.yaml              custom-metrics-apiservice.yaml
    custom-metrics-apiserver-deployment.yaml                            custom-metrics-cluster-role.yaml
    custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml  custom-metrics-resource-reader-cluster-role.yaml
    custom-metrics-apiserver-service-account.yaml                       hpa-custom-metrics-cluster-role-binding.yaml
    

      

     The k8s-prometheus-adapter manifests bundled in the repo are not compatible with k8s v1.11.2 (nor with 1.13). The fix is to download the latest custom-metrics-apiserver-deployment.yaml from https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/deploy/manifests and change the namespace inside it to prom; also download custom-metrics-config-map.yaml locally and change its namespace to prom as well.
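
    A hedged helper for that namespace change -- it assumes the upstream files use the custom-metrics namespace, so grep first and adjust the pattern if yours differ:

    grep -n 'namespace:' custom-metrics-apiserver-deployment.yaml custom-metrics-config-map.yaml
    sed -i 's/namespace: custom-metrics/namespace: prom/g' custom-metrics-apiserver-deployment.yaml custom-metrics-config-map.yaml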

    [root@master k8s-prometheus-adapter]# kubectl apply -f .
    clusterrolebinding.rbac.authorization.k8s.io/custom-metrics:system:auth-delegator created
    rolebinding.rbac.authorization.k8s.io/custom-metrics-auth-reader created
    deployment.apps/custom-metrics-apiserver created
    clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-resource-reader created
    serviceaccount/custom-metrics-apiserver created
    service/custom-metrics-apiserver created
    apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
    clusterrole.rbac.authorization.k8s.io/custom-metrics-server-resources created
    clusterrole.rbac.authorization.k8s.io/custom-metrics-resource-reader created
    clusterrolebinding.rbac.authorization.k8s.io/hpa-controller-custom-metrics created
    

      

    [root@master k8s-prometheus-adapter]# kubectl get all -n prom
    NAME                                           READY     STATUS    RESTARTS   AGE
    pod/custom-metrics-apiserver-65f545496-64lsz   1/1       Running   0          6m
    pod/kube-state-metrics-58dffdf67d-v9klh        1/1       Running   0          4h
    pod/prometheus-node-exporter-dmmjj             1/1       Running   0          4h
    pod/prometheus-node-exporter-ghz2l             1/1       Running   0          4h
    pod/prometheus-node-exporter-zt2lw             1/1       Running   0          4h
    pod/prometheus-server-65f5d59585-6l8m8         1/1       Running   0          4h
    NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/custom-metrics-apiserver   ClusterIP   10.103.87.246   <none>        443/TCP          36m
    service/kube-state-metrics         ClusterIP   10.111.41.139   <none>        8080/TCP         4h
    service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   4h
    service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         4h
    NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/prometheus-node-exporter   3         3         3         3            3           <none>          4h
    NAME                                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/custom-metrics-apiserver   1         1         1            1           36m
    deployment.apps/kube-state-metrics         1         1         1            1           4h
    deployment.apps/prometheus-server          1         1         1            1           4h
    NAME                                                  DESIRED   CURRENT   READY     AGE
    replicaset.apps/custom-metrics-apiserver-5f6b4d857d   0         0         0         36m
    replicaset.apps/custom-metrics-apiserver-65f545496    1         1         1         6m
    replicaset.apps/custom-metrics-apiserver-86ccf774d5   0         0         0         17m
    replicaset.apps/kube-state-metrics-58dffdf67d         1         1         1         4h
    replicaset.apps/prometheus-server-65f5d59585          1         1         1         4h
    

      

      Finally, all the resources in the prom namespace are in the Running state.

    [root@master k8s-prometheus-adapter]# kubectl api-versions
    custom.metrics.k8s.io/v1beta1
    

      

      The custom.metrics.k8s.io/v1beta1 API is now visible. In my case it did not show up in the list, but that did not affect anything.

      Open a proxy:

    [root@master k8s-prometheus-adapter]# kubectl proxy --port=8080
    

      

         The metric data is now visible:

    [root@master pki]# curl  http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
     {
          "name": "pods/ceph_rocksdb_submit_transaction_sync",
          "singularName": "",
          "namespaced": true,
          "kind": "MetricValueList",
          "verbs": [
            "get"
          ]
        },
        {
          "name": "jobs.batch/kube_deployment_created",
          "singularName": "",
          "namespaced": true,
          "kind": "MetricValueList",
          "verbs": [
            "get"
          ]
        },
        {
          "name": "jobs.batch/kube_pod_owner",
          "singularName": "",
          "namespaced": true,
          "kind": "MetricValueList",
          "verbs": [
            "get"
          ]
        },
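
    Any entry in that listing can be fetched for concrete values through the same API. A hedged example, where <metric-name> is a placeholder for one of the pods/... names returned above:

    curl 'http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/namespaces/prom/pods/*/<metric-name>'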
    

      

      Now we can happily create HPAs (Horizontal Pod Autoscalers).

        In addition, prometheus can be integrated with grafana, with the following steps.

        First download grafana.yaml; see https://github.com/kubernetes/heapster/blob/master/deploy/kube-config/influxdb/grafana.yaml

    [root@master pro]# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml
    

      

        Modify the contents of grafana.yaml:

    Change namespace: kube-system to prom (there are two occurrences);
     Comment out the following two entries in env:
            - name: INFLUXDB_HOST
              value: monitoring-influxdb
     Add type: NodePort at the end:
     ports:
      - port: 80
        targetPort: 3000
      selector:
        k8s-app: grafana
      type: NodePort
    

      

    [root@master pro]# kubectl apply -f grafana.yaml 
    deployment.extensions/monitoring-grafana created
    service/monitoring-grafana created
    

      

    [root@master pro]# kubectl get pods -n prom
    NAME                                       READY     STATUS    RESTARTS   AGE
    monitoring-grafana-ffb4d59bd-gdbsk         1/1       Running   0          5s
    

      

    If there are still problems, delete those resources and apply them again.

        The grafana pod is now running.

    [root@master pro]# kubectl get svc -n prom
    NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    monitoring-grafana         NodePort    10.106.164.205   <none>        80:32659/TCP     19m
    

      

     We can now visit the master host IP: http://172.16.1.100:32659

     

    In the Grafana data source settings (screenshot omitted), the port is 9090; fill in whatever port your svc actually uses. Apart from changing 80 to 9090, nothing else changes. The URL can take this form because grafana and prometheus live in the same namespace, so prometheus is reachable by its service name.
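
    In other words, the Prometheus data source URL in grafana can simply be the in-cluster service address; either the short name or the full cluster DNS form works from inside the prom namespace:

    http://prometheus:9090
    http://prometheus.prom.svc.cluster.local:9090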

    [root@master pro]# kubectl get svc -n prom     
    NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    custom-metrics-apiserver   ClusterIP   10.109.58.249   <none>        443/TCP          52m
    kube-state-metrics         ClusterIP   10.103.52.45    <none>        8080/TCP         69m
    monitoring-grafana         NodePort    10.110.240.31   <none>        80:31128/TCP     17m
    prometheus                 NodePort    10.110.19.171   <none>        9090:30090/TCP   145m
    prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         146m
    

      

        After that, the corresponding data shows up in the grafana UI.

        Download a grafana dashboard template for monitoring k8s with prometheus from: https://grafana.com/dashboards/6417

        Then import the downloaded template in the grafana UI.

        Once the template is imported, the monitoring data is visible.

     I did not walk through the HPA part again this time, since I had already done it before; the content below is copied over as-is, so work through any issues on your own.

    HPA (Horizontal Pod Autoscaling)

        When pods come under heavy load, the number of pods is scaled out automatically based on that load to spread the pressure.

        Currently HPA has two API versions; v1 only supports core metrics (it can only scale pods based on the CPU utilization metric).

    [root@master pro]# kubectl explain hpa.spec.scaleTargetRef
    scaleTargetRef: specifies the target workload whose replicas the HPA scales
    

      

    [root@master pro]# kubectl api-versions |grep auto
    autoscaling/v1
    autoscaling/v2beta1
    

      

        As shown above, both hpa v1 and hpa v2 are supported.

        Next we recreate a myapp pod with resource limits, using the command line:

    [root@master ~]# kubectl run myapp --image=ikubernetes/myapp:v1 --replicas=1 --requests='cpu=50m,memory=256Mi' --limits='cpu=50m,memory=256Mi' --labels='app=myapp' --expose --port=80
    service/myapp created
    deployment.apps/myapp created
    

      

    [root@master ~]# kubectl get pods
    NAME                     READY     STATUS    RESTARTS   AGE
    myapp-6985749785-fcvwn   1/1       Running   0          58s
    

      

        Next, let's make the myapp pods scale horizontally and automatically with kubectl autoscale, which in effect creates an HPA controller for the deployment.

    [root@master ~]# kubectl autoscale deployment myapp --min=1 --max=8 --cpu-percent=60
    horizontalpodautoscaler.autoscaling/myapp autoscaled
    

      

     --min: the minimum number of pods

        --max: the maximum number of pods to scale out to

        --cpu-percent: the target CPU utilization percentage. The equivalent HPA manifest is sketched below.
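
    For reference, the kubectl autoscale command above is equivalent to creating an autoscaling/v1 HorizontalPodAutoscaler roughly like this (a sketch; field names follow the autoscaling/v1 API):

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 1
      maxReplicas: 8
      targetCPUUtilizationPercentage: 60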

    [root@master ~]# kubectl get hpa
    NAME      REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    myapp     Deployment/myapp   0%/60%    1         8         1          4m
    

      

    [root@master ~]# kubectl get svc
    NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
    myapp        ClusterIP   10.105.235.197   <none>        80/TCP              19
    

      

        Now change the service to the NodePort type:

    [root@master ~]# kubectl patch svc myapp -p '{"spec":{"type": "NodePort"}}'
    service/myapp patched
    

      

    [root@master ~]# kubectl get svc
    NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
    myapp        NodePort    10.105.235.197   <none>        80:31990/TCP        22m
    

      

    [root@master ~]# yum install httpd-tools # mainly to get the ab load-testing tool
    

      

    [root@master ~]# kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP            NODE
    myapp-6985749785-fcvwn   1/1       Running   0          25m       10.244.2.84   node2
    

      

        Start the load test with ab:

    [root@master ~]# ab -c 1000 -n 5000000 http://172.16.1.100:31990/index.html
    This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/
    Benchmarking 172.16.1.100 (be patient)
    

      

        After waiting a while, the pods' CPU utilization hits 98% and the HPA decides to scale out to 2 pods:

    [root@master ~]# kubectl describe hpa
    resource cpu on pods  (as a percentage of request):  98% (49m) / 60%
    Deployment pods:                                       1 current / 2 desired
    

      

    [root@master ~]# kubectl top pods
    NAME                     CPU(cores)   MEMORY(bytes)   
    myapp-6985749785-fcvwn   49m (the CPU limit we set is 50m)         3Mi
    

      

    [root@master ~]#  kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
    myapp-6985749785-fcvwn   1/1       Running   0          32m       10.244.2.84    node2
    myapp-6985749785-sr4qv   1/1       Running   0          2m        10.244.1.105   node1
    

      

        We can see it has automatically scaled out to 2 pods. Waiting longer, as CPU pressure keeps rising, it scales out to 4 or even more pods:

    [root@master ~]#  kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
    myapp-6985749785-2mjrd   1/1       Running   0          1m        10.244.1.107   node1
    myapp-6985749785-bgz6p   1/1       Running   0          1m        10.244.1.108   node1
    myapp-6985749785-fcvwn   1/1       Running   0          35m       10.244.2.84    node2
    myapp-6985749785-sr4qv   1/1       Running   0          5m        10.244.1.105   node1
    

      

        Once the load test stops, the pod count shrinks back to normal.

        Above we used hpa v1 for horizontal pod autoscaling; as mentioned earlier, hpa v1 can only scale pods horizontally based on CPU utilization.

        Next let's look at hpa v2, which can scale pods horizontally based on custom metrics.

        Before using hpa v2, delete the hpa v1 object created earlier so it does not conflict with our hpa v2 test:

    [root@master hpa]# kubectl delete hpa myapp
    horizontalpodautoscaler.autoscaling "myapp" deleted
    

      

    Now let's create an hpa v2:

    [root@master hpa]# cat hpa-v2-demo.yaml 
    apiVersion: autoscaling/v2beta1   #this apiVersion marks it as hpa v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp-hpa-v2
    spec:
      scaleTargetRef: #the workload whose replicas are scaled
        apiVersion: apps/v1 #API group/version of the scaling target
        kind: Deployment
        name: myapp
      minReplicas: 1 #minimum number of replicas
      maxReplicas: 10
      metrics: #which metrics to evaluate
      - type: Resource #evaluate based on a resource metric
        resource: 
          name: cpu
          targetAverageUtilization: 55 #scale out when average pod CPU utilization exceeds 55%
      - type: Resource
        resource:
          name: memory #hpa v1 can only evaluate CPU; with hpa v2 memory can be evaluated as well
          targetAverageValue: 50Mi #scale out when average pod memory usage exceeds 50Mi
    

      

    [root@master hpa]# kubectl apply -f hpa-v2-demo.yaml 
    horizontalpodautoscaler.autoscaling/myapp-hpa-v2 created
    

      

    [root@master hpa]# kubectl get hpa
    NAME           REFERENCE          TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
    myapp-hpa-v2   Deployment/myapp   3723264/50Mi, 0%/55%   1         10        1          37s
    

      

        We can see there is only one pod right now:

    [root@master hpa]# kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP            NODE
    myapp-6985749785-fcvwn   1/1       Running   0          57m       10.244.2.84   node2
    

      

        Start the load test:

    [root@master ~]# ab -c 100 -n 5000000 http://172.16.1.100:31990/index.html
    

      

        Check what hpa v2 observes:

    [root@master hpa]# kubectl describe hpa
    Metrics:                                               ( current / target )
      resource memory on pods:                             3756032 / 50Mi
      resource cpu on pods  (as a percentage of request):  82% (41m) / 55%
    Min replicas:                                          1
    Max replicas:                                          10
    Deployment pods:                                       1 current / 2 desired
    

      

    [root@master hpa]# kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
    myapp-6985749785-8frq4   1/1       Running   0          1m        10.244.1.109   node1
    myapp-6985749785-fcvwn   1/1       Running   0          1h        10.244.2.84    node2
    

      

      It has automatically scaled out to 2 pods. Once the load test stops, the pod count shrinks back to normal.

        Going forward, hpa v2 lets us scale the pod count not only on CPU and memory usage but also on things like HTTP request concurrency.

        For example:

    [root@master hpa]# cat hpa-v2-custom.yaml 
    apiVersion: autoscaling/v2beta1  #this apiVersion marks it as hpa v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp-hpa-v2
    spec:
      scaleTargetRef: #the workload whose replicas are scaled
        apiVersion: apps/v1 #API group/version of the scaling target
        kind: Deployment
        name: myapp
      minReplicas: 1 #minimum number of replicas
      maxReplicas: 10
      metrics: #which metrics to evaluate
      - type: Pods #evaluate based on a per-pod custom metric
        pods: 
          metricName: http_requests #the custom metric name
          targetAverageValue: 800m #target average value of http_requests per pod, as a Kubernetes quantity (800m = 0.8)
    

      

    For an HPA driven by request concurrency, see the image at https://hub.docker.com/r/ikubernetes/metrics-app/ for a concrete example.

  • Original article: https://www.cnblogs.com/dribs/p/10332957.html