• Linux Operations and Architecture: K8s Troubleshooting


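    I. Kubernetes Troubleshooting

    1. Application Troubleshooting

    This section works mainly at the Pod level.

    When a Pod is not in the Running state, use describe to view its events and locate the problem. describe can also show events for other resource objects, such as Deployments and Services.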

     kubectl describe TYPE/NAME

    [root@k8s-master ~]# kubectl describe pod web 
    Name:         web
    Namespace:    default
    Priority:     0
    Node:         k8s-node1/192.168.56.62
    Start Time:   Wed, 16 Dec 2020 14:43:55 +0800
    Labels:       <none>
    Annotations:  cni.projectcalico.org/podIP: 10.244.36.81/32
                  cni.projectcalico.org/podIPs: 10.244.36.81/32
    Status:       Pending
    IP:           
    IPs:          <none>
    Containers:
      nginx:
        Container ID:   
        Image:          nginx
        Image ID:       
        Port:           80/TCP
        Host Port:      0/TCP
        State:          Waiting
          Reason:       ContainerCreating
        Ready:          False
        Restart Count:  0
        Environment:    <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-c87dr (ro)
    Conditions:
      Type              Status
      Initialized       True 
      Ready             False 
      ContainersReady   False 
      PodScheduled      True 
    Volumes:
      default-token-c87dr:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-c87dr
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type    Reason     Age        From                Message
      ----    ------     ----       ----                -------
      Normal  Scheduled  <unknown>  default-scheduler   Successfully assigned default/web to k8s-node1
      Normal  Pulling    11s        kubelet, k8s-node1  Pulling image "nginx"
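When it is not yet clear which Pod to describe, listing the Pods that are not Running and the namespace's recent events can narrow things down. The helpers below are a minimal sketch: the function names are invented for illustration, while `--field-selector` and `--sort-by` are standard kubectl flags.

```shell
# List Pods in a namespace (default: "default") whose phase is not Running.
# Note: completed (Succeeded) Pods also match this filter.
not_running_pods() {
    kubectl get pods -n "${1:-default}" --field-selector='status.phase!=Running'
}

# Show the namespace's events ordered by creation time, newest last.
recent_events() {
    kubectl get events -n "${1:-default}" --sort-by=.metadata.creationTimestamp
}

# Example: not_running_pods kube-system
```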

    kubectl logs TYPE/NAME [-c CONTAINER]: the apiserver calls the kubelet's API to retrieve the container logs.

    [root@k8s-master ~]# kubectl logs web 
    /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
    /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
    /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
    10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
    10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
    /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
    /docker-entrypoint.sh: Configuration complete; ready for start up
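Two variations of kubectl logs are often useful when a container keeps restarting or produces a lot of output. This is a sketch with hypothetical function names; `--previous`, `--tail`, and `--since` are standard kubectl logs flags.

```shell
# Logs from the previous instance of the container, e.g. after a crash and restart.
previous_logs() {
    kubectl logs "$1" --previous
}

# Only the last N lines (default 100) written during the last hour.
recent_logs() {
    kubectl logs "$1" --tail="${2:-100}" --since=1h
}

# Example: recent_logs web 50
```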

    kubectl exec POD [-c CONTAINER] -- COMMAND [args...]: when a Pod has more than one container, use -c to specify the container name.
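As an illustration of the exec syntax, the sketch below wraps the two common cases. The function names are hypothetical; the Pod name web and container name nginx in the examples come from the describe output earlier, and the image is assumed to contain sh.

```shell
# Open an interactive shell in one container of a Pod.
pod_shell() {
    kubectl exec -it "$1" -c "$2" -- sh
}

# Run a single command in a container without allocating a TTY.
pod_run() {
    pod="$1"
    container="$2"
    shift 2
    kubectl exec "$pod" -c "$container" -- "$@"
}

# Examples:
#   pod_shell web nginx
#   pod_run web nginx nginx -t
```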

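    ② Possible reasons a Pod is stuck in Pending

    • The image is still being pulled
    • The nodes may not have enough resources
    • No node matches the Pod's node label selector
    • The nodes have taints that the Pod does not tolerate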
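Each of the causes above can be checked with a kubectl query. The helpers below are a sketch under the assumption that kubectl describe prints its usual section headers (Events:, Taints:, Allocated resources:); the function names are invented for illustration.

```shell
# Events for the Pod: image pulls, FailedScheduling messages, and so on.
pod_events() {
    kubectl describe pod "$1" | sed -n '/^Events:/,$p'
}

# The nodeSelector the Pod asks for (empty output means none is set).
pod_node_selector() {
    kubectl get pod "$1" -o jsonpath='{.spec.nodeSelector}'
}

# Labels on every node, to compare against the selector.
node_labels() {
    kubectl get nodes --show-labels
}

# Taints on every node.
node_taints() {
    kubectl describe nodes | grep -E '^(Name|Taints):'
}

# CPU and memory already requested on a node.
node_allocated() {
    kubectl describe node "$1" | sed -n '/^Allocated resources:/,/^Events:/p'
}
```

For example, `pod_events web` followed by `node_taints` is usually enough to tell a scheduling problem from a slow image pull.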

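    2. Control-Plane Node Troubleshooting

    Cluster architecture diagram

    ① kubeadm deployment

    Apart from the kubelet, which runs as a systemd service, the control-plane components (etcd, kube-apiserver, kube-controller-manager, kube-scheduler) are started as static Pods.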

    [root@k8s-master ~]# kubectl get pods -n kube-system 
    NAME                                       READY   STATUS    RESTARTS   AGE
    calico-kube-controllers-59877c7fb4-z2bms   1/1     Running   2          105d
    calico-node-pnjxq                          1/1     Running   1          105d
    calico-node-v48jq                          1/1     Running   1          105d
    coredns-7ff77c879f-dqk8t                   1/1     Running   1          105d
    coredns-7ff77c879f-j8zsp                   1/1     Running   1          105d
    etcd-k8s-master                            1/1     Running   1          105d
    kube-apiserver-k8s-master                  1/1     Running   1          105d
    kube-controller-manager-k8s-master         1/1     Running   6          105d
    kube-proxy-ck88h                           1/1     Running   1          105d
    kube-proxy-hkb9f                           1/1     Running   1          105d
    kube-scheduler-k8s-master                  1/1     Running   6          105d
    metrics-server-8fcfb55ff-wlw5s             1/1     Running   3          104d

    Manifest path for these components: /etc/kubernetes/manifests

    [root@k8s-master ~]# ll /etc/kubernetes/manifests/
    total 16
    -rw------- 1 root root 1887 Sep  1 17:04 etcd.yaml
    -rw------- 1 root root 2738 Sep  1 17:04 kube-apiserver.yaml
    -rw------- 1 root root 2594 Sep  1 17:04 kube-controller-manager.yaml
    -rw------- 1 root root 1149 Sep  1 17:04 kube-scheduler.yaml

    Component services, processes, and certificate paths show how a k8s cluster was deployed:

    [root@k8s-master ~]# systemctl status kube-apiserver.service
    Unit kube-apiserver.service could not be found.    # so this is not a binary deployment
    [root@k8s-master ~]# ps aux|grep apiserver         # kubeadm deployments keep certificates under a fixed path (/etc/kubernetes/pki)
    root       1696  6.1 19.0 635004 386360 ?       Ssl  10:01  30:04 kube-apiserver --advertise-address=192.168.56.61 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    1001       3837  0.0  1.2 138732 26048 ?        Ssl  10:04   0:17 /dashboard --insecure-bind-address=0.0.0.0 --bind-address=0.0.0.0 --auto-generate-certificates --namespace=kubernetes-dashboard --tls-key-file=apiserver.key --tls-cert-file=apiserver.crt
    root      87035  0.0  0.0 112724   980 pts/1    S+   18:09   0:00 grep --color=auto apiserver
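The two checks above can be combined into one small helper. This is a sketch, not a definitive test: the function name is hypothetical, and it only looks at the systemd unit and the manifest file shown in this section, so it reports unknown on worker nodes or on clusters installed some other way.

```shell
# Guess how the control plane on this host was deployed.
# $1: static Pod manifest directory (defaults to the kubeadm path).
detect_deploy_method() {
    manifest_dir="${1:-/etc/kubernetes/manifests}"
    if systemctl cat kube-apiserver.service >/dev/null 2>&1; then
        # A systemd unit exists, so the apiserver runs as a host service.
        echo "binary"
    elif [ -f "$manifest_dir/kube-apiserver.yaml" ]; then
        # No unit, but a static Pod manifest is present.
        echo "kubeadm"
    else
        echo "unknown"
    fi
}

# Example: detect_deploy_method
```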

    The static Pod manifest path is set by staticPodPath in the kubelet configuration file and can be changed there:

    [root@k8s-master ~]# tail /var/lib/kubelet/config.yaml 
    imageMinimumGCAge: 0s
    kind: KubeletConfiguration
    nodeStatusReportFrequency: 0s
    nodeStatusUpdateFrequency: 0s
    rotateCertificates: true
    runtimeRequestTimeout: 0s
    staticPodPath: /etc/kubernetes/manifests
    streamingConnectionIdleTimeout: 0s
    syncFrequency: 0s
    volumeStatsAggPeriod: 0s
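To read the configured directory without scanning the whole file, the value can be extracted directly. A minimal sketch with a hypothetical function name, assuming the key appears on a single line at the top level of the file as in the output above:

```shell
# Print the static Pod manifest directory from a kubelet config file.
# $1: config file (defaults to the kubeadm location).
static_pod_path() {
    awk '$1 == "staticPodPath:" { print $2 }' "${1:-/var/lib/kubelet/config.yaml}"
}

# Example: ls -l "$(static_pod_path)"
```

The kubelet reads this file at startup, so a changed staticPodPath should only take effect after the kubelet is restarted.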

    ② Binary deployment

    All components are managed by systemd.

    [root@k8s-node1 ~]# systemctl status kube-apiserver.service 
    ● kube-apiserver.service - Kubernetes API Server
       Loaded: loaded (/usr/lib/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
       Active: active (running) since Mon 2020-04-20 15:26:41 CST; 7 months 27 days ago
         Docs: https://github.com/kubernetes/kubernetes
     Main PID: 17587 (kube-apiserver)
        Tasks: 36
       Memory: 356.5M
       CGroup: /system.slice/kube-apiserver.service
               └─17587 /app/kubernetes/bin/kube-apiserver --logtostderr=false --v=2 --log-dir=/app/kubernetes/logs --etcd-...
    
    Dec 16 16:22:11 k8s-node1 kube-apiserver[17587]: E1216 16:22:11.216916   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 16:38:14 k8s-node1 kube-apiserver[17587]: E1216 16:38:14.231035   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 16:51:27 k8s-node1 kube-apiserver[17587]: E1216 16:51:27.296324   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 17:04:51 k8s-node1 kube-apiserver[17587]: E1216 17:04:51.356825   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 17:20:04 k8s-node1 kube-apiserver[17587]: E1216 17:20:04.464772   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 17:28:03 k8s-node1 kube-apiserver[17587]: E1216 17:28:03.551942   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 17:38:01 k8s-node1 kube-apiserver[17587]: E1216 17:38:01.568538   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 17:52:41 k8s-node1 kube-apiserver[17587]: E1216 17:52:41.593466   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 18:01:48 k8s-node1 kube-apiserver[17587]: E1216 18:01:48.620521   17587 watcher.go:214] watch chan error: ...acted
    Dec 16 18:16:43 k8s-node1 kube-apiserver[17587]: E1216 18:16:43.655648   17587 watcher.go:214] watch chan error: ...acted
    Hint: Some lines were ellipsized, use -l to show in full.

    Service unit file path: /usr/lib/systemd/system

    ③ Control-plane components

    • kube-apiserver
    • kube-controller-manager
    • kube-scheduler
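A quick status sweep over these three components might look like the sketch below. The function names are hypothetical. The first helper assumes a binary deployment with systemd units named after the components; the second assumes a kubeadm deployment, where the static Pods carry the tier=control-plane label.

```shell
# Binary deployment: print the systemd state of each control-plane unit.
control_plane_units() {
    for unit in kube-apiserver kube-controller-manager kube-scheduler; do
        printf '%s: %s\n' "$unit" "$(systemctl is-active "$unit")"
    done
}

# kubeadm deployment: list the control-plane static Pods.
control_plane_pods() {
    kubectl get pods -n kube-system -l tier=control-plane -o wide
}

# Either deployment: ask the apiserver for its own readiness checks.
apiserver_ready() {
    kubectl get --raw='/readyz?verbose'
}
```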

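    3. Worker Node Troubleshooting

    ① Worker node components

    • kubelet       # calls the container runtime interface to manage containers and reports container status to the apiserver.
    • kube-proxy    # implements load balancing and service discovery for Pods, forwarding each incoming request to the backing group of Pods.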

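    ② Possible reasons a node is NotReady

    • The kubelet service failed to start
    • The kubelet cannot reach the apiserver over the network
    • The kubelet's certificate has a problem, for example it has expired
    • The node's disk is full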
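The four causes above map onto four checks that can be run on the affected node. This is a sketch with hypothetical function names. The certificate path is an assumption that holds for kubeadm nodes with certificate rotation enabled, and port 6443 is the secure port shown in the apiserver process listing earlier.

```shell
# 1. Is the kubelet service running?
kubelet_state() {
    systemctl is-active kubelet
}

# 2. Can this node reach the apiserver? Prints the HTTP status code;
#    any code at all means the network path works.
# $1: apiserver address, e.g. 192.168.56.61
apiserver_http_code() {
    curl -k -s -o /dev/null -w '%{http_code}\n' "https://$1:6443/healthz"
}

# 3. When does the kubelet client certificate expire?
kubelet_cert_expiry() {
    openssl x509 -noout -enddate \
        -in "${1:-/var/lib/kubelet/pki/kubelet-client-current.pem}"
}

# 4. How full is the filesystem that holds the kubelet data?
kubelet_disk_usage() {
    df -h "${1:-/var/lib/kubelet}"
}
```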

    If the kubelet service is not running:

    systemctl start kubelet && systemctl enable kubelet

    If the kubelet service fails to start:

    journalctl -u kubelet  # read the log to find the cause
    journalctl -u kubelet.service >kubelet.log  # or write it to a file for inspection
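The kubelet log is usually long, so filtering it for error lines helps. Kubernetes components prefix each message with a severity letter and the date (for example E1216 in the kube-apiserver output earlier), which the sketch below relies on. The function name is hypothetical, and the pattern assumes journalctl's default output format.

```shell
# Keep only klog lines of severity E (error) or F (fatal).
# Reads journalctl output on stdin.
klog_errors() {
    grep -E ': [EF][0-9]{4} '
}

# Example:
#   journalctl -u kubelet --since "1 hour ago" --no-pager | klog_errors
```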

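    4. Service Access Troubleshooting

    ① How a client reaches a Service through a NodePort

    client -> kube-proxy listens on a port, and incoming traffic is handled by iptables/ipvs rules -> a group of Pods (spread across the nodes)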

    [root@k8s-node1 ~]# kubectl get svc -n kube-system 
    NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
    grafana                   NodePort    10.0.0.202   <none>        3000:9006/TCP                258d
    
    [root@k8s-node1 ~]# iptables-save |grep 9006
    -A KUBE-NODEPORTS -p tcp -m comment --comment "kube-system/grafana:" -m tcp --dport 9006 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "kube-system/grafana:" -m tcp --dport 9006 -j KUBE-SVC-3QDDWNGGGXWDZXKH
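In iptables mode the KUBE-SVC-... chain in turn jumps to one KUBE-SEP-... chain per endpoint, and each of those rewrites the destination to a Pod IP and port. The path can be followed one chain at a time with a small filter. This is a sketch with a hypothetical function name; in ipvs mode `ipvsadm -Ln` would be the place to look instead.

```shell
# Print the rules appended to one chain.
# Reads iptables-save output on stdin; $1 is the chain name.
chain_rules() {
    grep -F -- "-A $1 "
}

# Example:
#   iptables-save -t nat | chain_rules KUBE-SVC-3QDDWNGGGXWDZXKH
# then repeat with each KUBE-SEP-... chain named in the output.
```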

    ② Check that the Pods and the Service are running normally

    [root@k8s-master ~]# kubectl get pods -o wide
    NAME                   READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
    web-5dcb957ccc-96nbn   1/1     Running   0          10m   10.244.36.93   k8s-node1   <none>           <none>
    web-5dcb957ccc-j5sz7   1/1     Running   0          10m   10.244.36.66   k8s-node1   <none>           <none>
    [root@k8s-master ~]# kubectl get svc
    NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
    kubernetes    ClusterIP   10.96.0.1      <none>        443/TCP        106d
    web-service   NodePort    10.99.239.53   <none>        80:31100/TCP   10m

    ③ Check that the Service is associated with the Pods

    [root@k8s-master ~]# kubectl get ep
    NAME          ENDPOINTS                         AGE
    kubernetes    192.168.56.61:6443                106d
    web-service   10.244.36.66:80,10.244.36.93:80   9m43s
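If the ENDPOINTS column is empty, the usual cause is a Service selector that does not match the Pod labels. The sketch below prints both sides for comparison. The function names are hypothetical, and the label app=web in the example is an assumption, since the Pod labels are not shown in this section.

```shell
# The label selector the Service uses to pick its Pods.
svc_selector() {
    kubectl get svc "$1" -o jsonpath='{.spec.selector}'
}

# The Pods that actually carry a given label.
pods_with_label() {
    kubectl get pods -l "$1" -o wide
}

# Example:
#   svc_selector web-service
#   pods_with_label app=web
```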

    ④ Check that the Service's targetPort is correct

    [root@k8s-master ~]# kubectl exec  -it web-5dcb957ccc-96nbn -- netstat -lntp
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
    tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1/nginx: master pro 
    tcp6       0      0 :::80                   :::*                    LISTEN      1/nginx: master pro
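The port the container listens on (80 in the netstat output above) has to equal the Service's targetPort. A minimal sketch for printing the Service side of that comparison, with a hypothetical function name:

```shell
# Print each Service port mapping as "port -> targetPort".
svc_ports() {
    kubectl get svc "$1" \
        -o jsonpath='{range .spec.ports[*]}{.port}{" -> "}{.targetPort}{"\n"}{end}'
}

# Example: svc_ports web-service
```

If the targetPort printed here differs from the listening port inside the Pod, connections through the Service are refused even though the endpoints look healthy.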

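    ⑤ Other reasons a Service may be unreachable

    • Does the Service resolve through DNS?
    • Is kube-proxy running normally?
    • Is kube-proxy writing its iptables rules correctly?
    • Is the CNI network plugin working?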
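The questions above can each be turned into a quick check. This is a sketch under several assumptions: the function names and the Pod name dns-test are invented, busybox:1.28 is just a small image that ships nslookup, the k8s-app=kube-proxy label is the one kubeadm applies, and k8s-app=calico-node assumes the Calico plugin seen in the kube-system listing earlier.

```shell
# DNS: resolve a Service name from a throwaway Pod inside the cluster.
dns_check() {
    kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup "$1"
}

# kube-proxy: recent log lines from the kube-proxy Pods.
kube_proxy_logs() {
    kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=50
}

# CNI: state of the Calico node agents.
cni_pods() {
    kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
}

# Example: dns_check web-service
```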
  • Original article: https://www.cnblogs.com/yanxinjiang/p/14144019.html