• Troubleshooting a node that cannot join a Kubernetes cluster as a control plane


    The following command was used to join kube-master1 to the k8s cluster as a control-plane node:

    kubeadm join k8s-api:6443 \
      --token ****** \
      --discovery-token-ca-cert-hash ****** \
      --control-plane \
      --certificate-key *****
    

    The join hangs at the step where the node is added to the etcd cluster:

    [etcd] Announced new etcd member joining to the existing etcd cluster
    [etcd] Creating static Pod manifest for "etcd"
    [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
    [kubelet-check] Initial timeout of 40s passed.
    

    The etcd error log turns up under /var/log/containers:

    {
      "level": "warn",
      "ts": "2022-05-20T23:25:34.108Z",
      "caller": "etcdserver/cluster_util.go:79",
      "msg": "failed to get cluster response",
      "address": "https://10.0.9.171:2380/members",
      "error": "Get \"https://10.0.9.171:2380/members\": x509: certificate is valid for 10.0.1.81, 127.0.0.1, ::1, not 10.0.9.171"
    }
    

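    According to the log, when https://10.0.9.171:2380/members is requested, 10.0.9.171 returns the wrong certificate: it is valid only for 10.0.1.81, 127.0.0.1, and ::1. 10.0.9.171 is the existing control plane in the cluster, with hostname kube-master0. 10.0.1.81 is the former control plane, with hostname k8s-master0.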

    Check the certificate with openssl:

    openssl s_client -showcerts -servername 10.0.9.171 -connect 10.0.9.171:2380
    

    It is indeed a certificate problem: the certificate being served belongs to the old k8s-master0.

    ---
    Certificate chain
     0 s:CN = k8s-master0
       i:CN = etcd-ca
    -----BEGIN CERTIFICATE-----
    ******
    -----END CERTIFICATE-----
    ---
    Server certificate
    subject=CN = k8s-master0
    
    issuer=CN = etcd-ca
    
    ---
    Acceptable client certificate CA names
    CN = etcd-ca
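
    The chain output above shows only the subject, while the etcd error is about the IP addresses the certificate covers. As a minimal sketch, the Subject Alternative Name entries can be pulled out of the served certificate directly. The helper name sans_from_pem is hypothetical and is not part of openssl or kubeadm; it assumes the certificate carries a SAN extension.

    ```shell
    # Print the Subject Alternative Name entries of the first PEM
    # certificate found on stdin. (sans_from_pem is a hypothetical helper.)
    sans_from_pem() {
      openssl x509 -noout -text |
        grep -A1 'Subject Alternative Name' |
        tail -n 1 |
        sed 's/^ *//'
    }

    # Usage against the peer port from the log (needs network access to the node):
    #   openssl s_client -connect 10.0.9.171:2380 </dev/null 2>/dev/null | sans_from_pem
    ```

    If 10.0.9.171 is missing from the printed list, the certificate is the stale one.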
    

    On the kube-master0 server, check the certificates in /etc/kubernetes/pki/etcd:

    openssl x509 -in server.crt -text -noout
    openssl x509 -in peer.crt -text -noout
    

    They are indeed still the certificates that the old k8s-master0 used.
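
    Reading the full -text dump works, but the question here is narrower: does each certificate cover the node's current IP? A possible sketch uses the -checkip option of openssl x509. The helper name check_etcd_certs is hypothetical, and the directory default is simply the path used in this post.

    ```shell
    # Report whether server.crt and peer.crt in an etcd PKI directory are
    # valid for a given IP. (check_etcd_certs is a hypothetical helper.)
    check_etcd_certs() {
      ip="$1"
      dir="${2:-/etc/kubernetes/pki/etcd}"
      for name in server.crt peer.crt; do
        # -checkip prints "IP <ip> does match certificate" or
        # "IP <ip> does NOT match certificate"
        if openssl x509 -in "$dir/$name" -noout -checkip "$ip" 2>/dev/null |
             grep -q 'does match'; then
          echo "$name: OK for $ip"
        else
          echo "$name: NOT valid for $ip"
        fi
      done
    }

    # Usage on kube-master0:
    #   check_etcd_certs 10.0.9.171
    ```

    A missing or unreadable file is also reported as not valid, so the result is worth confirming with the -text output when in doubt.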

    With the cause identified, the fix is straightforward: regenerate the certificates that etcd uses.

    Delete every certificate file in /etc/kubernetes/pki/etcd except ca.crt and ca.key, then regenerate the certificates with the following commands:

    kubeadm init phase certs etcd-server
    kubeadm init phase certs etcd-peer
    kubeadm init phase certs etcd-healthcheck-client
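
    Deleting the old files by hand is easy to get wrong, since ca.crt and ca.key must survive. As a cautious variation on the deletion step (not what the original steps did), the old files could be moved into a backup directory before running the kubeadm commands above. The helper name backup_etcd_leaf_certs and the .bak backup location are hypothetical.

    ```shell
    # Move every regular file except ca.crt and ca.key out of the etcd PKI
    # directory into a backup directory, leaving only the etcd CA in place.
    # (backup_etcd_leaf_certs and the default backup path are hypothetical.)
    backup_etcd_leaf_certs() {
      dir="${1:-/etc/kubernetes/pki/etcd}"
      backup="${2:-$dir.bak}"
      mkdir -p "$backup"
      for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$(basename "$f")" in
          ca.crt|ca.key) ;;          # keep the etcd CA
          *) mv "$f" "$backup"/ ;;   # set everything else aside
        esac
      done
    }

    # Usage on kube-master0, before regenerating the certificates:
    #   backup_etcd_leaf_certs
    ```

    Moving rather than deleting keeps the old certificates available if the regeneration needs to be rolled back.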
    

    On kube-master0, remove kube-master1, which failed to join, from the cluster:

    kubectl delete node kube-master1
    

    On kube-master1, reset the node and join the cluster again:

    kubeadm reset
    kubeadm join k8s-api:6443 ...
    

    The join succeeds, and the problem is finally solved:

    [etcd] Announced new etcd member joining to the existing etcd cluster
    [etcd] Creating static Pod manifest for "etcd"
    [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
    The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
    [mark-control-plane] Marking the node kube-master1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
    [mark-control-plane] Marking the node kube-master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule node-role.kubernetes.io/control-plane:NoSchedule]
    
    This node has joined the cluster and a new control plane instance was created:
    
  • Original article: https://www.cnblogs.com/dudu/p/16294338.html