This post documents the steps for renaming the hostname and node name of a Kubernetes master (control plane) node. It is a follow-up to the earlier post on restoring a new cluster from an image of the master server, and the goal is to change the master's hostname and node name from k8s-master0 to kube-master0.
The server runs Ubuntu 18.04 and the Kubernetes version is 1.20.2.
First attempt
Change the hostname of the master server:
hostnamectl set-hostname kube-master0
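Not part of the original steps, but worth doing at this point: confirm that the new hostname has taken effect and check /etc/hosts for a stale entry, which on Ubuntu often still contains the old name:
hostnamectl status    # should now report "Static hostname: kube-master0"
grep -n k8s-master0 /etc/hosts    # if the old name still appears here, update it manually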
Replace the hostname-related configuration in /etc/kubernetes/manifests:
oldhost=k8s-master0
newhost=kube-master0
cd /etc/kubernetes/manifests
find . -type f | xargs grep $oldhost
find . -type f | xargs sed -i "s/$oldhost/$newhost/"
find . -type f | xargs grep $newhost
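A slightly more cautious variant of the block above (my sketch, not what the original post did) would back up the static pod manifests first and limit the replacement to YAML files:
cp -a /etc/kubernetes/manifests /root/manifests.bak    # keep a copy in case a manifest gets mangled
find /etc/kubernetes/manifests -name '*.yaml' -exec sed -i "s/$oldhost/$newhost/g" {} +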
Replace the hostname in the kubeadm-config ConfigMap:
kubectl edit cm kubeadm-config -n kube-system
:%s/k8s-master0/kube-master0
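If you prefer to avoid the interactive editor, an equivalent non-interactive variant (my sketch) is:
kubectl -n kube-system get cm kubeadm-config -o yaml | sed "s/k8s-master0/kube-master0/g" | kubectl replace -f -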
Restart the relevant services so that the configuration changes take effect:
systemctl daemon-reload && systemctl restart kubelet && systemctl restart docker
Exec into the etcd container to confirm that the member name has been updated:
docker exec -it $(docker ps -f name=etcd_etcd -q) /bin/sh
etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
896d19d1d0a08f49, started, kube-master0, https://10.0.9.171:2380, https://10.0.9.171:2379, false
Check whether the node name has been changed:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master0 NotReady control-plane,master 372d v1.20.2
Unfortunately, it has not: the static pod manifests and kubeadm-config only affect the control-plane components, while the Node object registered in the cluster still carries the old name.
Second attempt
Running kubectl edit node k8s-master0 shows that three places in the node configuration still use k8s-master0:
- metadata -> labels (this one can be edited directly; see the kubectl label sketch after this list):
  kubernetes.io/hostname: kube-master0
- metadata -> name (cannot be changed; saving fails with "error: At least one of apiVersion, kind and name was changed"):
  name: k8s-master0
- status -> addresses (reverts to the original value the next time the node is opened for editing):
  - address: k8s-master0
    type: Hostname
Editing the node object in place did not work.
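For the record, metadata.name of a Node object is immutable in the Kubernetes API, which is why the second item above cannot be saved. Only the label can be fixed in place, for example with kubectl label (a sketch of the non-interactive form; it does not solve the node-name problem):
kubectl label node k8s-master0 kubernetes.io/hostname=kube-master0 --overwrite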
Third attempt
Try modifying the configuration entries containing k8s-master0 directly in the etcd database via etcdctl.
Set the environment variables for etcdctl:
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
export ETCDCTL_ENDPOINTS=10.0.9.171:2379
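Depending on the etcdctl build you may also need export ETCDCTL_API=3. A quick sanity check (my addition) that the client configuration is being picked up:
etcdctl endpoint health
etcdctl member list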
Dump all keys and values:
etcdctl get "" --prefix -w json > etcd-kv.json
Using etcd-kv.json, list every key that contains k8s-master0:
for k in $(cat etcd-kv.json | jq '.kvs[].key' | cut -d '"' -f2); do echo $k | base64 --decode; echo; done | grep k8s-master0 > kv_k8s-master0.txt
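As an aside, etcdctl can produce the same key list without the base64/jq round-trip (an alternative I would reach for, not what was used here):
etcdctl get "" --prefix --keys-only | grep k8s-master0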
The resulting kv_k8s-master0.txt looks like this:
/registry/crd.projectcalico.org/blockaffinities/k8s-master0-192-168-70-128-26
/registry/crd.projectcalico.org/ipamhandles/ipip-tunnel-addr-k8s-master0
/registry/csinodes/k8s-master0
/registry/events/default/k8s-master0.165a969b97e7c4ea
...
/registry/events/kube-system/etcd-k8s-master0.165a984e78509ebd
...
/registry/events/kube-system/kube-apiserver-k8s-master0.165a96905a9bf40c
...
/registry/events/kube-system/kube-controller-manager-k8s-master0.165a7016cd8a6ca9
...
/registry/events/kube-system/kube-scheduler-k8s-master0.165a7016cead2a32
...
/registry/leases/kube-node-lease/k8s-master0
/registry/minions/k8s-master0
/registry/pods/kube-system/etcd-k8s-master0
/registry/pods/kube-system/kube-apiserver-k8s-master0
/registry/pods/kube-system/kube-controller-manager-k8s-master0
/registry/pods/kube-system/kube-scheduler-k8s-master0
Create a /registry/minions/kube-master0 entry based on the existing /registry/minions/k8s-master0 with the following commands:
key=/registry/minions/k8s-master0
etcdctl get $key --print-value-only > kv-temp.txt
sed -i "s/k8s-master0/kube-master0/" kv-temp.txt
cat kv-temp.txt | etcdctl put `echo $key | sed "s/k8s-master0/kube-master0/"`
After adding it, kubectl get nodes fails with:
Error from server: proto: Unknown: illegal tag 0 (wire type 0)
Adding the -w fields option to etcdctl made the error above go away, but the attempt to modify the data through etcdctl still failed; see https://q.cnblogs.com/q/133164/ for details.
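The most likely root cause is that the values under /registry/ are stored as binary protobuf, so round-tripping them through shell command substitution and sed corrupts the bytes, which matches the illegal tag error. The binary nature of the value is easy to see (illustration only):
etcdctl get /registry/minions/k8s-master0 --print-value-only | hexdump -C | head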
Fourth attempt
Export the node configuration of k8s-master0:
kubectl get node k8s-master0 -o yaml > kube-master0.yml
Replace k8s-master0 with kube-master0 in the exported file:
sed -i "s/k8s-master0/kube-master0/" kube-master0.yml
Change the host's hostname to kube-master0:
hostnamectl set-hostname kube-master0
Replace the hostname-related configuration in /etc/kubernetes/manifests:
oldhost=k8s-master0
newhost=kube-master0
cd /etc/kubernetes/manifests
find . -type f | xargs sed -i "s/$oldhost/$newhost/"
Delete /registry/minions/k8s-master0 from etcd with etcdctl:
etcdctl del /registry/minions/k8s-master0
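A quick check (my addition) that the old key is really gone before recreating the node:
etcdctl get /registry/minions/ --prefix --keys-only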
Create the kube-master0 node from the config file exported and modified earlier:
kubectl apply -f kube-master0.yml
After all these steps, kube-master0 appears in the kubectl get nodes list, but in the NotReady state:
NAME STATUS ROLES AGE VERSION
kube-master0 NotReady control-plane,master 97m v1.20.2
One of the errors in syslog:
Jan 20 18:20:27 kube-master0 kubelet[23220]: E0120 18:20:27.460470 23220 controller.go:144] failed to ensure lease exists, will retry in 7s, error: leases.coordination.k8s.io "kube-master0" is forbidden: User "system:node:k8s-master0" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease": can only access node lease with the same name as the requesting node
The User "system:node:k8s-master0" in the log shows that the node's user name has not been updated yet. Check /etc/kubernetes/kubelet.conf:
users:
- name: default-auth
user:
client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
The user identity comes from the certificate file kubelet-client-current.pem in /var/lib/kubelet/pki/; use openssl to inspect the common name (CN) bound to the certificate:
$ openssl x509 -noout -subject -in kubelet-client-current.pem
subject=O = system:nodes, CN = system:node:k8s-master0
So the certificate still carries the pre-rename identity, and the kubelet credentials need to be regenerated for the new hostname.
After quite a bit of fiddling, the following kubeadm command did the trick:
kubeadm init phase kubeconfig kubelet
After regenerating the credentials with the command above, the users section of /etc/kubernetes/kubelet.conf becomes:
users:
- name: system:node:kube-master0
user:
client-certificate-data:
***...
client-key-data:
***...
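To double-check, the embedded client certificate can be decoded to confirm that the CN now matches the new node name (my verification sketch; the expected subject follows from the config above):
grep client-certificate-data /etc/kubernetes/kubelet.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -subject
# expected: subject=O = system:nodes, CN = system:node:kube-master0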
Restart kubelet:
systemctl restart kubelet
Finally, mission accomplished!
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-master0 Ready control-plane,master 18h v1.20.2
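One loose end worth checking (my note, beyond the original post): objects created under the old name, such as the kube-node-lease lease and the old events that showed up in the etcd key list earlier, may still linger. They can be inspected with:
kubectl -n kube-node-lease get leases
kubectl get events -A | grep k8s-master0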