一.系统环境
服务器版本 | docker软件版本 | CPU架构 |
---|---|---|
CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | x86_64 |
二.前言
当我们安装部署好一套Kubernetes集群,使用一段时间之后可能会有重新安装Kubernetes集群的需求,本文为了满足这个需求,模拟重装Kubernetes集群。
重新安装Kubernetes集群的前提是已经有一套可以正常运行的Kubernetes集群,关于Kubernetes(k8s)集群的安装部署,可以查看博客《Centos7 安装部署Kubernetes(k8s)集群》https://www.cnblogs.com/renshengdezheli/p/16686769.html
三.重装Kubernetes集群
3.1 环境介绍
Kubernetes集群架构:k8scloude1作为master节点,k8scloude2,k8scloude3作为worker节点
服务器 | 操作系统版本 | CPU架构 | 进程 | 功能描述 |
---|---|---|---|---|
k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kube-apiserver,etcd,kube-scheduler,kube-controller-manager,kubelet,kube-proxy,coredns,calico | k8s master节点 |
k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker节点 |
k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker节点 |
3.2 删除k8s所有节点(node)
kubectl drain 安全驱逐节点上面所有的 pod,--ignore-daemonsets往往需要指定的,这是因为deamonset会忽略SchedulingDisabled标签(使用kubectl drain时会自动给节点打上不可调度SchedulingDisabled标签),因此deamonset控制器控制的pod被删除后,可能马上又在此节点上启动起来,这样就会成为死循环.因此这里忽略daemonset.
[root@k8scloude1 ~]# kubectl drain k8scloude3 --ignore-daemonsets
node/k8scloude3 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-wmz4r, kube-system/kube-proxy-84gcx
evicting pod kube-system/calico-kube-controllers-6b9fbfff44-rl2mh
pod/calico-kube-controllers-6b9fbfff44-rl2mh evicted
node/k8scloude3 evicted
k8scloude3变为SchedulingDisabled
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 64m v1.21.0
k8scloude2 Ready <none> 56m v1.21.0
k8scloude3 Ready,SchedulingDisabled <none> 56m v1.21.0
删除节点k8scloude3
[root@k8scloude1 ~]# kubectl delete nodes k8scloude3
node "k8scloude3" deleted
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 65m v1.21.0
k8scloude2 Ready <none> 57m v1.21.0
其余节点进行类似操作
[root@k8scloude1 ~]# kubectl drain k8scloude2 --ignore-daemonsets
node/k8scloude2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-bbst4, kube-system/kube-proxy-8wf8t
evicting pod kube-system/coredns-545d6fc579-kgmfl
evicting pod kube-system/calico-kube-controllers-6b9fbfff44-nq79f
evicting pod kube-system/coredns-545d6fc579-dln6p
pod/coredns-545d6fc579-dln6p evicted
pod/coredns-545d6fc579-kgmfl evicted
pod/calico-kube-controllers-6b9fbfff44-nq79f evicted
node/k8scloude2 evicted
[root@k8scloude1 ~]# kubectl drain k8scloude1 --ignore-daemonsets
node/k8scloude1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-r57vx, kube-system/kube-proxy-zblkg
evicting pod kube-system/coredns-545d6fc579-tgcl4
evicting pod kube-system/calico-kube-controllers-6b9fbfff44-t9k45
evicting pod kube-system/coredns-545d6fc579-l9g7b
pod/calico-kube-controllers-6b9fbfff44-t9k45 evicted
pod/coredns-545d6fc579-tgcl4 evicted
pod/coredns-545d6fc579-l9g7b evicted
node/k8scloude1 evicted
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready,SchedulingDisabled control-plane,master 66m v1.21.0
k8scloude2 Ready,SchedulingDisabled <none> 58m v1.21.0
[root@k8scloude1 ~]# kubectl delete nodes k8scloude2
node "k8scloude2" deleted
[root@k8scloude1 ~]# kubectl delete nodes k8scloude1
node "k8scloude1" deleted
此时,k8s集群所有节点都被删除了
[root@k8scloude1 ~]# kubectl get nodes
No resources found
3.3 kubeadm初始化
此时重新进行kubeadm初始化,但是报错,看报错信息可以发现:端口被占用,配置文件已经存在
[root@k8scloude1 ~]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.0 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.21.0
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
当我们重新初始化k8s集群的时候,需要清空原先的设置
[root@k8scloude1 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0109 16:17:15.936292 53177 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get corresponding node: nodes "k8scloude1" not found
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0109 16:17:17.651795 53177 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
重新进行kubeadm初始化
[root@k8scloude1 ~]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.0 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.21.0
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8scloude1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.110.130]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8scloude1 localhost] and IPs [192.168.110.130 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8scloude1 localhost] and IPs [192.168.110.130 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 64.004984 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8scloude1 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8scloude1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 45wtx2.gfb3j9obk0fz663z
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z \
--discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
按照要求创建目录和配置文件
[root@k8scloude1 ~]# mkdir -p $HOME/.kube
[root@k8scloude1 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp:是否覆盖"/root/.kube/config"? y
[root@k8scloude1 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
3.4 添加worker节点到k8s集群
接下来把另外的两个worker节点也加入到k8s集群。
把k8scloude2节点加入k8s集群
#另外两个节点执行加入集群的命令
[root@k8scloude2 ~]# kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
work节点重新加入k8s集群也需要清空原先的设置
[root@k8scloude2 ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0109 16:22:12.705575 59352 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
再次把k8scloude2节点加入k8s集群,可以看到k8scloude2节点加入k8s集群成功
[root@k8scloude2 ~]# kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
k8scloude3节点也进行类似操作
[root@k8scloude3 ~]# kubeadm reset
[root@k8scloude3 ~]# kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
查看k8s集群节点状态
#此时所有节点都显示Ready状态
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 5m v1.21.0
k8scloude2 Ready <none> 63s v1.21.0
k8scloude3 Ready <none> 33s v1.21.0
3.5 安装calico
因为我们之前已经安装过一次k8s集群了,并且calico插件也安装好了,重装之后calico是没有装的,但是kubectl get nodes的状态都为Ready状态,是因为Ready这个状态已经写入了etcd数据库里了,状态没更新,所以需要重装一次calico
[root@k8scloude1 ~]# kubectl apply -f calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
Warning: policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
poddisruptionbudget.policy/calico-kube-controllers created
现在集群才是完全正常的
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 9m11s v1.21.0
k8scloude2 Ready <none> 5m14s v1.21.0
k8scloude3 Ready <none> 4m44s v1.21.0
注意:如果k8s master节点没有执行kubeadm reset重置命令,只是重置了worker节点,则不需要重新安装calico
[root@k8scloude1 ~]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-6b9fbfff44-4jzkj 1/1 Running 0 3m16s 10.244.251.193 k8scloude3 <none> <none>
calico-node-bdlgm 1/1 Running 0 3m16s 192.168.110.130 k8scloude1 <none> <none>
calico-node-hx8bk 1/1 Running 0 3m16s 192.168.110.128 k8scloude3 <none> <none>
calico-node-nsbfs 1/1 Running 0 3m16s 192.168.110.129 k8scloude2 <none> <none>
coredns-545d6fc579-7wm95 1/1 Running 0 11m 10.244.158.65 k8scloude1 <none> <none>
coredns-545d6fc579-87q8j 1/1 Running 0 11m 10.244.158.66 k8scloude1 <none> <none>
etcd-k8scloude1 1/1 Running 0 12m 192.168.110.130 k8scloude1 <none> <none>
kube-apiserver-k8scloude1 1/1 Running 0 12m 192.168.110.130 k8scloude1 <none> <none>
kube-controller-manager-k8scloude1 1/1 Running 0 12m 192.168.110.130 k8scloude1 <none> <none>
kube-proxy-599xh 1/1 Running 0 7m48s 192.168.110.128 k8scloude3 <none> <none>
kube-proxy-lpj8z 1/1 Running 0 8m18s 192.168.110.129 k8scloude2 <none> <none>
kube-proxy-zxlk9 1/1 Running 0 11m 192.168.110.130 k8scloude1 <none> <none>
kube-scheduler-k8scloude1 1/1 Running 0 12m 192.168.110.130 k8scloude1 <none> <none>
自此,k8s集群重装完成!