Related software
1. kubeadm
Installation steps
apt-get update
1. Disable all swap partitions
swapoff -a
Also comment out the swap entries in /etc/fstab; otherwise swap is re-enabled at the next boot.
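Below is a minimal sketch of the /etc/fstab edit. It comments out active swap lines so swap stays off after a reboot, which swapoff -a alone does not guarantee. It assumes GNU sed and a standard whitespace-separated fstab. The function name comment_out_swap is hypothetical.

```shell
# Hypothetical helper: comment out every active swap entry in an fstab-style file.
# Assumes GNU sed; the original file is kept next to it with a .bak suffix.
comment_out_swap() {
    fstab="${1:-/etc/fstab}"
    # Skip lines that are already comments; on the others, prefix "#" when a
    # whitespace-delimited "swap" field is present. Running it twice is harmless.
    sed -i.bak -E '/^[[:space:]]*#/!s/^(.*[[:space:]]swap[[:space:]])/#\1/' "$fstab"
}
```

Typical use, as root: swapoff -a && comment_out_swap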
Check with free that swap is disabled (the Swap row should be all zeros):
root@gpu-10-0-1-24:~# free
              total        used        free      shared  buff/cache   available
Mem:      528016312     6131652   343432968     6595072   178451692   512492696
Swap:             0           0           0
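The same check can be scripted, for example in a node preparation script. This sketch reads free output on stdin and succeeds only when the Swap row reports a total of 0. The function name swap_is_off is hypothetical.

```shell
# Hypothetical helper: read `free` output on stdin; exit status 0 only if a
# "Swap:" row exists and its total column is 0.
swap_is_off() {
    awk '/^Swap:/ { found = 1; total = $2 + 0 }
         END { if (found && total == 0) exit 0; exit 1 }'
}
```

Typical use: free | swap_is_off && echo "swap is disabled"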
2. Turn off the firewall
systemctl stop firewalld
systemctl disable firewalld
3. Disable SELinux
setenforce 0
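setenforce 0 switches SELinux to permissive mode only until the next reboot. For Red Hat-based systems, the kubeadm installation guide also edits /etc/selinux/config so the setting persists. Below is a sketch of that edit as a function. The name selinux_permissive_on_boot is hypothetical. Ubuntu, which the apt commands in these notes target, normally does not enable SELinux at all.

```shell
# Hypothetical helper: keep SELinux permissive across reboots by rewriting
# SELINUX=enforcing to SELINUX=permissive in the config file (GNU sed -i).
selinux_permissive_on_boot() {
    conf="${1:-/etc/selinux/config}"
    sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' "$conf"
}
```

Typical use, as root: setenforce 0 && selinux_permissive_on_boot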
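Install the flannel network plugin (flannel's default pod network is 10.244.0.0/16, so kubeadm init must be given the matching --pod-network-cidr):
kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=10.0.1.18 --kubernetes-version=v1.11.1 --ignore-preflight-errors=all  # --skip-preflight-checks has been deprecated; use --ignore-preflight-errors instead
Errors: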
[preflight] Activating the kubelet service failure loading ca certificate: couldn't load the private key file /etc/kubernetes/pki/ca.key: open /etc/kubernetes/pki/ca.key: no such file or directory
Fix: copy the custom PKI keys into the corresponding directory, /etc/kubernetes/pki/.
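The error above means kubeadm could not find ca.key in /etc/kubernetes/pki. A small pre-check before rerunning kubeadm init can confirm that both the CA certificate and key were copied there. This is a sketch. The function name check_ca_files is hypothetical, and it only checks that the files exist, not that the key matches the certificate.

```shell
# Hypothetical pre-flight check: report any of ca.crt / ca.key missing from
# the PKI directory; exit status is non-zero if something is missing.
check_ca_files() {
    pki_dir="${1:-/etc/kubernetes/pki}"
    missing=0
    for name in ca.crt ca.key; do
        if [ ! -f "$pki_dir/$name" ]; then
            echo "missing: $pki_dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```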
sudo: unable to resolve host gpu-10-0-1-18
Fix: add a mapping for the hostname to /etc/hosts.
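Below is a sketch of that fix that only appends the mapping when the hostname is not already present, so it is safe to rerun. The function name add_host_mapping is hypothetical. The example assumes the node's own IP is the one encoded in its hostname (10.0.1.18 for gpu-10-0-1-18).

```shell
# Hypothetical helper: append "IP<TAB>HOSTNAME" to a hosts file unless some
# non-comment line already lists HOSTNAME as one of its names (exact match).
add_host_mapping() {
    ip="$1"
    name="$2"
    hosts="${3:-/etc/hosts}"
    if awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                         END { if (found) exit 0; exit 1 }' "$hosts"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts"
}
```

Typical use, as root: add_host_mapping 10.0.1.18 "$(hostname)"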
Check the current SELinux mode (prints Enforcing, Permissive or Disabled):
getenforce
Add worker nodes:
kubeadm join 10.0.0.39:6443 --token 4g0p8w.w5p29ukwvitim2ti --discovery-token-ca-cert-hash sha256:21d0adbfcb409dca97e655641573b2ee51c77a212f194e20a307cb459e5f77c8
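The token and hash are easy to damage when copying the join command across a line wrap. A stray space inside the hash is one example. This sketch extracts both values and checks their formats. Bootstrap tokens are 6 and 16 characters of [a-z0-9] joined by a dot. The CA hash is "sha256:" followed by 64 hex digits. The function name check_join_command is hypothetical, and it assumes the "--flag value" form that kubeadm prints (not "--flag=value").

```shell
# Hypothetical helper: validate the token and CA-cert hash in a
# "kubeadm join" command line; prints the bad value and fails if either is malformed.
check_join_command() {
    # One argument per line, then take the value that follows each flag.
    args=$(printf '%s\n' "$*" | tr -s '[:space:]' '\n')
    token=$(printf '%s\n' "$args" | awk 'prev == "--token" { print; exit } { prev = $0 }')
    hash=$(printf '%s\n' "$args" | awk 'prev == "--discovery-token-ca-cert-hash" { print; exit } { prev = $0 }')
    printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$' || { echo "bad token: $token" >&2; return 1; }
    printf '%s\n' "$hash" | grep -Eq '^sha256:[0-9a-f]{64}$' || { echo "bad hash: $hash" >&2; return 1; }
}
```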
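kubeadm token list  # list existing bootstrap tokens (by default they expire after 24 hours)
kubeadm token create --print-join-command  # create a new token and print the complete kubeadm join command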
Install kubelet, kubeadm and kubectl (on every node):
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
On newly added nodes, kubectl get nodes shows ROLES as <none>.
kubectl get pods -n kube-system | grep flannel
kubectl get pods -n kube-system -o wide | grep gpu-10-0-1-24
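When many kube-system pods are listed, filtering for the ones that are not healthy can be quicker than grepping by name. This sketch reads kubectl get pods output (with or without -o wide) on stdin. It assumes STATUS is the third column and prints every pod whose status is neither Running nor Completed, for example ImagePullBackOff. The function name unhealthy_pods is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# "NAME STATUS" for each pod not in Running or Completed state.
# The header line (first column "NAME") is skipped, so grep-filtered input works too.
unhealthy_pods() {
    awk '$1 != "NAME" && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Typical use: kubectl get pods -n kube-system -o wide | unhealthy_pods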
References
https://tomoyadeng.github.io/blog/2018/10/12/k8s-in-ubuntu18.04/index.html
kubeadm token list is empty:
https://www.serverlab.ca/tutorials/containers/kubernetes/how-to-add-workers-to-kubernetes-clusters/
https://stackoverflow.com/questions/51380934/unable-to-connect-worker-node-to-kubernetes-cluster
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
join reports success, but the node does not appear in kubectl get nodes:
https://github.com/kubernetes/kubernetes/issues/61224
The connection to the server localhost:8080 was refused - did you specify the right host or port?
https://www.jianshu.com/p/6fa06b9bbf6a
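The "localhost:8080 was refused" error usually means kubectl found no kubeconfig and fell back to its default address. At the end of a successful run, kubeadm init prints the commands that copy /etc/kubernetes/admin.conf to $HOME/.kube/config. Running as root, export KUBECONFIG=/etc/kubernetes/admin.conf also works. Below is a sketch of the copy step as a function. The name install_kubeconfig is hypothetical. It overwrites an existing config without prompting, unlike the cp -i in kubeadm's output. For a non-root user, the cp of admin.conf needs sudo, as in kubeadm's printed commands.

```shell
# Hypothetical helper wrapping the steps kubeadm init prints: copy the admin
# kubeconfig into ~/.kube/config and make it owned by the current user.
install_kubeconfig() {
    src="${1:-/etc/kubernetes/admin.conf}"
    dest_dir="${2:-$HOME/.kube}"
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/config"
    chown "$(id -u):$(id -g)" "$dest_dir/config"
}
```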
Attempting to reclaim ephemeral-storage
ImagePullBackOff
kubectl -n kube-system logs kube-flannel-ds-jpp96 -c install-cni
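A node being Ready does not mean the flannel network plugin is working.
flannel itself also runs in a container started from an image, so it can fail with errors such as ImagePullBackOff.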
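Because Ready does not prove flannel is healthy, one extra per-node check is whether flanneld has written its subnet lease file, /run/flannel/subnet.env by default. The lease should also fall inside the 10.244.0.0/16 pod network given to kubeadm init. This sketch reads that file. The function name check_flannel_subnet is hypothetical. It does a simple "10.244." prefix comparison that matches this /16 only, not a general CIDR check.

```shell
# Hypothetical helper: print FLANNEL_SUBNET from flannel's lease file and fail
# if the file is missing or the subnet is outside 10.244.0.0/16.
check_flannel_subnet() {
    env_file="${1:-/run/flannel/subnet.env}"
    [ -r "$env_file" ] || { echo "no flannel lease file: $env_file" >&2; return 1; }
    subnet=$(sed -n 's/^FLANNEL_SUBNET=//p' "$env_file")
    case "$subnet" in
        10.244.*) echo "$subnet" ;;
        *) echo "unexpected FLANNEL_SUBNET: '$subnet'" >&2; return 1 ;;
    esac
}
```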
A k8s cluster can have multiple master nodes.
Add a role label to a node:
kubectl label node k8s-node1 node-role.kubernetes.io/worker=worker
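With several new workers, the nodes that still show ROLES <none> can be picked out of kubectl get nodes and labelled in one pass. This sketch assumes ROLES is the third column of the default output. The function name nodes_without_role is hypothetical. The worker label mirrors the command above.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# the name of every node whose ROLES column is <none>.
# Example (labels each such node as a worker):
#   kubectl get nodes | nodes_without_role | while read -r node; do
#       kubectl label node "$node" node-role.kubernetes.io/worker=worker
#   done
nodes_without_role() {
    awk '$1 != "NAME" && $3 == "<none>" { print $1 }'
}
```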
Running systemctl restart kubelet triggers image pulls over the network.
root@cpu-10-0-3-9:~# ks init xps-kubeflow
INFO Using context "kubernetes-admin@kubernetes" from kubeconfig file "/root/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to "version:v1.8.0" cluster at address "https://10.0.3.9:6443"
INFO Generating ksonnet-lib data at path '/root/xps-kubeflow/lib/ksonnet-lib/v1.8.0'
root@cpu-10-0-3-9:~/xps-k8s# kubectl create -f xps_crd.yaml
customresourcedefinition.apiextensions.k8s.io/xps.tencent.com created
kubectl get crd
Pod sandbox changed, it will be killed and re-created.
docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.11
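An emptyDir volume is shared only within a single pod, so it is enough to make sure each pod runs only one container.
By default, k8s does not schedule pods onto the master node, which carries the node-role.kubernetes.io/master:NoSchedule taint. To let pods run there, remove the taint from all nodes: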
kubectl taint nodes --all node-role.kubernetes.io/master-
List all MXJobs:
kubectl get mxjobs.kubeflow.org
Assign pods to nodes:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity