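1 Background
While deploying a k8s node, I changed kubelet's infrastructure (pause) image to one hosted in a private registry. After that, every attempt to create a pod failed with an error saying the pause image could not be pulled.
2 Symptoms
The pod never starts and stays in ContainerCreating: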
[root@node-08 ~]# kubectl get pod -o wide
NAME                                READY   STATUS              RESTARTS   AGE   IP       NODE           NOMINATED NODE   READINESS GATES
nginx-deployment-5bbf8d494b-qf98r   0/1     ContainerCreating   0          94s   <none>   172.20.59.57   <none>           <none>
kubectl describe pod reports the following events:
Normal Scheduled <unknown> default-scheduler Successfully assigned default/zmm-nginx-deployment-66548984d9-ghx59 to 172.20.59.57
Warning MissingClusterDNS 8s (x2 over 27s) kubelet, 172.20.59.57 pod: "zmm-nginx-deployment-66548984d9-ghx59_default(3f71451b-9004-43b9-9519-047041bd8c35)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Warning FailedCreatePodSandBox 2s (x2 over 20s) kubelet, 172.20.59.57 Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "172.20.59.190/kubernetes/pause-amd64:3.1": Error response from daemon: pull access denied for 172.20.59.190/kubernetes/pause-amd64, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
The kubelet log shows the same error:
Oct 28 18:17:48 nccztsjb-node-08 kubelet: E1028 18:17:48.761788 15938 pod_workers.go:191] Error syncing pod 18eca05b-803e-413b-ad1a-de948fe212ce ("zmm-nginx-deployment-78667db46c-n7tbp_default(18eca05b-803e-413b-ad1a-de948fe212ce)"), skipping: failed to "CreatePodSandbox" for "zmm-nginx-deployment-78667db46c-n7tbp_default(18eca05b-803e-413b-ad1a-de948fe212ce)" with CreatePodSandboxError: "CreatePodSandbox for pod "zmm-nginx-deployment-78667db46c-n7tbp_default(18eca05b-803e-413b-ad1a-de948fe212ce)" failed: rpc error: code = Unknown desc = failed pulling image "172.20.59.190/kubernetes/pause-amd64:3.1": Error response from daemon: pull access denied for 172.20.59.190/kubernetes/pause-amd64, repository does not exist or may require 'docker login': denied: requested access to the resource is denied"
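3 Analysis
The log output above shows that the node has no permission to pull the pause image from the private registry.
The kubelet startup parameters are as follows: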
[Unit]
Description=Kubernetes Kubelet Server
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
# --pod-infra-container-image sets the image used for each pod's infrastructure (pause) container
ExecStart=/usr/local/bin/kubelet \
  --kubeconfig=/root/.kube/config \
  --hostname-override=172.20.59.57 \
  --pod-infra-container-image=172.20.59.190:81/k8s/pause-amd64:3.1 \
  --logtostderr=false \
  --log-dir=/var/log/kubernetes \
  --v=4
Restart=on-failure

[Install]
WantedBy=multi-user.target
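
To confirm which pause image a node's kubelet is configured to pull, the flag can be read straight out of the unit file. The following is a minimal sketch; the function name and the default unit file path are assumptions, so pass the real path on your system.

```shell
#!/usr/bin/env bash
# Print the value of --pod-infra-container-image found in a kubelet unit file.
# The default path below is an assumption; pass the real path as $1.
pause_image_from_unit() {
  local unit_file="${1:-/etc/systemd/system/kubelet.service}"
  # -o prints only the matching text; the flag value ends at the first whitespace.
  grep -o -- '--pod-infra-container-image=[^[:space:]]*' "$unit_file" \
    | head -n 1 \
    | cut -d= -f2-
}
```

For example, `pause_image_from_unit /path/to/kubelet.service` prints only the image reference, which can then be compared with the image named in the FailedCreatePodSandBox event.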
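4 Attempted solutions
4.1 Run docker login against the registry on the node and fetch the pause image with docker pull. This resolves the problem, but every node has to cache the pause image locally.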
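
The pre-pull in 4.1 could be scripted roughly as follows and run on each node. This is only a sketch: the registry address and image are taken from the kubelet flag shown earlier, while the `DRY_RUN` switch and the function names are hypothetical helpers added for illustration. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Pre-pull the pause image on a node so that kubelet finds it in the local cache.
# REGISTRY and PAUSE_IMAGE follow the kubelet flag shown earlier.
# DRY_RUN is a hypothetical switch: 1 prints the commands, 0 executes them.
REGISTRY="${REGISTRY:-172.20.59.190:81}"
PAUSE_IMAGE="${PAUSE_IMAGE:-${REGISTRY}/k8s/pause-amd64:3.1}"
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

prepull_pause() {
  run docker login "$REGISTRY"     # prompts for the registry credentials
  run docker pull "$PAUSE_IMAGE"   # caches the image on this node
  run docker images "$PAUSE_IMAGE" # confirms the image is now present locally
}
```

Calling `prepull_pause` prints the three docker commands; setting `DRY_RUN=0` before calling it executes them instead.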
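4.2 Create a docker-registry secret and reference it through imagePullSecrets in the pod's YAML file. Creating the pod still produces the same error.
This shows that the containers in a pod and the pause image pulled by kubelet use different credentials.
I went through all of kubelet's configuration parameters and found none related to the image registry.
For how to configure the secret, see: "How pods in a Kubernetes cluster pull images from a private registry (Harbor)"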
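
For reference, the setup tried in 4.2 looks roughly like the sketch below. The secret name, pod name, and application image are hypothetical placeholders, not values from the original test; the `kubectl create secret docker-registry` command appears only as a comment because it needs real credentials.

```shell
#!/usr/bin/env bash
# Render a pod manifest whose application image is pulled with an image pull secret.
# SECRET_NAME and APP_IMAGE are hypothetical placeholders for illustration.
SECRET_NAME="${SECRET_NAME:-harbor-secret}"
APP_IMAGE="${APP_IMAGE:-172.20.59.190:81/k8s/nginx:latest}"

# The secret would be created once per namespace, for example:
#   kubectl create secret docker-registry "$SECRET_NAME" \
#     --docker-server=172.20.59.190:81 \
#     --docker-username=<user> --docker-password=<password>

render_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-pull-secret
spec:
  containers:
  - name: nginx
    image: ${APP_IMAGE}
  # Applies to the images listed under "containers";
  # in the test described above, the pause image pull still failed.
  imagePullSecrets:
  - name: ${SECRET_NAME}
EOF
}
```

The manifest can be applied with `render_pod | kubectl apply -f -`. As observed in 4.2, this covers the application containers but does not help with the pause image.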
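4.3 Change the Harbor project that contains the pause image to public. This resolves the problem.
Reasoning:
- 1. The default pause image configured in kubelet is gcr.io/google_containers/pause-amd64, which is pulled from a public registry, so no permissions or authentication are needed.
- 2. We configured a private pause image address, but the image sits in a project with private access, so pulling it requires authentication.
- 3. As a test, I changed the Harbor project that holds the pause image to public, deleted the local pause and application images, and recreated the pod. The pause image was pulled successfully, the pod started, and the image showed up in docker images.
- 4. Conclusion: pull the pause image from a public project, and pull application images from the private project using a secret.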
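
The re-test described in step 3 can be sketched as below. The pause image comes from the kubelet flag shown earlier and the pod name from the kubectl listing above; the `DRY_RUN` switch and function names are hypothetical helpers, and by default the commands are only printed. The docker commands are meant to run on the node that hosts the pod.

```shell
#!/usr/bin/env bash
# Re-test after making the Harbor project public: remove the cached pause image,
# recreate the pod, then check that the image was pulled again.
# DRY_RUN is a hypothetical switch: 1 prints the commands, 0 executes them.
PAUSE_IMAGE="${PAUSE_IMAGE:-172.20.59.190:81/k8s/pause-amd64:3.1}"
POD="${POD:-nginx-deployment-5bbf8d494b-qf98r}"
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

retest_public_pull() {
  run docker rmi "$PAUSE_IMAGE"    # drop the locally cached pause image
  run kubectl delete pod "$POD"    # the Deployment creates a replacement pod
  run kubectl get pod -o wide      # the new pod should reach Running
  run docker images "$PAUSE_IMAGE" # the pause image should be listed again
}
```

Application images removed in the same test would be deleted with additional `docker rmi` calls; they are left out here because their names are not given above.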