Job Controller
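The Job Controller creates Pods according to the Job spec and keeps monitoring them until they finish successfully. If a Pod fails, the restartPolicy (only OnFailure and Never are supported, not Always) determines whether the task is retried by creating a new Pod.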
What Jobs are for
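By how long they run, containers fall into two categories: service containers and work containers.
Service containers provide a service continuously and need to keep running, for example an HTTP server or a daemon. Work containers run a one-off task, such as a batch program, and exit when the task is done.
In Kubernetes, Deployment, ReplicaSet and DaemonSet all manage service containers; for work containers we use Job.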
root@ubuntu:~/tenant# cat job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo","hello k8s job !"]
      restartPolicy: Never
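restartPolicy specifies when a container should be restarted. For a Job it can only be set to Never or OnFailure. With Never, a failed container is not restarted in place; the Job controller creates a new Pod instead. With OnFailure, the failed container is restarted inside the same Pod, so no new Pod is created, which saves resources. Other controllers (such as Deployment) use Always.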
root@ubuntu:~/tenant# kubectl get pods -o wide
NAME                             READY   STATUS              RESTARTS   AGE     IP               NODE      NOMINATED NODE   READINESS GATES
busybox                          1/1     Running             0          46m     10.244.129.145   centos7   <none>           <none>
example-foo-54dc4db9fc-lqz9j     1/1     Running             0          19d     10.244.29.26     bogon     <none>           <none>
job-1-nginx-0                    0/1     Completed           0          22d     10.244.29.19     bogon     <none>           <none>
myjob-pl75c                      0/1     ContainerCreating   0          8s      <none>           centos7   <none>           <none>
nginx-ds-f7sjm                   1/1     Running             0          10m     10.244.29.23     bogon     <none>           <none>
nginx-ds-ldlrq                   1/1     Running             0          10m     10.244.41.1      cloud     <none>           <none>
nginx-ds-p8nqz                   1/1     Running             0          10m     10.244.243.195   ubuntu    <none>           <none>
nginx-ds-xrt8b                   1/1     Running             0          10m     10.244.129.146   centos7   <none>           <none>
test-job-default-nginx-0         1/1     Running             0          15d     10.244.29.3      bogon     <none>           <none>
test-job-default-nginx-1         1/1     Running             0          15d     10.244.29.9      bogon     <none>           <none>
test-job-default-nginx-2         1/1     Running             0          15d     10.244.29.19     bogon     <none>           <none>
test-job-default-nginx-3         1/1     Running             0          15d     10.244.29.63     bogon     <none>           <none>
test-job-default-nginx-4         1/1     Running             0          15d     10.244.29.1      bogon     <none>           <none>
test-job-default-nginx-5         1/1     Running             0          15d     10.244.29.2      bogon     <none>           <none>
test-job-v2-default-nginx-v2-0   1/1     Running             0          14d     10.244.29.20     bogon     <none>           <none>
web-0                            1/1     Running             0          3h22m   10.244.129.142   centos7   <none>           <none>
web-1                            1/1     Running             0          3h16m   10.244.129.143   centos7   <none>           <none>
root@ubuntu:~/tenant# kubectl get job
NAME    COMPLETIONS   DURATION   AGE
myjob   1/1           12s        8m3s
root@ubuntu:~/tenant# kubectl logs myjob-pl75c
hello k8s job !
root@ubuntu:~/tenant#
That is what happens when the Pod succeeds. What if the Pod fails?
Modify job.yaml to deliberately introduce an error:
root@ubuntu:~/tenant# vi job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["invalid cmd","hello k8s job !"]
      restartPolicy: Never
root@ubuntu:~/tenant# kubectl create -f job.yaml
job.batch/myjob created
root@ubuntu:~/tenant# kubectl get pod
NAME                           READY   STATUS              RESTARTS   AGE
busybox                        1/1     Running             0          56m
example-foo-54dc4db9fc-lqz9j   1/1     Running             0          19d
job-1-nginx-0                  0/1     Completed           0          22d
myjob-j6mtv                    0/1     ContainerCreating   0          9s
root@ubuntu:~/tenant# kubectl get job
NAME    COMPLETIONS   DURATION   AGE
myjob   0/1           23s        23s
root@ubuntu:~/tenant# kubectl get pod
NAME                           READY   STATUS               RESTARTS   AGE
busybox                        1/1     Running              0          57m
example-foo-54dc4db9fc-lqz9j   1/1     Running              0          19d
job-1-nginx-0                  0/1     Completed            0          22d
myjob-j6mtv                    0/1     ContainerCannotRun   0          27s
myjob-zrgmk                    0/1     ContainerCannotRun   0          15s
root@ubuntu:~/tenant# kubectl get pod
NAME                           READY   STATUS               RESTARTS   AGE
busybox                        1/1     Running              0          57m
example-foo-54dc4db9fc-lqz9j   1/1     Running              0          19d
job-1-nginx-0                  0/1     Completed            0          22d
myjob-j6mtv                    0/1     ContainerCannotRun   0          32s
myjob-zrgmk                    0/1     ContainerCannotRun   0          20s
root@ubuntu:~/tenant# kubectl get pod
NAME                           READY   STATUS               RESTARTS   AGE
busybox                        1/1     Running              0          57m
example-foo-54dc4db9fc-lqz9j   1/1     Running              0          19d
job-1-nginx-0                  0/1     Completed            0          22d
myjob-j6mtv                    0/1     ContainerCannotRun   0          38s
myjob-mdfpz                    0/1     ContainerCreating    0          4s
myjob-zrgmk                    0/1     ContainerCannotRun   0          26s
root@ubuntu:~/tenant#
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-6kfq8   0/1   ContainerCannotRun   0   65s
myjob-j6mtv   0/1   ContainerCannotRun   0   119s
myjob-mdfpz   0/1   ContainerCannotRun   0   85s
myjob-zrgmk   0/1   ContainerCannotRun   0   107s
root@ubuntu:~/tenant# kubectl describe pods myjob-mdfpz
Name:         myjob-mdfpz
Namespace:    default
Priority:     0
Node:         centos7/10.10.16.251
Start Time:   Thu, 29 Jul 2021 15:29:41 +0800
Labels:       controller-uid=2fab27c7-2c65-425b-a698-1a4ffaa24448
              job-name=myjob
Annotations:  cni.projectcalico.org/podIP:
              cni.projectcalico.org/podIPs:
Status:       Failed
IP:           10.244.129.150
IPs:
  IP:           10.244.129.150
Controlled By:  Job/myjob
Containers:
  hello:
    Container ID:  docker://0b71696f5d71fb7c4ddb7fcb408c2141e890298092ef2701ce695f82d1ff242e
    Image:         busybox
    Image ID:      docker-pullable://docker.io/busybox@sha256:0f354ec1728d9ff32edcd7d1b8bbdfc798277ad36120dc3dc683be44524c8b60
    Port:          <none>
    Host Port:     <none>
    Command:
      invalid cmd
      hello k8s job !
    State:          Terminated
      Reason:       ContainerCannotRun
      Message:      oci runtime error: container_linux.go:235: starting container process caused "exec: "invalid cmd": executable file not found in $PATH"
      Exit Code:    127
      Started:      Thu, 29 Jul 2021 15:29:49 +0800
      Finished:     Thu, 29 Jul 2021 15:29:49 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cfr6q (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-cfr6q:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cfr6q
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age        From               Message
  ----     ------     ----       ----               -------
  Normal   Scheduled  <unknown>  default-scheduler  Successfully assigned default/myjob-mdfpz to centos7
  Normal   Pulling    2m21s      kubelet, centos7   Pulling image "busybox"
  Normal   Pulled     2m17s      kubelet, centos7   Successfully pulled image "busybox"
  Normal   Created    2m16s      kubelet, centos7   Created container hello
  Warning  Failed     2m15s      kubelet, centos7   Error: failed to start container "hello": Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: "invalid cmd": executable file not found in $PATH"
root@ubuntu:~/tenant#
One thing needs explaining here: why does kubectl get pod show so many failed Pods?
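The reason is this: when the first Pod starts, its container fails and exits. Because restartPolicy is Never, the failed container is not restarted. The Job still needs one successful Pod and so far has none (COMPLETIONS is 0/1), so the Job controller starts a new Pod and keeps doing so until one succeeds. In this example no Pod can ever succeed, so the Job controller keeps creating new Pods, with an increasing back-off delay between attempts, until the Job's backoffLimit (6 by default) is reached and the Job is marked as failed. To stop the retries before that, delete the Job.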
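As a hedged aside, current Kubernetes releases bound these retries through two fields of the Job spec: backoffLimit (6 by default) caps the number of retries before the Job is marked as failed, and activeDeadlineSeconds caps how long the Job may stay active. A minimal sketch based on the failing manifest above; the values 3 and 60 are illustrative assumptions, not taken from the original:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  backoffLimit: 3             # assumed value: give up after 3 retries (default is 6)
  activeDeadlineSeconds: 60   # assumed value: fail the Job once it has been active for 60s
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["invalid cmd","hello k8s job !"]   # same deliberately broken command
      restartPolicy: Never
```

With settings like these, kubectl get pod should stop showing new myjob Pods once either limit is hit, and kubectl describe job myjob should report why the Job failed.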
What happens if restartPolicy is set to OnFailure? Let's try it: modify job.yaml and create the Job again.
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["invalid cmd","hello k8s job !"]
      restartPolicy: OnFailure
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-f5kvm   0/1   ContainerCreating   0   9s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-f5kvm   0/1   ContainerCreating   0   11s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-f5kvm   0/1   RunContainerError   0   12s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-f5kvm   0/1   RunContainerError   2   45s
root@ubuntu:~/tenant#
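RESTARTS is 2 and keeps increasing, which shows that OnFailure is in effect: the failed container is restarted automatically inside the same Pod, and no new Pod is created.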
Job parallelism
Sometimes we want several Pods to run at the same time to make the Job finish faster. This is set with parallelism.
root@ubuntu:~/tenant# cat job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  parallelism: 2   # run two Pods at the same time
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo","hello k8s job !"]
      restartPolicy: OnFailure
root@ubuntu:~/tenant# kubectl delete -f job.yaml
job.batch "myjob" deleted
root@ubuntu:~/tenant# kubectl create -f job.yaml
job.batch/myjob created
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-nxw5t   0/1   Completed           0   11s
myjob-qsc9r   0/1   ContainerCreating   0   11s
root@ubuntu:~/tenant# kubectl get jobs.batch
NAME    COMPLETIONS   DURATION   AGE
myjob   2/1 of 2      14s        19s
root@ubuntu:~/tenant# kubectl get jobs
NAME    COMPLETIONS   DURATION   AGE
myjob   2/1 of 2      14s        26s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-nxw5t   0/1   Completed   0   30s
myjob-qsc9r   0/1   Completed   0   30s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-nxw5t   0/1   Completed   0   36s
myjob-qsc9r   0/1   Completed   0   36s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-nxw5t   0/1   Completed   0   43s
myjob-qsc9r   0/1   Completed   0   43s
root@ubuntu:~/tenant# kubectl logs myjob-nxw5t
hello k8s job !
root@ubuntu:~/tenant# kubectl logs myjob-qsc9r
hello k8s job !
root@ubuntu:~/tenant#
We can also use completions to set the total number of Pods that must complete successfully for the Job to be done.
root@ubuntu:~/tenant# cat job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  parallelism: 2   # run two Pods at the same time
  completions: 4
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo","hello k8s job !"]
      restartPolicy: OnFailure
root@ubuntu:~/tenant# kubectl create -f job.yaml
job.batch/myjob created
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-27fss   0/1   Completed           0   19s
myjob-dqfgw   0/1   Completed           0   19s
myjob-m954l   0/1   ContainerCreating   0   4s
myjob-x9bps   0/1   ContainerCreating   0   9s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-27fss   0/1   Completed           0   25s
myjob-dqfgw   0/1   Completed           0   25s
myjob-m954l   0/1   ContainerCreating   0   10s
myjob-x9bps   0/1   Completed           0   15s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-27fss   0/1   Completed   0   28s
myjob-dqfgw   0/1   Completed   0   28s
myjob-m954l   0/1   Completed   0   13s
myjob-x9bps   0/1   Completed   0   18s
root@ubuntu:~/tenant# kubectl logs myjob-m954l
hello k8s job !
root@ubuntu:~/tenant# kubectl logs myjob-dqfgw
hello k8s job !
root@ubuntu:~/tenant#
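With a command that exits almost immediately, the two-at-a-time behaviour is hard to see in the output. A small variation of the manifest above, in which each Pod sleeps for a few seconds, may make it easier to watch. The Job name myjob-waves and the 10-second sleep are hypothetical choices for illustration, not part of the original experiment:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob-waves          # hypothetical name, not from the original
spec:
  parallelism: 2             # at most two Pods run at the same time
  completions: 4             # the Job is done after four Pods have succeeded
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        # sleep is an assumption added so the Pods stay Running long enough to observe
        command: ["sh","-c","echo hello k8s job ! && sleep 10"]
      restartPolicy: OnFailure
```

If every Pod succeeds, one would expect no more than two Pods in Running state at any moment, and the COMPLETIONS column of kubectl get job to climb towards 4/4.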