Kubernetes Jobs: A Deeper Look
In Kubernetes, a workload is a containerized application, task, or service running on the cluster. A workload runs in at least one pod and may span several. To manage these pods, you use a workload resource, which configures a controller that manages your application's pods for you. The management activities may include:
- Restarting a failed pod.
- Maintaining the desired number of instances of an application.
- Scheduling the creation and deletion of a pod.
The purpose of your application determines which workload resource to use. Kubernetes offers several workload resources, such as Deployment, StatefulSet, DaemonSet, Job, and CronJob. This article explains the Kubernetes Job workload resource.
Prerequisites
To test out the Kubernetes manifest files of this article, you will need:
- A Kubernetes cluster. You can use Vultr Kubernetes Engine to deploy a Kubernetes cluster.
- A kubectl client on your local workstation that is configured to work with your Kubernetes cluster.
What Is a Job in Kubernetes
A Job is a workload resource used to run applications that perform a task and then terminate. Its workload is a finite process rather than a long-running service. The workload may consist of one or more pods, and a Job can run them in parallel if required. A Job ensures that the specified number of pods complete successfully to finish the task. In contrast, workload resources like Deployment maintain the desired state of an application that is expected to run indefinitely. A Job can be used to perform tasks such as:
- Backing up files
- Printing messages
- Scanning data
- Sending emails
- Computing data
Writing a Manifest File for a Job
You can use a manifest file to define a Job. A Job manifest needs the following fields:
- apiVersion
- kind
- metadata
- spec.template
Look at the following example configuration file.
apiVersion: batch/v1
kind: Job
metadata:
  name: print
spec:
  template:
    spec:
      containers:
      - name: print
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Kubernetes Job"]
      restartPolicy: Never
The apiVersion of a Job is batch/v1. The .spec.template field is a pod template: its embedded .spec field defines the pod that the Job runs, and any field that is valid in a pod's .spec can be specified here. Save the manifest as example.yaml, then create and run the Job with:
kubectl apply -f example.yaml
Expected output:
job.batch/print created
The Job immediately starts creating pods and runs them.
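You can watch the pod being created and running to completion with a label selector. The job-name=print label is added to the pod automatically by the Job controller:
kubectl get pods --selector=job-name=print --watch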
Inspecting a Job
To check the status of the Job, use:
kubectl describe job print
Sample output:
Name:             print
Namespace:        default
Selector:         controller-uid=5827b820-970d-4ce3-9728-0e5cd8336a26
Labels:           controller-uid=5827b820-970d-4ce3-9728-0e5cd8336a26
                  job-name=print
Annotations:      batch.kubernetes.io/job-tracking:
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Fri, 16 Dec 2022 23:40:42 +0530
Completed At:     Fri, 16 Dec 2022 23:41:18 +0530
Duration:         36s
Pods Statuses:    0 Active (0 Ready) / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=5827b820-970d-4ce3-9728-0e5cd8336a26
           job-name=print
  Containers:
   print:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
    Args:
      -c
      echo Kubernetes Job
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  117s  job-controller  Created pod: print-2q9zj
  Normal  Completed         81s   job-controller  Job completed
Check the Events field of the output. It indicates that the Job's pod was created and that the Job completed 36 seconds later, which matches the Duration field. To find the name of the pod created by the Job controller, use:
kubectl get pods --selector=job-name=print --output=jsonpath='{.items[*].metadata.name}'
Sample output:
print-2q9zj
Now, to check the output of this pod, use:
kubectl logs print-2q9zj
You should see the following output:
Kubernetes Job
Printing this line was the task the Job was created to perform.
Types of Job Tasks
A Job can be used for different types of tasks. Here is a short description of each.
Non-parallel Jobs
In non-parallel Jobs, normally only one pod is started; a new pod is created only if the first one fails.
- The Job is completed when the pod exits successfully.
- The print Job that was mentioned earlier is an example of this category.
Parallel Jobs with a fixed completion count
Use Parallel Jobs with a fixed completion count to create multiple pods of the same spec.
- The number of pods that are required to complete the Job is specified in the .spec.completions field.
- By default, the pods are executed one by one. To run a specific number of pods in parallel, specify that number in the .spec.parallelism field.
- You can assign an index value to the pods by setting the .spec.completionMode field to Indexed. Each pod then gets an index from 0 to .spec.completions - 1, as shown in the sketch after this list.
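Here is a minimal sketch of an Indexed Job; the completion count of 3 and the parallelism of 2 are chosen only for illustration. In Indexed mode, each pod can read its own index from the JOB_COMPLETION_INDEX environment variable.
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-print
spec:
  completions: 3          # three pods must finish successfully
  parallelism: 2          # at most two pods run at the same time
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: print-index
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Pod index: $JOB_COMPLETION_INDEX"]
      restartPolicy: Never
Each of the three pods prints a different index between 0 and 2.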
Parallel Jobs with a work queue
Use Parallel Jobs with a work queue for multiple related tasks that are performed by different pods (a sketch follows this list).
- The pods either coordinate with an external source, such as a queue service, to decide what to work on, or they are designed to determine the work among themselves.
- The pods are run in parallel for efficiency.
- Leave the .spec.completions field unset. When it is unset, the Job is considered complete once at least one pod exits successfully and all other pods have terminated.
- The .spec.parallelism field should be set to a non-negative integer.
- Individual pods can determine if other pods of the Job have exited.
- No new pods are created after the successful termination of one pod.
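A minimal sketch of the work queue pattern, assuming a hypothetical worker image named queue-worker that pulls tasks from an external queue and exits successfully once the queue is empty:
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-consumer
spec:
  # .spec.completions is intentionally left unset for a work queue Job
  parallelism: 3          # three worker pods consume the queue at the same time
  template:
    spec:
      containers:
      - name: worker
        image: queue-worker   # hypothetical image that reads tasks from the queue
      restartPolicy: Never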
Understanding Parallelism
The .spec.parallelism field specifies the number of pods that should run in parallel at a given time. It should have a non-negative integer value and defaults to 1 if not specified. The actual number of active pods at any instant can differ from the specified value for several reasons:
- When .spec.completions is set and the number of remaining completions is less than the parallelism value, the Job does not create extra pods just to maintain the parallelism count.
- The Job controller may fail to create a pod, for example because of missing permissions or resource limitations such as a lack of memory.
- In a Job that uses a work queue, the successful termination of one pod stops the Job controller from creating new pods, so the number of active pods can drop below the .spec.parallelism value.
- A pod takes some time to stop, which could make it appear as if the count of active pods is greater than the parallelism count.
If you set the .spec.parallelism field to a value greater than 1, your application should be able to handle multiple pods working concurrently.
Pod Failure in a Job
Pods and containers can fail for many reasons. When a container fails, the kubelet decides what to do with the pod based on the value of the .spec.template.spec.restartPolicy field. For a Job, only the values OnFailure and Never are allowed. In case of a container failure, if the value is set to:
- OnFailure, the pod stays on the node, and the kubelet restarts the container after 10 seconds. If it fails again, the delay increases to 20 seconds and keeps growing exponentially (10s, 20s, 40s, ...), capped at 5 minutes. The restart back-off timer is reset once the container runs without problems for 10 minutes (see the sketch after the note below).
- Never, the Pod is marked as failed.
Note: The restartPolicy applies to the pod and not the Job. If the Job fails, it has to be restarted manually.
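A minimal sketch that demonstrates the OnFailure behavior; the Job name and the deliberately failing command are chosen only for illustration:
apiVersion: batch/v1
kind: Job
metadata:
  name: restart-on-failure
spec:
  template:
    spec:
      containers:
      - name: flaky
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo failing on purpose && exit 1"]
      restartPolicy: OnFailure   # the kubelet restarts the failed container in the same pod
Because the restart policy is OnFailure, the kubelet restarts the failed container inside the same pod with the back-off delay described above, instead of the Job controller creating a new pod for every failure.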
Backoff Limit
The .spec.backoffLimit field specifies the number of retries the Job controller allows before it marks the Job as failed; the default value is 6. The retries are counted in the following ways (a sketch of a Job that exhausts its backoff limit follows the list):
- Count the number of pods that have .status.phase = "Failed"
- If the pod has restartPolicy = "OnFailure", count the number of retries in all the containers of pods whose .status.phase field is Pending or Running.
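Here is a sketch of a Job that always fails and therefore exhausts its backoff limit; the Job name and the limit of 3 are arbitrary:
apiVersion: batch/v1
kind: Job
metadata:
  name: fail-with-limit
spec:
  backoffLimit: 3         # mark the Job as failed after 3 retries
  template:
    spec:
      containers:
      - name: always-fail
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "exit 1"]
      restartPolicy: Never   # every retry creates a new failed pod
With restartPolicy set to Never, each retry creates a new pod, so you can inspect the failed pods with kubectl get pods --selector=job-name=fail-with-limit.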
Job Deadline
You can set an active deadline within which a Job must complete its execution. The deadline is set using the .spec.activeDeadlineSeconds field, whose value is a positive integer representing the deadline in seconds. This field is compared against the running duration of the Job: once the duration exceeds the activeDeadlineSeconds value, the Job and all of its pods are terminated.
Note: A Job's activeDeadlineSeconds field has higher precedence than the backoffLimit field. If the duration of the Job exceeds the active deadline value, the Job fails even if there are some retries left before reaching the backoffLimit value.
To try out the activeDeadlineSeconds field of a Job, use the following example configuration file:
apiVersion: batch/v1
kind: Job
metadata:
  name: print-with-deadline
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 15
  template:
    spec:
      containers:
      - name: sleep-print
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "sleep 20 && echo This line will not be printed"]
      restartPolicy: Never
This Job's task is to print a line after waiting for 20 seconds, but activeDeadlineSeconds is set to 15 seconds. Save the manifest as example.yaml and create the Job with the following command:
kubectl apply -f example.yaml
Expected output:
job.batch/print-with-deadline created
Wait a little more than 15 seconds, then check the Job status:
kubectl describe job print-with-deadline
Sample output:
Name:                     print-with-deadline
Namespace:                default
Selector:                 controller-uid=18fa9bb5-54d0-4788-b2bf-040ea561dfa6
Labels:                   controller-uid=18fa9bb5-54d0-4788-b2bf-040ea561dfa6
                          job-name=print-with-deadline
Annotations:              batch.kubernetes.io/job-tracking:
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Thu, 22 Dec 2022 17:04:30 +0530
Active Deadline Seconds:  15s
Pods Statuses:            0 Active (1 Ready) / 0 Succeeded / 1 Failed
Pod Template:
  Labels:  controller-uid=18fa9bb5-54d0-4788-b2bf-040ea561dfa6
           job-name=print-with-deadline
  Containers:
   sleep-print:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
    Args:
      -c
      sleep 20 && echo This line will not be printed
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason            Age  From            Message
  ----     ------            ---  ----            -------
  Normal   SuccessfulCreate  17s  job-controller  Created pod: print-with-deadline-84hwz
  Normal   SuccessfulDelete  2s   job-controller  Deleted pod: print-with-deadline-84hwz
  Warning  DeadlineExceeded  2s   job-controller  Job was active longer than specified deadline
As you can see, the Pods Statuses field of the output shows that the pod failed. The latest entry of the Events field is a DeadlineExceeded warning. This happened because the pod did not finish within the 15-second deadline, so the Job controller deleted it and marked the Job as failed.
Note: The .spec.template.spec field that specifies the pod also has an activeDeadlineSeconds field. Make sure that you specify the field at the correct level.
Deleting a Job After Completion
A Job is not deleted automatically after it terminates, which allows you to inspect its status later. To delete the Job named print-with-deadline, use:
kubectl delete job print-with-deadline
Expected output:
job.batch "print-with-deadline" deleted
Deleting the Job also deletes all the pods it created.
Setting Up Automatic Deletion of a Job
You can set a time after which a terminated Job is deleted automatically. This feature is handled by the TTL-after-finished controller, which limits the lifetime of a Job that has finished execution. The .spec.ttlSecondsAfterFinished field specifies this TTL (time to live) in seconds. The Job exists for this time frame after it finishes and is then deleted, along with all the pods that it created.
Example manifest file:
apiVersion: batch/v1
kind: Job
metadata:
  name: auto-delete
spec:
  ttlSecondsAfterFinished: 15
  backoffLimit: 5
  template:
    spec:
      containers:
      - name: print-message
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo This job will be deleted automatically after 15 seconds"]
      restartPolicy: Never
Note: You can update the TTL value after creating the Job, as long as the existing TTL period has not expired.
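For example, assuming the auto-delete Job above has not yet been cleaned up, you could extend its TTL with a patch like the following; the new value of 60 seconds is arbitrary:
kubectl patch job auto-delete --type=strategic --patch '{"spec":{"ttlSecondsAfterFinished":60}}'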
Suspending a Job
When you create a Job, it starts working right away: pods are created and containers run until the Job either completes successfully or fails. Kubernetes provides a way to temporarily suspend a running Job and resume it later. You can also create a Job in a suspended state and start it later.
To create a Job in a suspended state, set the .spec.suspend field to true. Look at an example configuration file:
apiVersion: batch/v1
kind: Job
metadata:
  name: suspend-job
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: print-message
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Suspend Job Example"]
      restartPolicy: Never
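Create the Job and check its status. Assuming the manifest is saved as example.yaml, as in the earlier steps:
kubectl apply -f example.yaml
kubectl describe job suspend-job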
The Events field of the describe output shows:
Events:
  Type    Reason     Age  From            Message
  ----    ------     ---  ----            -------
  Normal  Suspended  12s  job-controller  Job suspended
To resume it later, patch the Job using the following command:
kubectl patch job suspend-job --type=strategic --patch '{"spec":{"suspend":false}}'
Check its status again. The Events field of the output now shows:
Events:
  Type    Reason            Age    From            Message
  ----    ------            ---    ----            -------
  Normal  Suspended         3m10s  job-controller  Job suspended
  Normal  SuccessfulCreate  6s     job-controller  Created pod: suspend-job-89lhp
  Normal  Resumed           6s     job-controller  Job resumed
In this way, you can toggle the value of the suspend field of a Job.
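For example, to suspend the running Job again, set the field back to true with the same kind of patch:
kubectl patch job suspend-job --type=strategic --patch '{"spec":{"suspend":true}}'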
Using a Custom Pod Selector
It is uncommon to specify the .spec.selector field for a Job. When you create a Job, Kubernetes adds this field with a unique value. However, you have the option to specify this field manually.
Note: Make sure you specify a unique value for the .spec.selector field. Otherwise, pods of a different Job may be affected by the current Job and may, for example, be deleted when the current Job completes.
Using a custom pod selector can be useful in certain scenarios. For example, if you wish to manage the pods of an existing Job, say one named print-job, with a new Job template, you can do that using a custom pod selector. To do so, first find the current value of the existing Job's pod selector.
kubectl get job print-job --output=jsonpath='{.spec.template.metadata.labels.controller-uid}'
Sample output:
3954bd70-812c-4aab-86f8-c819895cba93
Now, delete the existing Job without deleting its pods.
kubectl delete jobs/print-job --cascade=orphan
The --cascade=orphan flag specifies that kubectl should not delete the Job's dependent objects, such as its pods. Now, create a new Job that uses the old Job's pod selector as the value of its .spec.selector field.
apiVersion: batch/v1
kind: Job
metadata:
  name: custom-job
spec:
  manualSelector: true
  selector:
    matchLabels:
      controller-uid: 3954bd70-812c-4aab-86f8-c819895cba93
  template:
    metadata:
      labels:
        controller-uid: 3954bd70-812c-4aab-86f8-c819895cba93
    spec:
      containers:
      - name: print-message
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Suspend Job Example"]
      restartPolicy: Never
Notice the .spec.manualSelector field that is set to true. This allows you to specify a custom pod selector.
Conclusion
A Job is a useful Kubernetes workload resource for running tasks that execute to completion, from one-off scripts to parallel batch processing. This article discussed the Kubernetes Job in detail. You can read more about it in the official Kubernetes documentation.