Kubernetes Jobs: A Deeper Look

Updated on November 21, 2023

In Kubernetes, a workload is a containerized application, task, or service running on the cluster. A workload consists of at least one pod and may contain more. To manage these pods, you can use a workload resource, which configures a Kubernetes controller that manages the pods of your application. The management activities may include:

  • Restarting a failed pod.
  • Maintaining the desired number of instances of an application.
  • Scheduling the creation and deletion of a pod.

The purpose of your application determines the choice of workload resource. Kubernetes offers several workload resources, such as Deployment, StatefulSet, DaemonSet, Job, and CronJob. This article explains the Kubernetes Job workload resource.

Prerequisites

To test out the Kubernetes manifest files of this article, you will need:

  • A Kubernetes cluster. You can use Vultr Kubernetes Engine to deploy a Kubernetes cluster.
  • A kubectl client on your local workstation that is configured to work with your Kubernetes cluster.
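To confirm that kubectl can reach your cluster before applying any of the manifests below, you can run a quick check such as:

kubectl get nodes

If the command lists the nodes of your cluster, kubectl is configured correctly.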

What Is a Job in Kubernetes

A Job is a workload resource used to run applications that perform a task and then exit. Its workload is a process rather than a long-running service: it is expected to run to completion. The workload may have one or more pods, and a Job can run them in parallel if required. A Job ensures that the specified number of pods terminate successfully to complete the task. In contrast, other workload resources like Deployment maintain the desired state of an application that is expected to run indefinitely. A Job can be used to perform tasks such as:

  1. Back up files
  2. Print messages
  3. Scan data
  4. Send emails
  5. Compute data

Writing a Manifest File for a Job

You can use a manifest file to define a Job. A Job manifest needs the following fields:

  • apiVersion
  • kind
  • metadata
  • spec.template

Look at the following example configuration file.

apiVersion: batch/v1
kind: Job
metadata:
  name: print
spec:
  template:
    spec:
      containers:
      - name: print
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Kubernetes Job"]
      restartPolicy: Never

The apiVersion of a Job is batch/v1. The .spec.template field contains an embedded .spec field that defines the pod to run. Any field that is valid in a pod's .spec can be specified here. Save the manifest above as example.yaml, then create the Job with:

kubectl apply -f example.yaml

Expected output:

job.batch/print created

The Job immediately starts creating pods and runs them.
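You can also check the Job's high-level progress with kubectl get. For example, for the print Job created above:

kubectl get job print

The COMPLETIONS column shows 1/1 once the pod has finished successfully.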

Inspecting a Job

To check the status of the Job, use:

kubectl describe job print

Sample output:

Name:             print
Namespace:        default
Selector:         controller-uid=5827b820-970d-4ce3-9728-0e5cd8336a26
Labels:           controller-uid=5827b820-970d-4ce3-9728-0e5cd8336a26
                  job-name=print
Annotations:      batch.kubernetes.io/job-tracking:
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Fri, 16 Dec 2022 23:40:42 +0530
Completed At:     Fri, 16 Dec 2022 23:41:18 +0530
Duration:         36s
Pods Statuses:    0 Active (0 Ready) / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=5827b820-970d-4ce3-9728-0e5cd8336a26
          job-name=print
  Containers:
  print:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
    Args:
      -c
      echo Kubernetes Job
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  117s  job-controller  Created pod: print-2q9zj
  Normal  Completed         81s   job-controller  Job completed

Check the Events field of the output. It shows that the Job controller created the pod and, 36 seconds later, the Job completed. To check the name of the pod that was created by the Job controller, use:

kubectl get pods --selector=job-name=print --output=jsonpath='{.items[*].metadata.name}'

Sample output:

print-2q9zj

Now, to check the output of this pod, use:

kubectl logs print-2q9zj

You should see the following output:

Kubernetes Job

This is the output of the task the Job was created to perform: printing the line above.

Types of Job Tasks

A Job can be used to run different types of tasks. Here is a short description of each type.

Non-parallel Jobs

In non-parallel Jobs, only one pod runs at a time, and a new pod is started only if the previous one fails.

  • The Job is completed when the pod exits successfully.
  • The print Job that was mentioned earlier is an example of this category.

Parallel Jobs with a fixed completion count

Use Parallel Jobs with a fixed completion count to run multiple pods of the same spec until a fixed number of them complete successfully. A sample manifest follows the list below.

  • The number of pods that are required to complete the Job is specified in the .spec.completions field.
  • By default, the pods are executed one by one. To run a specific number of pods in parallel, specify that number in the .spec.parallelism field.
  • You can assign an index value to each pod. To do so, set the .spec.completionMode field to Indexed. Each pod then gets an index from 0 to .spec.completions-1.
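Here is a minimal manifest sketch of such a Job (the name indexed-print and its echo command are placeholders for illustration). It requests 6 completions, runs at most 2 pods at a time, and assigns each pod an index that the container can read from the JOB_COMPLETION_INDEX environment variable:

apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-print
spec:
  completions: 6
  parallelism: 2
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: indexed-print
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Completion index: $JOB_COMPLETION_INDEX"]
      restartPolicy: Never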

Parallel Jobs with a work queue

Use Parallel Jobs with a work queue for multiple related tasks that are performed by different pods. A minimal manifest sketch follows the list below.

  • The pods either coordinate with an external service, such as a message queue, to pick up work items, or coordinate among themselves to decide what each should work on.
  • The pods run in parallel for efficiency.
  • Do not set the .spec.completions field. When it is left unset, the Job is considered complete once at least one pod has terminated successfully and all pods have terminated.
  • The .spec.parallelism field should have a non-negative integer value.
  • Each pod can independently determine whether its peers are done and, therefore, whether the entire Job is done.
  • After one pod terminates successfully, no new pods are created.
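As a rough sketch of the shape of such a Job (the real work-queue logic is out of scope here, so the worker command below is only a placeholder), the manifest omits .spec.completions and sets only .spec.parallelism:

apiVersion: batch/v1
kind: Job
metadata:
  name: work-queue-example
spec:
  parallelism: 3
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Pretend to pull items from a queue and exit when it is empty"]
      restartPolicy: Never

In a real work-queue Job, the worker image would pull items from a queue service and exit successfully once the queue is empty.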

Understanding Parallelism

The .spec.parallelism field specifies the number of pods that should run in parallel at a given time. It should have a non-negative integer value and defaults to 1 if not specified. The actual count of active pods at a given moment can differ from the specified value for several reasons:

  • When .spec.completions is set and the number of remaining completions is less than the parallelism value, the Job does not create extra pods just to maintain the parallelism count.
  • The Job controller may fail to create a pod, for example, because of missing permissions or insufficient resources such as memory.
  • For a Job with a work queue, the successful termination of one pod stops the Job controller from creating new pods, so the number of active pods can be less than the .spec.parallelism value.
  • A pod takes some time to stop, which could make it appear as if the count of active pods is greater than the parallelism count.

If you set .spec.parallelism to a value greater than 1, your application must be able to handle multiple pods running concurrently.
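To observe how many pods are active at any moment, you can watch the pods of a Job by its job-name label. For example, assuming a Job named indexed-print like the sketch above:

kubectl get pods --selector=job-name=indexed-print --watch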

Pod Failure in Job

Pods and containers can fail for many reasons. When that happens, the kubelet decides what to do with the pod based on the value of the .spec.template.spec.restartPolicy field. In case of a container failure, if the value is set to:

  • OnFailure, the pod stays on the node, and the kubelet restarts the container after 10 seconds. If it fails again, the interval increases to 20 seconds. The interval grows exponentially (10s, 20s, 40s, ...) and is capped at 5 minutes. The back-off timer is reset once the container runs for at least 10 minutes or exits successfully.
  • Never, the Pod is marked as failed.

Note: The restartPolicy applies to the pod and not the Job. If the Job fails, it has to be restarted manually.

Backoff Limit

The .spec.backoffLimit field specifies the number of retries before the Job controller marks the Job as failed. The default value is 6. The retries are counted in the following ways (an example manifest to observe this behavior follows the list):

  • Count the number of pods that have .status.phase = "Failed"
  • If the pod has restartPolicy = "OnFailure", count the number of retries of all the containers of a pod that have the .status.phase field set to Pending or Running.
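To see the backoff limit in action, you can apply a sketch like the following (the Job name always-fail and its failing command are placeholders). The container always exits with a non-zero status, so the Job controller keeps creating replacement pods until the backoff limit is reached and then marks the Job as failed:

apiVersion: batch/v1
kind: Job
metadata:
  name: always-fail
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: always-fail
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Simulating a failure && exit 1"]
      restartPolicy: Never

Running kubectl describe job always-fail afterward should show the failed pods and, once the retries are exhausted, a failure condition for the Job.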

Job Deadline

You can set an active deadline for a Job to complete its execution. The deadline is set using the .spec.activeDeadlineSeconds field. Its value is a positive integer that represents the deadline in seconds and is compared against the duration of the Job. Once the Job's duration exceeds the value of the activeDeadlineSeconds field, the Job and all of its running pods are terminated, and the Job is marked as failed.

Note: A Job's activeDeadlineSeconds field has higher precedence than the backoffLimit field. If the duration of the Job exceeds the active deadline value, the Job fails even if there are some retries left before reaching the backoffLimit value.

To try out the activeDeadlineSeconds field of a Job, use the following example configuration file:

apiVersion: batch/v1
kind: Job
metadata:
  name: print-with-deadline
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 15
  template:
    spec:
      containers:
      - name: sleep-print
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "sleep 20 && echo This line will not be printed"]
      restartPolicy: Never

This Job's task is to print a line after sleeping for 20 seconds, while activeDeadlineSeconds is set to 15 seconds. Save the manifest as example.yaml and create the Job with the following command:

kubectl apply -f example.yaml

Expected output:

job.batch/print-with-deadline created

Wait a little more than 15 seconds, then check the Job status:

kubectl describe job print-with-deadline

Sample output:

Name:                     print-with-deadline
Namespace:                default
Selector:                 controller-uid=18fa9bb5-54d0-4788-b2bf-040ea561dfa6
Labels:                   controller-uid=18fa9bb5-54d0-4788-b2bf-040ea561dfa6
                          job-name=print-with-deadline
Annotations:              batch.kubernetes.io/job-tracking:
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Thu, 22 Dec 2022 17:04:30 +0530
Active Deadline Seconds:  15s
Pods Statuses:            0 Active (1 Ready) / 0 Succeeded / 1 Failed
Pod Template:
  Labels:  controller-uid=18fa9bb5-54d0-4788-b2bf-040ea561dfa6
          job-name=print-with-deadline
  Containers:
  sleep-print:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
    Args:
      -c
      sleep 20 && echo This line will not be printed
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason            Age   From            Message
  ----     ------            ----  ----            -------
  Normal   SuccessfulCreate  17s   job-controller  Created pod: print-with-deadline-84hwz
  Normal   SuccessfulDelete  2s    job-controller  Deleted pod: print-with-deadline-84hwz
  Warning  DeadlineExceeded  2s    job-controller  Job was active longer than specified deadline

As you can see, the Pods Statuses field of the output shows that the pod failed. The latest entry in the Events field is a DeadlineExceeded warning. This happened because the pod did not exit successfully within the 15-second deadline.

Note: The .spec.template.spec field that specifies the pod also has an activeDeadlineSeconds field. Make sure that you specify the field at the correct level.

Deleting a Job After Completion

A Job is not deleted after termination. This allows the user to check its status later on. You can delete a Job named print-with-deadline using:

kubectl delete job print-with-deadline

Expected output:

job.batch "print-with-deadline" deleted

All the pods of the Job get deleted too.
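You can confirm this by listing the pods with the Job's label selector. Assuming the print-with-deadline Job from above, the following command should return no resources after the deletion:

kubectl get pods --selector=job-name=print-with-deadline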

Set Up Automatic Deletion of a Job

You can set a time after which a finished Job is deleted automatically. This feature is handled by the TTL-after-finished controller, which limits the lifetime of a Job that has finished execution. The .spec.ttlSecondsAfterFinished field specifies this time to live (TTL) in seconds. Once the TTL expires, the Job is deleted along with all the pods it created.

Example manifest file:

apiVersion: batch/v1
kind: Job
metadata:
  name: auto-delete
spec:
  ttlSecondsAfterFinished: 15
  backoffLimit: 5
  template:
    spec:
      containers:
      - name: print-message
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo This job will be deleted automatically after 15 seconds"]
      restartPolicy: Never

Note: You can update the TTL value after creating the Job, as long as the existing TTL period has not yet expired.
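For example, assuming the auto-delete Job above has been created and its TTL has not yet expired, you could extend the TTL to 60 seconds with a patch such as:

kubectl patch job auto-delete --type=strategic --patch '{"spec":{"ttlSecondsAfterFinished":60}}'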

Suspending a Job

When you create a Job, it starts working right away: pods are created and containers run until the Job either completes successfully or fails. Kubernetes lets you temporarily suspend a running Job and resume it later. You can also create a Job in a suspended state and start it later.

To create a Job in a suspended state, set the .spec.suspend field to true. Look at an example configuration file:

apiVersion: batch/v1
kind: Job
metadata:
  name: suspend-job
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: print-message
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Suspend Job Example"]
      restartPolicy: Never

Create the Job with kubectl apply and check its status with kubectl describe job suspend-job. The Events field of the output shows:

Events:
  Type    Reason     Age   From            Message
  ----    ------     ----  ----            -------
  Normal  Suspended  12s   job-controller  Job suspended

To resume it later, patch the Job using the following command:

kubectl patch job suspend-job --type=strategic --patch '{"spec":{"suspend":false}}'

Check its status again. The Events field of the output now shows:

Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  Suspended         3m10s  job-controller  Job suspended
  Normal  SuccessfulCreate  6s     job-controller  Created pod: suspend-job-89lhp
  Normal  Resumed           6s     job-controller  Job resumed

In this way, you can toggle the value of the suspend field of a Job.
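For example, to suspend a Job that is still running, you can flip the field back to true with a similar patch:

kubectl patch job suspend-job --type=strategic --patch '{"spec":{"suspend":true}}'

Suspending a running Job deletes its active pods; they are recreated when the Job is resumed.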

Using Custom Pod Selector

It is uncommon to specify the .spec.selector field for a Job. When you create a Job, Kubernetes adds this field with a unique value. However, you have the option to specify this field manually.

Note: Make sure you specify a unique value for the .spec.selector field. Otherwise, pods of an unrelated Job may be managed by the current Job and may be deleted when the current Job completes or is deleted.

Using a custom pod selector can be useful in certain scenarios. For example, if you want a new Job to take over the pods of an existing Job, you can reuse the existing Job's pod selector. To do so, first find the current value of the pod selector of the existing Job, named print-job in this example.

kubectl get job print-job --output=jsonpath='{.spec.template.metadata.labels.controller-uid}'

Sample output:

3954bd70-812c-4aab-86f8-c819895cba93

Now, delete the print-job Job, but not its pods:

kubectl delete jobs/print-job --cascade=orphan

The --cascade=orphan flag tells kubectl not to delete the Job's dependent objects, such as its pods. Now, create a new Job that reuses the old pod selector value in its .spec.selector field.

apiVersion: batch/v1
kind: Job
metadata:
  name: custom-job
spec:
  manualSelector: true
  selector:
    matchLabels:
      controller-uid: 3954bd70-812c-4aab-86f8-c819895cba93
  template:
    metadata:
      labels:
        controller-uid: 3954bd70-812c-4aab-86f8-c819895cba93
    spec:
      containers:
      - name: print-message
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "echo Suspend Job Example"]
      restartPolicy: Never

Notice the .spec.manualSelector field that is set to true. This allows you to specify a custom pod selector.
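To check which pods the new Job now matches, you can list the pods with the custom selector, using the controller-uid value from this example:

kubectl get pods --selector=controller-uid=3954bd70-812c-4aab-86f8-c819895cba93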

Conclusion

A Job is a useful Kubernetes workload resource for running tasks that are expected to run to completion, such as backups, batch processing, and one-off scripts. This article discussed the Kubernetes Job in detail, including parallelism, failure handling, deadlines, automatic cleanup, suspension, and custom pod selectors. You can read more in the official Kubernetes documentation.