Allow user to specify multiple pull policies for Kubernetes executor
What does this MR do?
The goal of this MR is to retry the creation of a pod any time that pod creation fails due to an image pull error. It will retry with a new pod spec that includes a fallback pull policy for the specific image that failed. It will continue that procedure until the pod setup is successful, or until no more pull policies are available to be tried.
This MR:
- adds support for reading an array of pull policies from the config file (identical to what was done for the Docker executor);
- implements a new `pull` package that includes:
  - `ImagePullError`, which allows the Kubernetes executor to detect an image pull failure, including the image name at the root of the failure;
  - `Manager`, which implements a small state machine to keep track of attempted pull policies per image name and uses that to retry pod setup with the appropriate pull policy per image. It also adds logging to the build log whenever a new pull policy is attempted;
- adds an `Unwrap()` method to `common.BuildError` so that the Kubernetes executor can leverage `errors.As` to fetch `ImagePullError` from `common.BuildError`.
Why was this MR needed?
Losing the network connection to a container registry that serves the images required for CI job execution can cost hours of development time. In some instances, these outages can also hurt revenue if the business depends on software updates to production environments that can no longer complete because the CI jobs cannot pull the required images.
Today, the image pull policy logic in technologies such as Kubernetes and gitlab-runner does not include any fallback mechanism for network connection failures to the target container registry.
Having the ability to use locally cached container images in the CI jobs can mitigate the impact caused by lost connectivity to the target container registry.
What's the best way to test this MR?
Scenario:
Job link: https://gitlab.com/pedropombeiro/playground/-/jobs/1086674689
NOTE: This example leverages a remote Docker registry at `docker-registry.pombei.ro`; replace it with your own.
- remote registry does not contain the desired image;
- local cache contains the desired image;
- `pull_policy = ["always", "if-not-present"]`

The build should fail on the first attempt (`always` pull policy), then succeed on the second attempt with the `if-not-present` pull policy.
config.toml

```toml
[[runners]]
  name = "Kubernetes executor"
  url = "https://gitlab.com/"
  token = "..."
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "${IMAGE_NAME}:${IMAGE_VERSION}"
    pull_policy = ["always", "if-not-present"]
    namespace = ""
    namespace_overwrite_allowed = ""
    privileged = false
    helper_image = "gitlab/gitlab-runner-helper:x86_64-latest"
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    [runners.kubernetes.affinity]
    [runners.kubernetes.pod_security_context]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
```
.gitlab-ci.yml

```yaml
variables:
  IMAGE_NAME: docker-registry.pombei.ro/alpine
  IMAGE_VERSION: latest

start_evaluation:
  script:
    - echo Done
  tags: [kubernetes]
```
What are the relevant issue numbers?
Closes #27298 (closed)