Allow user to specify multiple pull policies for Kubernetes executor
What does this MR do?
The goal of this MR is to retry the creation of a pod any time that pod creation fails due to an image pull error. It will retry with a new pod spec that includes a fallback pull policy for the specific image that failed. It will continue that procedure until the pod setup is successful, or until no more pull policies are available to be tried.
This MR:
- adds support for reading an array of pull policies from the config file (identical to what was done for the Docker executor);
- implements a new `pull` package that includes:
  - `ImagePullError`, which allows the Kubernetes executor to detect an image pull failure, including the image name at the root of the failure;
  - `Manager`, which implements a small state machine to keep track of attempted pull policies per image name and uses that to retry pod setup with the appropriate pull policy per image. It also adds logging to the build log whenever a new pull policy is attempted;
- adds an `Unwrap()` method to `common.BuildError` so that the Kubernetes executor can leverage `errors.As` to fetch `ImagePullError` from `common.BuildError`.
Why was this MR needed?
Losing the network connection to a container registry that serves the images required for CI job execution can cost hours of development time. In some instances, these outages can also hurt revenue if the business depends on software updates to production environments that can no longer complete because the CI jobs cannot pull the required images.
Today, the image pull policy logic in technologies such as Kubernetes and gitlab-runner does not include any fallback mechanism for network connection failures to the target container registry.
Having the ability to use locally cached container images in the CI jobs can mitigate the impact caused by lost connectivity to the target container registry.
What's the best way to test this MR?
Scenario:
Job link: https://gitlab.com/pedropombeiro/playground/-/jobs/1086674689
NOTE: This example leverages a remote Docker registry at `docker-registry.pombei.ro`; replace it with your own.
- remote registry does not contain the desired image;
- local cache contains the desired image;
- `pull_policy = ["always", "if-not-present"]`

The build should fail on the first attempt (`always` pull policy), then succeed on the second attempt with the `if-not-present` pull policy.
config.toml

```toml
[[runners]]
  name = "Kubernetes executor"
  url = "https://gitlab.com/"
  token = "..."
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "${IMAGE_NAME}:${IMAGE_VERSION}"
    pull_policy = ["always", "if-not-present"]
    namespace = ""
    namespace_overwrite_allowed = ""
    privileged = false
    helper_image = "gitlab/gitlab-runner-helper:x86_64-latest"
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    [runners.kubernetes.affinity]
    [runners.kubernetes.pod_security_context]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
```
.gitlab-ci.yml

```yaml
variables:
  IMAGE_NAME: docker-registry.pombei.ro/alpine
  IMAGE_VERSION: latest

start_evaluation:
  script:
    - echo Done
  tags: [kubernetes]
```
What are the relevant issue numbers?
Closes #27298 (closed)