Allow user to specify multiple pull policies for Kubernetes executor
Problem to solve
A lost network connection to a container registry that serves the container images required for CI job execution can result in lost development hours. In some instances, these outages can also hurt revenue if the business relies on software updates to production environments that can no longer complete because the CI jobs cannot run without access to the container images.
Today, in technologies like Kubernetes and gitlab-runner, the container image pull policy logic does not include any fallback mechanism for network connection failures to the target container registry.
Having the ability to use locally cached container images in the CI jobs can mitigate the impact caused by lost connectivity to the target container registry.
Proposal
Instead of creating a new pull policy, we allow users to define multiple pull policies. For example, the user can define pull_policy = ["always", "if-not-present"] inside of their config.toml. The runner will first use the always pull policy; if that fails, it will use the next one in line, which is if-not-present. This achieves the behavior of an always-or-fallback pull policy without introducing it as a new policy. A small PoC of this was achieved in !2587 (closed).
So, for example, imagine I have the following config.toml:
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "steve-mbp-gitlab.local"
url = "https://gitlab.com/"
token = "xxxxxx"
executor = "kubernetes"
[runners.kubernetes]
image = "localonly/alpine:3.12"
pull_policy = ["always", "if-not-present"] # Multiple pull policies specified, we'll go one by one if it fails. In this case, first it will try and pull the image, then use the local image if it's present
We can see it working like below
Specification
- Allow pull_policy for the kubernetes executor to be either a string, pull_policy = "always", or a slice of strings, pull_policy = ["always", "if-not-present"].
- Start with the first pull policy (left to right). If any error is presented, even a 403 (because it might be a production issue), fall back to the next pull policy. For example, if we have pull_policy = ["always", "if-not-present"], we will use always, and if it errors we will use if-not-present. We need to check the error that explains why pod creation failed and see whether it was caused by pulling images.
- Show a warning level log that the first pull policy failed.
- Show an info level log that we are changing the pull policy.
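Two parts of the specification above lend themselves to a small Go sketch: accepting pull_policy as either a string or a list, and deciding whether a pod failure was caused by pulling images. The helper names normalizePullPolicies and isImagePullFailure are hypothetical, and the sketch assumes the configuration decoder hands over the raw value as either a string or a list of values. ErrImagePull and ImagePullBackOff are the container waiting reasons Kubernetes reports for image pull problems; whether the runner should key off exactly these reasons is an assumption.

```go
package main

import "fmt"

// normalizePullPolicies accepts the decoded value of pull_policy, assumed to
// arrive as either a string or a list, and always returns a slice so the rest
// of the code only has to handle one shape. Hypothetical helper.
func normalizePullPolicies(raw interface{}) ([]string, error) {
	switch v := raw.(type) {
	case nil:
		return nil, nil // not set: the caller decides on a default
	case string:
		return []string{v}, nil
	case []string:
		return v, nil
	case []interface{}:
		out := make([]string, 0, len(v))
		for _, item := range v {
			s, ok := item.(string)
			if !ok {
				return nil, fmt.Errorf("pull_policy entries must be strings, got %T", item)
			}
			out = append(out, s)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("pull_policy must be a string or a list of strings, got %T", raw)
	}
}

// isImagePullFailure reports whether a container waiting reason points at an
// image pull problem, so that only those failures trigger a fallback.
// Hypothetical helper; the set of reasons checked here is an assumption.
func isImagePullFailure(reason string) bool {
	switch reason {
	case "ErrImagePull", "ImagePullBackOff":
		return true
	}
	return false
}

func main() {
	single, _ := normalizePullPolicies("always")
	multi, _ := normalizePullPolicies([]interface{}{"always", "if-not-present"})
	fmt.Println(single, multi)
	fmt.Println(isImagePullFailure("ImagePullBackOff"), isImagePullFailure("CrashLoopBackOff"))
}
```

Normalizing to a slice up front keeps existing single-string configurations valid while letting the fallback logic iterate over one type.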