Do not propagate Build context to k8s executor cleanup method
## What does this MR do?
In !4125, we started propagating the build context throughout the Kubernetes executor. This came with a side effect: when the job is cancelled or times out before the Kubernetes resources are cleaned up, those resources stay on the cluster and the cleanup fails with the error below:

```plaintext
ERROR: Error cleaning up pod: client rate limiter Wait returned an error: context canceled
```
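For background, this error surfaces from client-go's client-side rate limiter: every API request first calls the limiter's `Wait`, which bails out as soon as the context attached to the request is cancelled. Below is a minimal, self-contained illustration (not runner code) using `golang.org/x/time/rate`, the package client-go's default rate limiter builds on:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// client-go throttles API requests through a token-bucket rate limiter.
	limiter := rate.NewLimiter(rate.Every(time.Second), 1)

	// Simulate the build context being cancelled (job cancelled / timed out).
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	// Wait returns immediately with the context's error, which client-go
	// wraps as "client rate limiter Wait returned an error: context canceled".
	if err := limiter.Wait(ctx); err != nil {
		fmt.Println(err) // context canceled
	}
}
```

Because the cleanup requests were issued with the build context, a cancelled or timed-out job rejected every subsequent cleanup call this way.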
To prevent this from happening, a configurable timeout context (defaulting to 5 minutes) is used for the resource cleanup. The implementation is inspired by what is already done for the docker executor.
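As a rough sketch of the approach (`cleanupPod`, its parameters, and the package name are illustrative stand-ins, not the actual runner code): the cleanup derives its context from `context.Background()` with its own timeout instead of reusing the build context, so a cancelled or timed-out job can no longer abort it.

```go
package cleanup

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupPod is a hypothetical stand-in for the executor's cleanup method.
// It deliberately ignores the (possibly cancelled) build context and runs
// the deletion under a fresh context bounded only by cleanupTimeout.
func cleanupPod(client kubernetes.Interface, namespace, podName string, cleanupTimeout time.Duration) error {
	// cleanupTimeout stands in for the configurable value (default 5 minutes).
	ctx, cancel := context.WithTimeout(context.Background(), cleanupTimeout)
	defer cancel()

	if err := client.CoreV1().Pods(namespace).Delete(ctx, podName, metav1.DeleteOptions{}); err != nil {
		return fmt.Errorf("cleaning up pod %s/%s: %w", namespace, podName, err)
	}
	return nil
}
```

The timeout still bounds how long cleanup may take; it just no longer shares the build's cancellation.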
## Why was this MR needed?
To make sure resources are actually cleaned up at the end of a job (whether it succeeds or fails).
## What's the best way to test this MR?
`config.toml`:

```toml
concurrent = 1
check_interval = 1
log_level = "debug"
shutdown_timeout = 0
listen_address = ':9252'

[session_server]
  session_timeout = 1800

[[runners]]
  name = ""
  url = "https://gitlab.com/"
  id = 0
  token = "__REDACTED__"
  token_obtained_at = "0001-01-01T00:00:00Z"
  token_expires_at = "0001-01-01T00:00:00Z"
  executor = "kubernetes"
  shell = "bash"
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "alpine"
    namespace = ""
    namespace_overwrite_allowed = ""
    pod_labels_overwrite_allowed = ""
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    node_selector_overwrite_allowed = ".*"
    [runners.kubernetes.volumes]
    [[runners.kubernetes.services]]
      name = "alpine:latest"
      alias = "alpine-service"
      command = ["sleep 900s"]
      entrypoint = ["/bin/sh", "-c"]
      port = 8080
```
`gitlab-ci.yml`:

```yaml
job:
  timeout: 2m
  image: alpine
  script:
    - sleep 180
```
Run a job using the `config.toml` and the `gitlab-ci.yml` provided above.
On the main branch, after the job times out, the pod is not cleaned up. On the MR branch, the problem does not happen.

In my test on the main branch, the pod name is `runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o`. We can see that it outlives the job timeout and is still running:
```shell
❯ kubectl get pod runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o
NAME                                                     READY   STATUS    RESTARTS   AGE
runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o   3/3     Running   0          2m17s
❯ kubectl get pod runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o
NAME                                                     READY   STATUS    RESTARTS   AGE
runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o   3/3     Running   0          2m20s
❯ kubectl get pod runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o
NAME                                                     READY   STATUS    RESTARTS   AGE
runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o   3/3     Running   0          2m23s
❯ kubectl get pod runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o
NAME                                                     READY   STATUS    RESTARTS   AGE
runner-dzfsjrxx-project-25452826-concurrent-0-dlbxp15o   3/3     Running   0          2m24s
```
## What are the relevant issue numbers?
Fixes #36803