Kubernetes attach strategy hangs when log file is deleted
What does this MR do?
This MR deletes the pod when gitlab-runner is no longer able to stream the logs from the log file.
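As a rough illustration of the behavior, the shell sketch below follows a log file and stops with a non-zero exit code once the file disappears. It is not the helper's actual implementation: the function name `stream_log` and the one-second polling interval are assumptions made for the example, while exit code 100 and the warning text are taken from the error output quoted in the test section of this description.

```shell
# Hypothetical sketch, not the gitlab-runner-helper code.
# Prints new content of a log file as it grows and returns 100
# as soon as the file no longer exists.
stream_log() {
  log_file="$1"
  offset=0
  while [ -f "$log_file" ]; do
    # Current size in bytes; the arithmetic expansion strips any padding from wc.
    size=$(( $(cat "$log_file" 2>/dev/null | wc -c) ))
    if [ "$size" -gt "$offset" ]; then
      # Print only the bytes that were not printed yet.
      tail -c "+$((offset + 1))" "$log_file" 2>/dev/null
      offset=$size
    fi
    sleep 1
  done
  echo "WARNING: output log file deleted, cannot continue streaming logs" >&2
  return 100
}
```

Calling `stream_log /path/to/output.log` and then inspecting `$?` shows the failure code once the file is removed, which is the signal the caller can use to fail the job instead of hanging.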
Why was this MR needed?
This MR is needed to avoid the case where the job hangs after the log file is deleted, leaving the end user without any feedback.
What's the best way to test this MR?
The following configuration is needed to test this MR:
- Generate the new helper image
eval $(minikube -p minikube docker-env) #if using minikube
make helper-dockerarchive-host
- Push the new image out/binaries/gitlab-runner-helper/gitlab-runner-helper.x86_64 to your personal Docker Hub account. The Docker Hub link to this helper image will be needed in the config.toml
.gitlab-ci.yml
hello:
  image: alpine
  script:
    - sleep 5000
config.toml
[[runners]]
name = "kubernetes"
url = "https://gitlab.com/"
token = "YOUR_TOKEN_HERE"
executor = "kubernetes"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.feature_flags]
FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY = false
[runners.kubernetes]
host = ""
bearer_token_overwrite_allowed = false
namespace = ""
helper_image = "NEW_HELPER_IMAGE"
namespace_overwrite_allowed = ""
privileged = false
service_account_overwrite_allowed = ""
pod_annotations_overwrite_allowed = ""
idle_count = 2
idle_time = 60
[runners.kubernetes.affinity]
[runners.kubernetes.pod_security_context]
[runners.kubernetes.volumes]
As described in the related issue, follow the steps below:
- Run the above .gitlab-ci.yml file.
- Retrieve the name of the pod running the job (POD_RUNNING_JOB) with the command kubectl get pods. The pod age is a good indicator if you have more than one pod running.
- When the job log starts to output the date-time, run the following command to delete the log file:
kubectl exec -it -c helper POD_RUNNING_JOB -- sh -c 'rm /logs-PROJECT_ID-JOB_RESPONSE_ID/output.log'
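The deletion step can be wrapped in a small script so that the log path is built consistently. This is only a sketch: the function names `log_path` and `delete_job_log` and the `KUBECTL` override are hypothetical helpers for this example. The `/logs-PROJECT_ID-JOB_RESPONSE_ID/output.log` layout and the `helper` container name come from the command above. The `-it` flags are dropped because the script does not need an interactive terminal.

```shell
# Hypothetical helpers for the manual test; not part of gitlab-runner.

# Build the log file path from the project ID and the job response ID.
log_path() {
  printf '/logs-%s-%s/output.log' "$1" "$2"
}

# Delete the job log inside the helper container of the given pod.
# Set KUBECTL=echo to print the command instead of running it.
delete_job_log() {
  pod="$1"
  project_id="$2"
  job_id="$3"
  ${KUBECTL:-kubectl} exec -c helper "$pod" -- rm "$(log_path "$project_id" "$job_id")"
}
```

For example, `KUBECTL=echo delete_job_log POD_RUNNING_JOB PROJECT_ID JOB_RESPONSE_ID` prints the command that would be run. If several pods are running, `kubectl get pods --sort-by=.metadata.creationTimestamp` lists them by age to help pick the right one.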
Once the log file is deleted, the job displays an error message about the log deletion after a few seconds:
WARNING: output log file deleted, cannot continue streaming logs default/runner-lr33aybb-project-24422682-concurrent-0dc2sw/helper:/logs-24422682-1268232072/output.log: command terminated with exit code 100
Cleaning up file based variables
ERROR: Job failed: command terminated with exit code 100
The expected log will look like the following:
The integration test TestLogDeletionFeatureFlag can also be used to test the addition.
To do so, the call
t.Skip("Log deletion test temporary skipped: issue https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27755")
should be commented out and the following line added just after the initialization of the build variable:
build.Runner.RunnerSettings.Kubernetes.HelperImage = "gitlab/gitlab-runner-helper:XXXX"
XXXX should be replaced by the tag generated by the make helper-dockerarchive-host command for the helper image.
What are the relevant issue numbers?
closes: #26032 (closed)