Ignore SIGQUIT for duration of artifact upload
What does this MR do?
It fixes issue "#4239 (closed)", probably in an ugly way, and only for artifact uploads. I have not checked whether the same crash and core dump also occur during cache upload/download or artifact download.
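The idea of ignoring a signal only for the duration of one operation can be sketched in shell. This is an analogy for illustration, not the Go change in this MR; the function name `with_sigquit_ignored` is hypothetical, and the child command simply signals itself to stand in for an external `pkill -QUIT`.

```shell
#!/bin/sh
# Run a command with SIGQUIT ignored, then restore the default handling.
# Hypothetical helper for illustration; not part of gitlab-runner.
with_sigquit_ignored() {
    trap '' QUIT        # ignore SIGQUIT; child processes inherit this
    "$@"
    status=$?
    trap - QUIT         # back to the default disposition
    return "$status"
}

# The child sends SIGQUIT to itself and keeps running because the
# signal is ignored for the duration of the call.
with_sigquit_ignored sh -c 'kill -QUIT $$; echo "still running after SIGQUIT"'
```

Outside of `with_sigquit_ignored`, the same `kill -QUIT` would terminate the child, which is the situation the unfixed helper is in during an upload.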
Why was this MR needed?
Because it's labeled "Won't fix" (actually it's labeled "Accepting merge requests" /end passive aggressiveness)
More specifically, I haven't dived into gitlab-runner's internals, but I believe the signal catching could be done elsewhere, once and for all. In the meantime, this at least fixes my issue, which blocks 80 of my coworkers.
I don't mind keeping a fork of gitlab-runner if you choose never to merge this, but I'd prefer not to.
How to test
Docker image creation
The helper Docker image was built manually as follows, because the CI doesn't push an image by default.
# Get the binary from this MR's artifacts (URL quoted so the shell does not treat "?" as a glob character):
wget 'https://gitlab.com/jlecomte/forks/gitlab-runner/-/jobs/463128801/artifacts/raw/out/helper-images/prebuilt-x86_64.tar.xz?inline=false' -O prebuilt-x86_64.tar.xz
tar -xf prebuilt-x86_64.tar.xz usr/bin/gitlab-runner-helper
docker build -t julienlecomte/gitlab-runner-helper:x86_64-latest .
docker push julienlecomte/gitlab-runner-helper:x86_64-latest
Dockerfile:
FROM gitlab/gitlab-runner-helper:x86_64-latest
RUN rm -v /usr/bin/gitlab-runner-helper
COPY usr/bin/gitlab-runner-helper /usr/bin/gitlab-runner-helper
The .gitlab-ci.yml
sigquit-fix:
  image: alpine
  stage: build
  script:
    - dd if=/dev/zero of=5g count=$((5*1024)) bs=1048576
    - echo sleeping...
    - sleep 1m
    - echo starting...
  artifacts:
    paths:
      - 5g
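As a sanity check on the artifact size, the `dd` arguments in the job above can be multiplied out. This is only the arithmetic from the `bs=` and `count=` values; the variable names are made up for the example.

```shell
#!/bin/sh
# Size of the file written by: dd if=/dev/zero of=5g count=$((5*1024)) bs=1048576
block_size=1048576               # bs=: bytes per block (1 MiB)
block_count=$((5 * 1024))        # count=: number of blocks
total_bytes=$((block_size * block_count))

echo "$total_bytes bytes"
echo "$((total_bytes / 1024 / 1024 / 1024)) GiB"
```

The job therefore produces a 5 GiB artifact, large enough that the upload stays in progress while SIGQUIT is being sent.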
The Runner (part 1, the unfixed one)
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "sigquit-fix"
url = "https://gitlab.example.com/"
token = "xyz"
executor = "docker"
environment = ["DOCKER_TLS_CERTDIR="]
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.docker]
tls_verify = false
image = "alpine"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = true
shm_size = 0
Part 1, The Crash & Core
- Start the CI pipeline with that job and monitor the output of the job in the GitLab UI.
- Once you see "sleeping...", as root in the terminal of your runner do:
  while true; do pkill -QUIT gitlab-runner ; sleep 1s ; done
- Wait less than a minute and watch it crash.
Part 2.a
- Edit the runner's /etc/gitlab-runner/config.toml and add the following line to the [runners.docker] section:
  helper_image = "julienlecomte/gitlab-runner-helper:x86_64-latest"
- Restart the runner:
  gitlab-runner restart
Part 2.b No Crash & No Core
- Start the CI pipeline with that job and monitor the output of the job in the GitLab UI.
- Once you see "sleeping...", as root in the terminal of your runner do:
  while true; do pkill -QUIT gitlab-runner ; sleep 1s ; done
- Wait less than a minute and watch it upload a 5G artifact onto your server.
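The `while true` loop used in Parts 1 and 2.b runs until it is interrupted by hand. A bounded variant might look like the sketch below; `send_sigquit_for` is a hypothetical helper name, and it assumes `pkill` is available on the runner host.

```shell
#!/bin/sh
# Send SIGQUIT to every process named "$1", once per second, for "$2" seconds.
# Hypothetical helper for the manual test; not part of gitlab-runner.
send_sigquit_for() {
    name=$1
    seconds=$2
    i=0
    while [ "$i" -lt "$seconds" ]; do
        # pkill exits non-zero when nothing matched; keep looping anyway
        pkill -QUIT "$name" || true
        sleep 1
        i=$((i + 1))
    done
}
```

For example, `send_sigquit_for gitlab-runner 120`, run as root once "sleeping..." appears, should cover the sleep and the start of the upload and then stop on its own.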
Does this MR meet the acceptance criteria?
- Documentation created/updated
- Added tests for this feature/bug
- In case of conflicts with master - branch was rebased