Long running jobs canceled in GitLab UI, but runner continues process
Note (revised 2022-04-28)
If you are still experiencing issues similar to the one described here, add a comment with your details to the "CI process does not receive SIGTERM on termination" issue.
Overview
I have a long-running compile job (~40 minutes). I made two pushes one after another and stopped the first running job in the UI. GitLab reports that job as canceled, but the second job stays pending.
I suspect that the runner keeps running the first job to completion because its process is not properly terminated. Is there a way to test my hypothesis?
I'm using the shell executor for the runner (GitLab and the runner are both on Ubuntu 16.04).
Edit: as written in a comment below, the steps to reproduce the problem are:
Create a project with a simple .gitlab-ci.yml file:
build:
  stage: build
  tags:
    - ubuntu_amd64
  script:
    - ping localhost
Start a pipeline and cancel it. This should also terminate the ping command, but it doesn't.
On the runner, check whether the process is still running:
ps aux | grep ping
gitlab-+ 19828 0.0 0.0 8656 1724 ? S 07:59 0:00 ping localhost
or just kill it with killall ping (use sudo if the runner runs as another user).
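The manual check above can also be scripted. The following is only a minimal sketch, assuming a Linux runner host with /proc mounted; the function name find_processes is hypothetical and not part of GitLab Runner. It lists the PIDs whose command name matches, which is roughly what ps aux | grep ping shows.

```python
import os


def find_processes(name):
    """Return the PIDs of processes whose command name equals `name`.

    Assumption: Linux with /proc mounted. The kernel truncates the
    command name in /proc/<pid>/comm to 15 characters.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    # After canceling the job in the UI, a non-empty list means the
    # job's process was left behind on the runner.
    print(find_processes("ping"))
```

If the list is still non-empty some time after the job was canceled, the hypothesis that the process outlives the cancellation is confirmed.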
Proposal
At the moment we simply kill the process group with SIGKILL and then ignore the result. Instead, we should allow the process to shut down gracefully by first sending SIGTERM and, after a specific timeout, sending SIGKILL to the process. This will help processes terminate properly. We already have this implemented in the custom executor and should try to reuse that code to implement the same feature.
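To illustrate the proposed behavior, here is a minimal sketch in Python. It is not the GitLab Runner implementation (which is written in Go); the function name terminate_gracefully and the grace_seconds parameter are hypothetical. It assumes a POSIX system and that the job was started in its own session, so that its PID is also its process group ID.

```python
import os
import signal
import subprocess


def terminate_gracefully(proc, grace_seconds=5.0):
    """Send SIGTERM to the job's process group, then SIGKILL after a timeout.

    Assumption: `proc` was started with start_new_session=True, so
    proc.pid is also the process group ID. Returns the exit status of
    the group leader. Only the leader is waited on, so children that
    outlive it after SIGTERM are not covered by this sketch.
    """
    pgid = proc.pid
    try:
        os.killpg(pgid, signal.SIGTERM)  # ask the whole group to stop
    except ProcessLookupError:
        return proc.wait()  # the group is already gone
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)  # grace period is over
        except ProcessLookupError:
            pass  # the group exited just after the timeout
        return proc.wait()


if __name__ == "__main__":
    # Start a stand-in for a long-running job in its own process group.
    job = subprocess.Popen(["sleep", "600"], start_new_session=True)
    print("exit status:", terminate_gracefully(job, grace_seconds=2.0))
```

Signaling the process group rather than a single PID matters here because a shell executor job usually spawns children (such as ping in the reproduction steps), and those would otherwise survive the cancellation.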
Merge Requests
- Extract process killer from custom executor
- Extract commander interface from custom executor
- Add Process groups to process pkg
- Use the same termination commands on Windows
  - For Windows on the shell executor, we pass taskkill, while in the process package we just call process.Kill(). Investigate which one is better or if we should use both.
- Rename test file
- Send SIGTERM then SIGKILL for shell executor