CI or CD process does not receive SIGTERM on termination
Status update of 2024-01-17
Several enhancements were implemented to improve the handling of the SIGTERM signal in both executor::docker (!4446 (merged)) and executor::kubernetes (!4443 (merged), !4485 (merged)).
However, no fix has been implemented yet for executor::shell.
Summary
When the CI job is cancelled in the web UI, the job process doesn't receive the SIGTERM signal. Ten minutes later, SIGKILL is received and the process ends ungracefully.
Steps to reproduce
- Run a CI job.
- Cancel the running CI job.
.gitlab-ci.yml
test:
  script:
    - echo "Will sleep"
    - sleep 5000
Actual behavior
The process started by the CI job keeps running after the job has been cancelled.
Expected behavior
The job process is supposed to receive a SIGTERM signal and terminate.
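The expected behavior can be illustrated with a minimal sketch, assuming a cancellation that signals the job's whole process group (not just one PID) with SIGTERM and only falls back to SIGKILL after a grace period. The function name `run_and_cancel` and its parameters are hypothetical and do not describe how GitLab Runner is actually implemented.

```python
import os
import signal
import subprocess
import time


def run_and_cancel(cmd, cancel_after=1.0, grace_seconds=10.0):
    """Run cmd in its own process group, then cancel it as this issue expects.

    The whole group gets SIGTERM first; SIGKILL is only used if the direct
    child is still running when the grace period expires.
    """
    # start_new_session=True makes the child call setsid(), so it leads a new
    # session and process group whose PGID equals the child's PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    time.sleep(cancel_after)  # stand-in for "the user presses Cancel"

    pgid = proc.pid
    # Signal the group, not a single PID, so nested shells and their children
    # (like the bash -> python chain in this issue's process listing) get it too.
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)  # last resort after the grace period
        except ProcessLookupError:
            pass  # the group exited on its own in the meantime
        proc.wait()
    return proc.returncode
```

For example, `run_and_cancel(["bash", "-c", "echo 'Will sleep'; sleep 5000"])` mirrors the `.gitlab-ci.yml` above. The sketch only waits for the direct child; a complete implementation would also check that no other member of the group survived before giving up on SIGKILL.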
Relevant logs and/or screenshots
I set up a signal handler with logging in my Python script.
Python Code Example
import logging
import signal
import sys
import time

# LOG_FORMAT was not defined in the original snippet; this value is an assumption.
LOG_FORMAT = '%(asctime)s %(levelname)s %(message)s'

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter(LOG_FORMAT)

# Log to stdout (visible in the GitLab job log) ...
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
handler.setFormatter(formatter)

# ... and to a separate file that is still readable after the job is cancelled.
file_handler = logging.FileHandler('/tmp/bmakan.log')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(formatter)

logger.addHandler(handler)
logger.addHandler(file_handler)

shutdown = False


def sigterm_handler(signum, frame):
    global shutdown
    logger.info(f'parent got shutdown with {signum}')
    shutdown = True
    sys.exit(1)


# Install the handler for every catchable termination-related signal.
# SIGKILL (like SIGSTOP) cannot be caught, so it is deliberately left out.
for sig in (signal.SIGHUP, signal.SIGINT, signal.SIGQUIT, signal.SIGILL,
            signal.SIGTRAP, signal.SIGABRT, signal.SIGBUS, signal.SIGFPE,
            signal.SIGUSR1, signal.SIGSEGV, signal.SIGUSR2, signal.SIGPIPE,
            signal.SIGALRM, signal.SIGTERM):
    signal.signal(sig, sigterm_handler)

# Placeholder for the real workload: keep running until a signal arrives.
while not shutdown:
    time.sleep(1)
The logs are written to stdout (accessible via the GitLab web page) and to a separate log file.
- When I cancel the job, nothing appears in the logs and the process keeps running.
- The job appears as cancelled on the web page.
- When I run `kill -SIGTERM <pid>` manually, I can see the logged message in my log file and the process terminates.
Before pressing the `Cancel` button (process listing; columns: UID, PID, PPID, PGID, SID, C, STIME, TTY, TIME, CMD)
root 13019 1 13019 13019 0 Jan14 ? 00:04:16 /usr/bin/gitlab-runner run --working-directory /home/gitlab-runner --config /etc/gitlab-runner/config.toml --service gitlab-runner --user gitlab-runn
root 29753 13019 29753 13019 0 12:32 ? 00:00:00 su -s /bin/bash gitlab-runner -c bash --login
gitlab-+ 29755 29753 29755 29755 0 12:32 ? 00:00:00 bash --login
gitlab-+ 29798 29755 29755 29755 0 12:32 ? 00:00:00 bash --login
gitlab-+ 29799 29798 29755 29755 0 12:32 ? 00:00:00 python python-script.py
A few seconds to a few minutes after pressing the `Cancel` button
gitlab-+ 29798 1 29755 29755 0 12:32 ? 00:00:00 bash --login
gitlab-+ 29799 29798 29755 29755 0 12:32 ? 00:00:00 python python-script.py
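In this listing, `su` (29753) and the login shell (29755) have exited, while the nested `bash` (29798) has been reparented to PID 1 and `python` (29799) keeps running in process group 29755.
After 10 minutes, SIGKILL is received and the process terminates ungracefully.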
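Until the shell executor forwards SIGTERM, one possible workaround is for the job script to detect cancellation by itself. This is a heuristic sketch that assumes the layout seen in the listing above: the login shell leads the job's process group (PGID 29755) and disappears on cancel, while the Python process stays in that group. The names `leader_alive` and `watch_group_leader` are hypothetical; this is not a GitLab Runner feature.

```python
import os
import threading
import time


def leader_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def watch_group_leader(on_orphaned, interval=2.0):
    """Call on_orphaned() once the leader of this process's group disappears.

    Hypothetical workaround: assumes the job's login shell leads the process
    group. PID reuse or an unreaped zombie leader would hide its exit, and if
    this script is itself the group leader the callback never fires.
    """
    pgid = os.getpgrp()

    def _poll():
        while leader_alive(pgid):
            time.sleep(interval)
        on_orphaned()

    thread = threading.Thread(target=_poll, daemon=True)
    thread.start()
    return thread
```

In the reporter's script, calling `watch_group_leader(lambda: os.kill(os.getpid(), signal.SIGTERM))` after the handlers are installed would route a detected cancellation through the existing `sigterm_handler`, so the same logging and cleanup path runs.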
Environment description
- GitLab FOSS: 13.7.4 (2f14978e280)
- Executor: shell
config.toml contents
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "my-runner"
url = "my-url"
token = "my-token"
executor = "shell"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
Used GitLab Runner version
Version: 13.7.0
Git revision: 943fc252
Git branch: 13-7-stable
GO version: go1.13.8
Built: 2020-12-21T13:47:06+0000
OS/Arch: linux/amd64