Custom executor does not respect the actual exit code (BuildFailureExitCode)
Summary
We are running a custom executor, written in Python, that uses OpenStack instances (VMs) as the build environment. I have tried all sorts of settings, feature flags, and workarounds, but it looks like the custom executor does not compare the actual exit code of a job with BuildFailureExitCode.
The following line of code should compare the exit code of the build job with the variable BuildFailureExitCode: https://gitlab.com/gitlab-org/gitlab-runner/-/blob/main/executors/custom/command/command.go?ref_type=heads#L104
However, the value of this variable appears to be hardcoded: https://gitlab.com/gitlab-org/gitlab-runner/-/blob/main/executors/custom/command/command.go?ref_type=heads#L17
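To make the comparison concrete, here is a simplified Python model of how the runner appears to classify the exit code of a custom executor stage, based on the linked command.go lines and the "unknown Custom executor executable exit code" error in the job log. It is not the runner's Go code: the defaults 1 and 2 for the build and system failure codes are assumptions in this sketch, and the constants at the linked line are authoritative.

```python
from enum import Enum


class StageOutcome(Enum):
    SUCCESS = "success"
    BUILD_FAILURE = "build failure"
    SYSTEM_FAILURE = "system failure"
    UNKNOWN = "unknown exit code (reported as a system failure)"


def classify_exit_code(
    exit_code: int,
    build_failure_exit_code: int = 1,   # assumed value of BuildFailureExitCode
    system_failure_exit_code: int = 2,  # assumed value of SystemFailureExitCode
) -> StageOutcome:
    """Simplified model of how a run_exec exit code is interpreted (not the real Go code)."""
    if exit_code == 0:
        return StageOutcome.SUCCESS
    if exit_code == build_failure_exit_code:
        return StageOutcome.BUILD_FAILURE
    if exit_code == system_failure_exit_code:
        return StageOutcome.SYSTEM_FAILURE
    # Any other value is not recognized, which matches the log line
    # "unknown Custom executor executable exit code 99".
    return StageOutcome.UNKNOWN


for code in (0, 1, 2, 99):
    print(code, "->", classify_exit_code(code).value)
```

In this model, the only way for 99 to count as a build failure is for the runner's own build failure code to be 99; changing a copy of that value inside the driver process does not affect the comparison.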
This is what part of the Python executor code looks like:
def main() -> None:
    conn = openstack.connect()
    ip = get_server_ip(conn)
    ssh_client = get_ssh_client(ip)
    exit_status = execute_script_on_server(ssh_client, sys.argv[1])
    ssh_client.close()
    if exit_status != 0:
        os.environ['BUILD_FAILURE_EXIT_CODE'] = str(exit_status)
        print('BUILD_FAILURE_EXIT_CODE is:', os.environ['BUILD_FAILURE_EXIT_CODE'])
        print('SYSTEM_FAILURE_EXIT_CODE is:', os.environ['SYSTEM_FAILURE_EXIT_CODE'])
        sys.exit(int(os.environ['BUILD_FAILURE_EXIT_CODE']))
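For comparison, here is a minimal sketch of a run_exec exit path that treats BUILD_FAILURE_EXIT_CODE and SYSTEM_FAILURE_EXIT_CODE as read-only inputs provided by the runner, instead of overwriting them with the job's own status. The helper name runner_exit_code and the connection_failed flag are hypothetical. With this mapping the runner should see a recognized build failure rather than an unknown exit code, but the sketch does not pass the job's own status (99 here) back to the runner, so on its own it does not show whether allow_failure:exit_codes can match.

```python
import os
from typing import Mapping


def runner_exit_code(
    remote_status: int,
    connection_failed: bool = False,
    env: Mapping[str, str] = os.environ,
) -> int:
    """Map the remote job's result onto the exit codes the runner supplied.

    Hypothetical helper: BUILD_FAILURE_EXIT_CODE and SYSTEM_FAILURE_EXIT_CODE
    are only read here. The runner compares the driver's exit code against its
    own values, so modifying them inside this process has no effect on it.
    """
    if connection_failed:
        # Infrastructure problem (e.g. SSH to the VM failed): report a system failure.
        return int(env["SYSTEM_FAILURE_EXIT_CODE"])
    if remote_status == 0:
        return 0
    # Any non-zero status from the job script is reported as a build failure.
    return int(env["BUILD_FAILURE_EXIT_CODE"])


# Example with an explicit environment instead of os.environ:
fake_env = {"BUILD_FAILURE_EXIT_CODE": "1", "SYSTEM_FAILURE_EXIT_CODE": "2"}
print(runner_exit_code(99, env=fake_env))  # prints 1
```

In main(), this would take the place of the block under `if exit_status != 0:`, for example `sys.exit(runner_exit_code(exit_status))`.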
The relevant part of my pipeline is as follows:
script:
  - set +e
  - ./buildit.sh || EXIT_CODE=$?
  - export BUILD_FAILURE_EXIT_CODE=$EXIT_CODE
  - echo echoing exit $BUILD_FAILURE_EXIT_CODE
  - exit $BUILD_FAILURE_EXIT_CODE
allow_failure:
  exit_codes: 99
artifacts:
  reports:
    dotenv: build.env
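One point worth checking here: environment changes made in a child process, whether `os.environ[...] = ...` in the Python driver or `export` in the job script, are not visible to the parent process. The self-contained sketch below demonstrates this with a child Python interpreter; the variable name DEMO_EXIT_CODE is made up for the example. Only the child's exit status reaches the parent.

```python
import os
import subprocess
import sys

# The child sets a variable in its own environment and exits with that value.
child_code = (
    "import os, sys\n"
    "os.environ['DEMO_EXIT_CODE'] = '99'\n"
    "sys.exit(int(os.environ['DEMO_EXIT_CODE']))\n"
)

result = subprocess.run([sys.executable, "-c", child_code])

# The parent receives the exit status...
print("child exit status:", result.returncode)
# ...but not the variable the child set.
print("visible in parent:", "DEMO_EXIT_CODE" in os.environ)
```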
And the output of a failed job looks like:
$ export BUILD_FAILURE_EXIT_CODE=$EXIT_CODE
$ echo echoing exit $BUILD_FAILURE_EXIT_CODE
echoing exit 99
$ exit $BUILD_FAILURE_EXIT_CODE
BUILD_FAILURE_EXIT_CODE is: 99
SYSTEM_FAILURE_EXIT_CODE is: 0
Uploading artifacts for failed job
00:02
Uploading artifacts...
Cleaning up project directory and file based variables
00:03
ERROR: Job failed (system failure): unknown Custom executor executable exit code 99; executable execution terminated with: exit status 99
My goal was to exit the Bash script buildit.sh with code 99 and allow this failure so that the pipeline continues. However, the pipeline fails because of the error shown above.
I tested both with and without the following feature flags:
variables:
  FF_USE_NEW_BASH_EVAL_STRATEGY: 1
  FF_ENABLE_BASH_EXIT_CODE_CHECK: 1
I also tried various Bash workarounds, but none of them solved the issue.
Steps to reproduce
.gitlab-ci.yml
variables:
  FF_USE_NEW_BASH_EVAL_STRATEGY: 1
  FF_ENABLE_BASH_EXIT_CODE_CHECK: 1

stages:
  - build
  - test
  - glance

build_and_upload:
  variables:
    FLAVOR: "e1c.2xlarge"
    BUILDER_IMAGE: "Ubuntu 24.04 DevTools"
  tags:
    - vm
  stage: build
  before_script:
    - eval $(ssh-agent -s)
    - echo "$KOLLA_BUILD_RSA_KEY" | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -t rsa gxxxxxxxxx >> ~/.ssh/known_hosts
    - echo "21xxxxxxxxxxxxx.ch" | sudo tee -a /etc/hosts
  script:
    - set +e
    - ./buildit.sh || EXIT_CODE=$?
    - export BUILD_FAILURE_EXIT_CODE=$EXIT_CODE
    - echo echoing exit $BUILD_FAILURE_EXIT_CODE
    - exit $BUILD_FAILURE_EXIT_CODE
  allow_failure:
    exit_codes: 99
  artifacts:
    reports:
      dotenv: build.env

build_nvidia:
  variables:
    FLAVOR: "e1c.2xlarge"
    BUILDER_IMAGE: "Ubuntu 24.04 DevTools"
  tags:
    - vm
  stage: build
  before_script:
    - eval $(ssh-agent -s)
    - echo "$KOLLA_BUILD_RSA_KEY" | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -t rsa gxxxxxxxx.ch >> ~/.ssh/known_hosts
    - echo "21xxxxxxxxxx.ch" | sudo tee -a /etc/hosts
  script:
    - set +e
    - ./nvidia.sh || EXIT_CODE=$?
    - exit $EXIT_CODE
  allow_failure:
    exit_codes: 96
  needs: [build_and_upload]

download_and_test:
  variables:
    FLAVOR: "e1c.2xlarge"
    BUILDER_IMAGE: "Ubuntu 24.04 DevTools"
  tags:
    - vm
  stage: test
  before_script:
    - eval $(ssh-agent -s)
    - echo "$KOLLA_BUILD_RSA_KEY" | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -t rsa gitxxxxxxxxxh >> ~/.ssh/known_hosts
    - echo "21xxxxxx16 $DOCKER_REGISTRY gixxxxxxch" | sudo tee -a /etc/hosts
  script:
    - set +e
    - ./test.sh || EXIT_CODE=$?
    - exit $EXIT_CODE
  allow_failure:
    exit_codes: 98
  needs: [build_and_upload]

glance_prod:
  tags:
    - kolla
    - prod
  stage: glance
  script:
    - set +e
    - ./glance.sh || EXIT_CODE=$?
    - exit $EXIT_CODE
  allow_failure:
    exit_codes: 97
  needs: [download_and_test]

glance_stage:
  tags:
    - kolla
    - stage
  stage: glance
  script:
    - set +e
    - ./glance.sh || EXIT_CODE=$?
    - exit $EXIT_CODE
  allow_failure:
    exit_codes: 97
  needs: [download_and_test]
Actual behavior
The job fails and the whole pipeline stops. All following stages are skipped.
Expected behavior
The job is marked as failed with a "!", but the pipeline continues because of:
allow_failure:
  exit_codes: 99
Relevant logs and/or screenshots
job log
Running with gitlab-runner 17.2.1 (9882d9c7)
on os-gitlab-runner-os1 xPmy-3-U, system ID: r_Z5XMaeNcQKDk
feature flags: FF_ENABLE_BASH_EXIT_CODE_CHECK:true, FF_USE_NEW_BASH_EVAL_STRATEGY:true
Preparing the "custom" executor
01:33
Using Custom executor with driver Openstack 2024.07.30.3...
SYSTEM_FAILURE_EXIT_CODE is at the beginning of prepare: 0
Connecting to Openstack
Provisioning an instance gitlab-builder-232-project-237-concurrent-0-job-68645
Instance gitlab-builder-232-project-237-concurrent-0-job-68645 is running on address 10.99.99.146
Checking SSH connection
SSH connection has been established
gitlab-runner binary copied
2xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.ch
HOME is /home/ubuntu
copy gitlab-runner binary to /usr/bin
-rwxr-xr-x 1 root root 68712254 Jul 30 14:01 /usr/bin/gitlab-runner
done
# Ubuntu sources have moved to the /etc/apt/sources.list.d/ubuntu.sources
# file, which uses the deb822 format. Use deb822-formatted .sources files
# to manage package sources in the /etc/apt/sources.list.d/ directory.
# See the sources.list(5) manual page for details.
/etc/apt/sources.list.d
/etc/apt/sources.list.d/ubuntu.sources
/etc/apt/sources.list.d/ubuntu.sources.curtin.orig
64 packages can be upgraded. Run 'apt list --upgradable' to see them.
APT updated
Installing git and git-lfs
Reading package lists...
Building dependency tree...
Reading state information...
git is already the newest version (1:2.43.0-1ubuntu7.1).
curl is already the newest version (8.5.0-2ubuntu10.1).
ca-certificates is already the newest version (20240203).
ca-certificates set to manually installed.
The following packages will be upgraded:
git-lfs wget
2 upgraded, 0 newly installed, 0 to remove and 62 not upgraded.
Need to get 4239 kB of archives.
After this operation, 245 kB of additional disk space will be used.
Get:1 http://ch-zh1-az2.clouds.archive.ubuntu.com/ubuntu noble-updates/main amd64 wget amd64 1.21.4-1ubuntu4.1 [334 kB]
Get:2 http://ch-zh1-az2.clouds.archive.ubuntu.com/ubuntu noble-updates/universe amd64 git-lfs amd64 3.4.1-1ubuntu0.1 [3906 kB]
Fetched 4239 kB in 0s (11.4 MB/s)
(Reading database ... 100344 files and directories currently installed.)
Preparing to unpack .../wget_1.21.4-1ubuntu4.1_amd64.deb ...
Unpacking wget (1.21.4-1ubuntu4.1) over (1.21.4-1ubuntu4) ...
Preparing to unpack .../git-lfs_3.4.1-1ubuntu0.1_amd64.deb ...
Unpacking git-lfs (3.4.1-1ubuntu0.1) over (3.4.1-1) ...
Setting up wget (1.21.4-1ubuntu4.1) ...
Setting up git-lfs (3.4.1-1ubuntu0.1) ...
Processing triggers for install-info (7.1-3build2) ...
Processing triggers for man-db (2.12.0-4build2) ...
Running kernel seems to be up-to-date.
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
userdata.sh installed
SYSTEM_FAILURE_EXIT_CODE is at the end of prepare: 0
Preparing environment
00:02
Running on gitlab-builder-232-project-237-concurrent-0-job-68645...
Getting source from Git repository
00:03
Fetching changes with git depth set to 20...
Initialized empty Git repository in /home/ubuntu/builds/openstack/gardenlinux-ew-build/.git/
Created fresh repository.
Checking out 071517ba as detached HEAD (ref is main)...
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:05
WARNING: Starting with version 17.0 the 'build_script' stage will be replaced with 'step_script': https://gitlab.com/groups/gitlab-org/-/epics/6112
$ eval $(ssh-agent -s)
Agent pid 3004
$ echo "$KOLLA_BUILD_RSA_KEY" | ssh-add -
$ mkdir -p ~/.ssh
$ chmod 700 ~/.ssh
$ ssh-keyscan -t rsa gitxxxxxxxx.ch >> ~/.ssh/known_hosts
$ echo "21xxxxxxxxxx16 $DOCKER_REGISTRY git.xxxxxxxx.ch" | sudo tee -a /etc/hosts
2xxxxxxxxxxxxxxxxxxxxxxxxxs.ch
$ set +e
$ ./buildit.sh || EXIT_CODE=$?
release branch is rel-1443
Latest gardenlinux release is 1443.10
Commit ID is: 8d098305ade0addbe03be324cc50b1bf4bf3b206
print content of LATEST
LATEST=1443.10
LATESTUUID=14430010-2024-0723-1730-c2f57e0c9732
LATEST_BUILDCOMMIT=c2f57e0c9732f871c129d71b58ca24247cad612d
LATEST_BUILDDATE=2024-0723-1730
LATEST_BUILDID=68348
source LATEST
Latest build is 1443.10
Latest release is 1443.10
Latest gardenlinux release is 1443.10 and matches our latest build. Quitting...
$ export BUILD_FAILURE_EXIT_CODE=$EXIT_CODE
$ echo echoing exit $BUILD_FAILURE_EXIT_CODE
echoing exit 99
$ exit $BUILD_FAILURE_EXIT_CODE
Identity added: (stdin) ((stdin))
# git.i.exxxxxxxs.ch:22 SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.7
Cloning into '/home/ubuntu/builds/gardenlinux'...
Note: switching to 'tags/1443.10'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 8d098305 prepare 1443.10
BUILD_FAILURE_EXIT_CODE is: 99
SYSTEM_FAILURE_EXIT_CODE is: 0
Uploading artifacts for failed job
00:02
Uploading artifacts...
Cleaning up project directory and file based variables
00:03
ERROR: Job failed (system failure): unknown Custom executor executable exit code 99; executable execution terminated with: exit status 99
Environment description
The runner runs on an OpenStack VM. It uses the OpenStack client to spawn a fresh VM and uses SSH (via the Python module paramiko) to start the build job inside that fresh VM.
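The report does not include execute_script_on_server. As an illustration only, here is a minimal sketch of how such a helper could run the runner-generated script (the path passed as sys.argv[1]) over an already connected paramiko SSHClient and return the remote exit status. It pipes the script into `bash -s` on the VM, merges stderr into stdout so the output can be streamed from a single pipe, and waits with Channel.recv_exit_status(). The function body is hypothetical; the reporter's implementation may differ.

```python
import sys


def execute_script_on_server(ssh_client, script_path: str) -> int:
    """Run a local script on the remote host via `bash -s` and return its exit status.

    `ssh_client` is expected to be a connected paramiko.SSHClient (or any object
    with the same exec_command interface). Hypothetical sketch, not the
    reporter's code.
    """
    with open(script_path, encoding="utf-8") as fh:
        script = fh.read()

    stdin, stdout, _stderr = ssh_client.exec_command("bash -s")
    # Merge stderr into stdout so everything is read from one stream below.
    stdout.channel.set_combine_stderr(True)

    # Send the script, then close the write side so bash sees end-of-input.
    stdin.write(script)
    stdin.flush()
    stdin.channel.shutdown_write()

    # Stream the job output into the runner's log as it arrives.
    for line in stdout:
        sys.stdout.write(line)
    sys.stdout.flush()

    # Blocks until the remote command has finished, then returns its status.
    return stdout.channel.recv_exit_status()
```

The returned value is the job script's own status (99 in the log above); how it is then translated into the runner's exit codes is a separate step.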
config.toml contents
concurrent = 12
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "os-gitlab-runner-os1"
  output_limit = 20480
  url = "https://gixxxxxxxxch"
  id = 232
  token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxuN"
  token_obtained_at = 2024-07-30T10:53:04Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "custom"
  builds_dir = "builds"
  cache_dir = "cache"
  [runners.custom_build_dir]
  [runners.cache]
    Type = "s3"
    Shared = true
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
      ServerAddress = "s3.exxxxxxxx.ch"
      AccessKey = "exxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx15"
      SecretKey = "axxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxb"
      BucketName = "openstack-gitlab-runner-cache-1"
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.custom]
    config_exec = "/home/gitlab-runner/config.sh"
    prepare_exec = "/home/gitlab-runner/prepare.py"
    run_exec = "/home/gitlab-runner/run.py"
    cleanup_exec = "/home/gitlab-runner/cleanup.py"
Used GitLab Runner version
Running with gitlab-runner 17.2.1 (9882d9c7)
on os-gitlab-runner-os1 xPmy-3-U, system ID: r_Z5XMaeNcQKDk
feature flags: FF_ENABLE_BASH_EXIT_CODE_CHECK:true, FF_USE_NEW_BASH_EVAL_STRATEGY:true
Possible fixes
I have no idea how to fix it.