Backend: Jobs that run on_failure are sometimes unexpectedly skipped when they also have optional needs
Summary
When a job has `when: on_failure`, it should run when at least one other job in the same pipeline fails. When the job that has `when: on_failure` also has `needs`, the job is unexpectedly skipped when other jobs in the same pipeline fail.
If the `needs` are removed, the `when: on_failure` job works properly: it runs when other jobs in the same pipeline fail.
Steps to reproduce
- Use a `.gitlab-ci.yml` file like the one shown below
- Observe that the `build_job` job fails
- Observe that the `rollback_job` job is skipped (the `rollback_job` job should run because `build_job` failed)

    build_job:
      stage: build
      script:
        - exit 1

    test_job:
      stage: test
      script:
        - date

    rollback_job:
      stage: deploy
      needs:
        - job: test_job
          optional: true
        - job: build_job
          optional: true
      script:
        - date
      when: on_failure

Proposal
The reason is that we skip the job if it is a DAG job and needs any `skipped` or `ignored` job. The condition below should be modified to accommodate this scenario:

    if @dag && any_skipped_or_ignored?
      # The DAG job is skipped if one of the needs does not run at all.
      'skipped'

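To make the condition above concrete, here is a minimal Ruby sketch of the decision as this issue describes it, together with one possible adjustment. It is a simplified model and not GitLab's actual implementation: `current_decision`, `proposed_decision`, and their parameters are hypothetical names, and the sketch assumes that `test_job` ends up `skipped` once `build_job` fails.

```ruby
# Simplified, hypothetical model of the behavior described in this issue.
# None of these method names exist in GitLab; they are for illustration only.

# Current behavior as reported: a DAG job is skipped as soon as any of its
# needs is skipped or ignored, before `when:` is taken into account.
def current_decision(dag:, needs_statuses:, when_rule:)
  skipped_or_ignored = needs_statuses.any? { |s| %w[skipped ignored].include?(s) }
  return 'skipped' if dag && skipped_or_ignored

  any_failed = needs_statuses.include?('failed')
  case when_rule
  when 'on_failure' then any_failed ? 'run' : 'skipped'
  when 'on_success' then any_failed ? 'skipped' : 'run'
  else 'run'
  end
end

# One possible adjustment (an assumption, not a confirmed fix): a job with
# `when: on_failure` runs if any need failed, even if other needs were skipped.
def proposed_decision(dag:, needs_statuses:, when_rule:)
  return 'run' if when_rule == 'on_failure' && needs_statuses.include?('failed')

  current_decision(dag: dag, needs_statuses: needs_statuses, when_rule: when_rule)
end

# Reproduction case: build_job failed, test_job (assumed) skipped.
needs = %w[skipped failed]
puts current_decision(dag: true, needs_statuses: needs, when_rule: 'on_failure')  # prints "skipped"
puts proposed_decision(dag: true, needs_statuses: needs, when_rule: 'on_failure') # prints "run"
```

In this model the early return for skipped needs fires before the `when: on_failure` rule is ever consulted, which matches the reported symptom.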
Example Project
This unexpected behavior can be observed in the
What is the current bug behavior?
A job with `when: on_failure` is skipped when it contains `needs` and at least one job in the pipeline has failed.
What is the expected correct behavior?
A job with `when: on_failure` and `needs` should run when at least one other job in the pipeline has failed.
The screenshot above shows what things should look like. Removing the `needs` altogether produces the result shown in that screenshot.
Relevant logs and/or screenshots
Possible fixes
Possible Workarounds
- Remove the `needs` from the `rollback_job` job completely
  - This may not be feasible for some environments.
- Possibly: move the jobs that may fail to another stage in the pipeline
- I wrote "sometimes" in the issue title because there is one specific set of circumstances I have identified thus far where the presence of `when: on_failure` and optional `needs` does work as expected. See this example pipeline.
A few more thoughts on this:
Observe that the optional `needs` job fails.
The documentation on `needs:optional` notes:
"To need a job that sometimes does not exist in the pipeline, add `optional: true` to the `needs` configuration."
That sounds like it's about the absence or presence of the needed job and not about the success or failure of the job.
Is the thought above right?
It is not possible to use `allow_failure` to work around this because we also note in the docs:
- If `allow_failure: true` is set, the job is always considered successful, and later jobs with `when: on_failure` don't start if this job fails.
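The quoted documentation can be illustrated with a small Ruby sketch of why `allow_failure: true` does not help here. This is only a toy model of the documented rule: `effective_status` and `on_failure_job_starts?` are hypothetical helpers, not part of GitLab.

```ruby
# Toy model of the documented rule; helper names are hypothetical.

# A failed job with allow_failure: true is treated as successful by later jobs.
def effective_status(status:, allow_failure: false)
  (status == 'failed' && allow_failure) ? 'success' : status
end

# A `when: on_failure` job starts only if some earlier job counts as failed.
def on_failure_job_starts?(earlier_jobs)
  earlier_jobs.any? { |job| effective_status(**job) == 'failed' }
end

puts on_failure_job_starts?([{ status: 'failed' }])                      # prints "true"
puts on_failure_job_starts?([{ status: 'failed', allow_failure: true }]) # prints "false"
```

Under this rule, marking the failing job with `allow_failure: true` hides the failure from the `when: on_failure` job, so the rollback would not start at all.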