"Prevent outdated deployment jobs" does not cancel pending deployment jobs
Summary
When there are multiple pending (automatic -- not manual) deployment jobs, old jobs are not automatically cancelled when "Prevent outdated deployment jobs" is checked in the project's CI/CD settings.
Steps to reproduce
GitLab Version: 15.10.2 Community Edition
GitLab Runner: 15.11.0
This also reproduces on gitlab.com using the public shared runners.
Create a new project with the following .gitlab-ci.yml:
stages:
  - build
  - deploy
build:
  stage: build
  script:
    - sleep 60
deploy:
  stage: deploy
  environment:
    name: production
    action: start
  resource_group: production
  script:
    - sleep 300
This GitLab-CI project does a no-op build, and then a no-op deployment to an environment named "production".
Following the Deployment safety instructions, a resource group of 'production' is used to prevent concurrent deployments.
In the project CI/CD settings, ensure that "Prevent outdated deployment jobs" and "Auto-cancel redundant pipelines" are checked.
Following the Resource Groups instructions, set the process_mode for the 'production' resource group to 'newest_first':
$ curl --request PUT --data "process_mode=newest_first" --header "PRIVATE-TOKEN: <token>" "https://gitlab.com/api/v4/projects/45538895/resource_groups/production"
{"id":2237700,"key":"production","process_mode":"newest_first","created_at":"2023-04-27T14:36:04.699Z","updated_at":"2023-04-27T14:56:26.773Z"}
To reproduce the issue:
- Push a commit to the project. CI starts running the 'build' stage.
- While the build stage is running, push another commit to the project. This starts another pipeline, and another build stage starts running concurrently.
- Push one more commit to the project while both prior build stages are running. This starts a third pipeline and a third build stage.
- After the first build stage completes, the deployment stage has status 'waiting'. This is because the resource group's process_mode was set to newest_first.
- After the second build stage completes, the deployment stage still has status waiting.
- After the third build stage completes, the deployment stage for this third and most recent pipeline starts running. This is expected, again because the process_mode was set to newest_first. See screenshot.
- After the third and most recent pipeline finishes, the next oldest pipeline starts running the deploy stage. This is the bug and should not happen. This deployment is deploying from an out-of-date pipeline and should have been cancelled according to the documentation and CI settings. See screenshot.
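The sequence in the steps above can be modeled with a small Ruby sketch. This is a simplified model, not GitLab's code: WaitingJob, deploy_order, and outdated_pipelines are hypothetical names, and "outdated" is assumed here to mean that a newer pipeline has already deployed to the environment.

```ruby
# Simplified model of a resource group in newest_first mode.
# All names here are hypothetical and exist only for illustration.
WaitingJob = Struct.new(:pipeline_id)

# Returns pipeline IDs in the order their deploy jobs obtain the resource
# when nothing removes outdated jobs from the queue.
def deploy_order(waiting_jobs)
  waiting_jobs.sort_by { |job| -job.pipeline_id }.map(&:pipeline_id)
end

# Returns the pipelines whose deploy ran after a newer pipeline had
# already deployed (the assumed meaning of "outdated").
def outdated_pipelines(order)
  newest_deployed = nil
  order.select do |pipeline_id|
    outdated = !newest_deployed.nil? && pipeline_id < newest_deployed
    newest_deployed = [newest_deployed, pipeline_id].compact.max
    outdated
  end
end

jobs = [WaitingJob.new(1), WaitingJob.new(2), WaitingJob.new(3)]
order = deploy_order(jobs)
puts "observed deploy order: #{order.inspect}"
puts "outdated deployments that still ran: #{outdated_pipelines(order).inspect}"
```

With three waiting pipelines, the model deploys them in the order 3, 2, 1 and flags pipelines 2 and 1 as outdated, which matches the behavior reported here.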
Example Project
See https://gitlab.com/mgulick/citest for an example of this bug.
What is the current bug behavior?
The deployment stage of the outdated pipelines should have been cancelled but instead ran and deployed out of date code.
What is the expected correct behavior?
The outdated deployment stages should have been cancelled.
Relevant logs and/or screenshots
Screenshot showing CI/CD project settings:
Screenshot while most recent pipeline is deploying, showing that old outdated pipelines are in 'waiting' state:
Screenshot after most recent pipeline has finished, showing that old outdated pipelines are running the 'deploy' stage:
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
N/A
Results of GitLab application Check
N/A
Possible fixes
Normally, jobs/processables are enqueued for running through the processable.enqueue call on the ProcessBuildService. Before enqueuing, there is a check on whether the job has an outdated deployment.
Jobs that are part of a ResourceGroup are ultimately enqueued for running through the processable.enqueue_waiting_for_resource call on the AssignResourceFromResourceGroupService. Here, there is no check on whether a job has an outdated deployment.
To fix this bug, we need to add an outdated deployment check in the AssignResourceFromResourceGroupService before processable.enqueue_waiting_for_resource is called.
diff --git a/app/services/ci/resource_groups/assign_resource_from_resource_group_service.rb b/app/services/ci/resource_groups/assign_resource_from_resource_group_service.rb
index d7078200c145..ec0585617d58 100644
--- a/app/services/ci/resource_groups/assign_resource_from_resource_group_service.rb
+++ b/app/services/ci/resource_groups/assign_resource_from_resource_group_service.rb
@@ -11,7 +11,11 @@ def execute(resource_group)
         resource_group.upcoming_processables.take(free_resources).each do |upcoming|
           Gitlab::OptimisticLocking.retry_lock(upcoming, name: 'enqueue_waiting_for_resource') do |processable|
-            processable.enqueue_waiting_for_resource
+            if processable.outdated_deployment?
+              processable.drop!(:failed_outdated_deployment_job)
+            else
+              processable.enqueue_waiting_for_resource
+            end
           end
         end
       end
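To see the effect of the proposed branch in isolation, here is a minimal, self-contained sketch. FakeProcessable and assign_resources are hypothetical stand-ins for the real job model and service; only the method names outdated_deployment?, drop!, and enqueue_waiting_for_resource, and the failure reason, are taken from the patch above. The statuses are simplified symbols, and the optimistic locking is omitted.

```ruby
# Hypothetical stand-in for a CI job waiting on a resource group.
# The real object is a GitLab model; this one only records state.
class FakeProcessable
  attr_reader :name, :status, :failure_reason

  def initialize(name, outdated:)
    @name = name
    @outdated = outdated
    @status = :waiting_for_resource
  end

  def outdated_deployment?
    @outdated
  end

  def drop!(reason)
    @status = :failed
    @failure_reason = reason
  end

  def enqueue_waiting_for_resource
    @status = :pending
  end
end

# Mirrors the patched loop: drop outdated jobs, enqueue the rest.
def assign_resources(upcoming_processables, free_resources)
  upcoming_processables.take(free_resources).each do |processable|
    if processable.outdated_deployment?
      processable.drop!(:failed_outdated_deployment_job)
    else
      processable.enqueue_waiting_for_resource
    end
  end
end

stale = FakeProcessable.new('deploy (pipeline 2)', outdated: true)
assign_resources([stale], 1)
puts "#{stale.name}: #{stale.status} (#{stale.failure_reason})"
```

Note that in this sketch a dropped job still occupies one of the free_resources slots for that call, so any remaining waiting job is only considered on a later call. Whether the real service needs to account for this is not established here.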
For more details on the investigation, see: #408981 (comment 1775876163)