Abort pipelines when user is blocked [RUN ALL RSPEC] [RUN AS-IF-FOSS]
What does this MR do?
In https://gitlab.com/gitlab-org/gitlab/-/issues/323742 we identified a problem where upon blocking a user, if that user has many pipelines and builds running, then gracefully cancelling those via Ci::CancelUserPipelinesService
leads to N+1s and very slow execution.
We think it is best in this case to cancel those in bulk instead, as we did with deleted projects already, which can be done more efficiently. This has the drawback of not triggering AR callbacks. This MR addresses the resulting problems only up to those changes affecting the UI so that pipelines views to not appear stuck. All other domain integrity related issues are filed in this follow-up issue: #324309
Screenshots
After blocking a user:
The review app is deployed here: https://gitlab-review-323742-abo-fiu1qi.gitlab-review.app
Implementation
I decided to slightly generalize the existing service into AbortPipelinesService
instead of creating a new service, since really there was no reason for this to only act on project pipelines. It now simply acts on pipelines and the related builds, and the caller can decide how those are scoped (user, project, or otherwise.)
New queries
This MR introduces two new queries and changes one:
- New: batch-select + update of user pipelines
- New: batch-select + update of pipeline stages
- Changed: batch-select + update of related builds, not scoped to
cancelable
builds only (this was likely an existing bug)
User pipelines
SELECT
"ci_pipelines"."id"
FROM
"ci_pipelines"
WHERE
"ci_pipelines"."user_id" = 4626651
AND "ci_pipelines"."status" IN ('running', 'waiting_for_resource', 'preparing', 'pending', 'created', 'scheduled')
AND "ci_pipelines"."id" >= 2
ORDER BY
"ci_pipelines"."id" ASC
LIMIT 1 OFFSET 1000
https://explain.depesz.com/s/CgAA
Pipelines stages
SELECT "ci_stages"."id" FROM "ci_stages" WHERE "ci_stages"."pipeline_id" IN
(SELECT "ci_pipelines"."id" FROM "ci_pipelines"
WHERE "ci_pipelines"."user_id" = 4626651
AND "ci_pipelines"."status" IN ('running', 'waiting_for_resource', 'preparing', 'pending', 'created', 'scheduled')
AND "ci_pipelines"."id" >= 1
)
AND "ci_stages"."status" IN (2,10,9,1,0,8) ORDER BY "ci_stages"."id" ASC LIMIT 1000
https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2778/commands/8664
Pipeline builds
SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."commit_id" IN
(SELECT "ci_pipelines"."id" FROM "ci_pipelines"
WHERE "ci_pipelines"."user_id" = 4626651
AND "ci_pipelines"."status" IN ('running', 'waiting_for_resource', 'preparing', 'pending', 'created', 'scheduled')
AND "ci_pipelines"."id" >= 1
)
AND "ci_builds"."status" IN ('running', 'waiting_for_resource', 'preparing', 'pending', 'created', 'scheduled')
ORDER BY "ci_builds"."id" ASC LIMIT 1000
https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/2778/commands/8665
Migrations
I had originally included a migration that added an index on ci_pipelines
to make the outer batch query faster, but this work was already happening in parallel in https://gitlab.com/gitlab-org/gitlab/-/issues/301222 (MR: !56314 (merged)). We can ship both as separate improvements. I measured the queries with that index in place.
I also added a new partial index on ci_stages
to speed up the inner select on stages by state:
CREATE INDEX index_ci_stages_on_pipeline_id_and_id ON ci_stages USING btree (pipeline_id, id) WHERE (status = ANY (ARRAY[0, 1, 2, 8, 9, 10]));
UP
== 20210316152500 AddIndexCiStagesOnPipelineIdAndId: migrating ================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:ci_stages, [:pipeline_id, :id], {:where=>"status IN (0, 1, 2, 8, 9, 10)", :name=>"index_ci_stages_on_pipeline_id_and_id", :algorithm=>:concurrently})
-> 0.0061s
-- add_index(:ci_stages, [:pipeline_id, :id], {:where=>"status IN (0, 1, 2, 8, 9, 10)", :name=>"index_ci_stages_on_pipeline_id_and_id", :algorithm=>:concurrently})
-> 0.0139s
== 20210316152500 AddIndexCiStagesOnPipelineIdAndId: migrated (0.0229s) =======
DOWN
== 20210316152500 AddIndexCiStagesOnPipelineIdAndId: reverting ================
-- transaction_open?()
-> 0.0000s
-- indexes(:ci_stages)
-> 0.0078s
-- current_schema()
-> 0.0003s
== 20210316152500 AddIndexCiStagesOnPipelineIdAndId: reverted (0.0134s) =======
Feature flag
I put this behind the feature flag abort_user_pipelines_on_block
scoped to particular user
s so we can test this first. I was not able to test this locally very well outside of unit tests.
Rollout issue: https://gitlab.com/gitlab-org/gitlab/-/issues/324045
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. -
I have not included a changelog entry because the change is still behind a feature flag.
-
- [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides - [-] Database guides
- [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Related to #323742