Draft: Enforce CI minutes quota for running jobs
What does this MR do?
Related to https://gitlab.com/gitlab-org/gitlab/-/issues/20856
This MR introduces monitoring and enforcement of CI minutes usage for running builds.
Since CI minutes consumption is accumulated into namespace_statistics.shared_runners_seconds only when builds complete, a pipeline with very long-running builds can push consumption far beyond the limit set on the root namespace.
To prevent this, we need to monitor the CI minutes consumption of running builds and enforce the limit by dropping builds once the limit has been exceeded by more than a 20-minute grace period (20.minutes).
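The enforcement rule can be sketched as a single predicate. This is a minimal sketch, assuming usage is measured in seconds and that a namespace counts as over quota only when usage strictly exceeds the limit plus the grace period; the method and keyword names are hypothetical, not the MR's code.

```ruby
# Hypothetical sketch of the enforcement rule: drop running builds only when
# total usage exceeds the namespace limit by more than the grace period.
GRACE_PERIOD_SECONDS = 20 * 60 # mirrors 20.minutes from the MR description

# accumulated_seconds: usage of completed builds (cf. shared_runners_seconds)
# running_seconds:     time consumed so far by builds that are still running
# limit_minutes:       CI minutes quota of the root namespace
def exceeds_quota_with_grace?(accumulated_seconds:, running_seconds:, limit_minutes:)
  total_seconds = accumulated_seconds + running_seconds
  # Strict comparison: usage exactly at limit + grace is still tolerated.
  total_seconds > (limit_minutes * 60) + GRACE_PERIOD_SECONDS
end
```

With a 400-minute quota the threshold is 24,000 s + 1,200 s = 25,200 s, so 25,000 s of combined usage keeps builds running while 25,300 s triggers the drop.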
Because this operation is expensive (it runs at the root namespace level), we have several layers of checks to avoid unnecessary processing:
- allow the check to be scheduled at most once every 5 minutes per project
- allow the check to run exclusively, at most once every 5 minutes, at the namespace level (of the multiple schedules from the previous point, only one actually runs)
- skip processing if not on GitLab.com
- skip processing if the project does not have shared runners enabled or is public
- skip processing if the project is on any paid plan (trials do not count as paid)
- only consider cancelable builds in recent cancelable pipelines
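The first five layers above can be sketched as follows (the last one, cancelable builds only, is a query filter and is covered by the query plans). This is a hypothetical illustration: IntervalLease is an in-memory stand-in for a distributed lock, and the Project struct, its fields, and the method names are assumptions, not GitLab APIs. Reading "(no trial)" as "trials are not treated as paid" is also an interpretation.

```ruby
# Hypothetical sketch of the layered checks; names are illustrative only.
CHECK_INTERVAL_SECONDS = 5 * 60 # "every 5 minutes"

# In-memory stand-in for a distributed lock: the first caller for a key wins
# until the TTL expires, so only one of several concurrent schedules runs.
class IntervalLease
  def initialize
    @expires_at = {}
  end

  def try_obtain(key, now:, ttl: CHECK_INTERVAL_SECONDS)
    expires = @expires_at[key]
    return false if expires && expires > now

    @expires_at[key] = now + ttl
    true
  end
end

Project = Struct.new(:id, :root_namespace_id, :shared_runners_enabled, :visibility,
                     :paid_plan, :trial, keyword_init: true)

# Skip conditions from the MR description.
def eligible_for_check?(project, gitlab_com:)
  return false unless gitlab_com                       # only on GitLab.com
  return false unless project.shared_runners_enabled  # needs shared runners
  return false if project.visibility == :public        # public projects are skipped
  return false if project.paid_plan && !project.trial  # paid (non-trial) plans are skipped

  true
end

def run_check?(project, project_lease:, namespace_lease:, now:, gitlab_com: true)
  return false unless eligible_for_check?(project, gitlab_com: gitlab_com)
  # Layer 1: each project may schedule the check at most once per interval.
  return false unless project_lease.try_obtain("project:#{project.id}", now: now)

  # Layer 2: of all projects scheduling a check, one run per root namespace per interval.
  namespace_lease.try_obtain("namespace:#{project.root_namespace_id}", now: now)
end
```

Putting the cheap eligibility checks before the leases means ineligible projects never take a lock that could block an eligible sibling in the same namespace.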
TODO:
- add remaining specs
- add feature flag
- do E2E manual QA
Query plans
Builds in namespace being run by shared runners
Executes a query per batch of builds.
```ruby
::Ci::Build
  .running
  .from_shared_runners
  .for_project(root_namespace.all_projects)
  .updated_after(RUNNING_BUILDS_SINCE_TIME.ago)
  .each_batch { ... }
```
https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3192/commands/10427
Time: 24.021 ms
- planning: 23.447 ms
- execution: 0.574 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 4 (~32.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
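The MR does not spell out how the time consumed by running builds is computed. A minimal sketch, assuming it is the elapsed wall-clock time since each build started (no cost factors), processed in id-ordered batches as the each_batch query above does; RunningBuild and running_seconds are hypothetical stand-ins for Ci::Build and the real logic.

```ruby
# Hypothetical sketch: add up the time consumed so far by running builds,
# reading them in id-ordered batches (one SQL query per batch in real code).
RunningBuild = Struct.new(:id, :started_at) # started_at in epoch seconds

def running_seconds(builds, now:, batch_size: 100)
  total = 0
  builds.sort_by(&:id).each_slice(batch_size) do |batch|
    # Clamp at zero so a started_at later than now never reduces the total.
    total += batch.sum { |build| [now - build.started_at, 0].max }
  end
  total
end
```

The result is what would be added to the accumulated shared_runners_seconds before comparing against the quota.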
All projects in namespace
```ruby
root_namespace.all_projects.find_each { ... }
```
https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3246/commands/10618
Time: 20.115 ms
- planning: 3.249 ms
- execution: 16.866 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 4097 (~32.00 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
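The "Shared buffers" figures in these plans are block counts from PostgreSQL's EXPLAIN (BUFFERS). With the default block size of 8 KiB, the sizes in parentheses are count × 8 KiB: 4097 blocks × 8 KiB = 32,776 KiB, a little over 32 MiB. A tiny helper makes the conversion explicit:

```ruby
# Convert PostgreSQL buffer (block) counts to sizes, assuming the default
# block size of 8 KiB (BLCKSZ = 8192 bytes).
BLOCK_SIZE_KIB = 8

def buffers_to_kib(blocks)
  blocks * BLOCK_SIZE_KIB
end

def kib_to_mib(kib)
  kib / 1024.0
end
```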
Online specific runners that can be used by a project
Executed for each project in the namespace that has builds to drop.
```ruby
Ci::Runner.specific_for_project(project).with_tags.online.to_a
```
https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/3246/commands/10619
Time: 34.966 ms
- planning: 5.353 ms
- execution: 29.613 ms
- I/O read: 28.266 ms
- I/O write: N/A
Shared buffers:
- hits: 61 (~488.00 KiB) from the buffer pool
- reads: 9 (~72.00 KiB) from the OS file cache, including disk I/O
- dirtied: 1 (~8.00 KiB)
- writes: 0
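Because the runner lookup runs per project rather than per build, the builds to drop can be grouped by project first so the number of lookups equals the number of affected projects. A hypothetical sketch: BuildToDrop and online_runners_by_project are illustrative names, and the lookup callable stands in for Ci::Runner.specific_for_project(project).with_tags.online.to_a.

```ruby
# Hypothetical sketch: one runner lookup per distinct project, not per build.
BuildToDrop = Struct.new(:id, :project_id)

def online_runners_by_project(builds_to_drop, lookup)
  builds_to_drop
    .map(&:project_id)
    .uniq # each project is looked up exactly once
    .to_h { |project_id| [project_id, lookup.call(project_id)] }
end
```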
Recent cancelable pipelines for project
```ruby
::Ci::Pipeline
  .for_project(project)
  .cancelable
  .updated_after(ALIVE_BUILDS_SINCE_TIME.ago)
  .each_batch(of: 100) { ... }
```
https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/3246/commands/10621
Time: 0.961 ms
- planning: 0.601 ms
- execution: 0.360 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 62 (~496.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Recent alive builds in pipelines
```ruby
::Ci::Build.in_pipelines(pipelines)
  .running_or_pending_or_created
  .updated_after(ALIVE_BUILDS_SINCE_TIME.ago)
  .in_batches(of: 150) { ... }
```
For the query plan, I used in_batches(of: 3) to test the first batch.
https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/3246/commands/10622
Time: 85.662 ms
- planning: 19.410 ms
- execution: 66.252 ms
- I/O read: 65.670 ms
- I/O write: N/A
Shared buffers:
- hits: 22 (~176.00 KiB) from the buffer pool
- reads: 10 (~80.00 KiB) from the OS file cache, including disk I/O
- dirtied: 1 (~8.00 KiB)
- writes: 0
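The last two queries are nested: recent cancelable pipelines are read 100 at a time, and for each pipeline batch the alive (running, pending, or created) builds are read 150 at a time. A hypothetical sketch with plain arrays standing in for the ActiveRecord relations; each yielded slice corresponds to one query, and the updated_after time filters are omitted for brevity.

```ruby
# Hypothetical sketch of the nested batching; PipelineRow and JobRow are
# stand-ins for Ci::Pipeline and Ci::Build.
PIPELINE_BATCH_SIZE = 100
BUILD_BATCH_SIZE = 150
ALIVE_STATUSES = %w[running pending created].freeze

PipelineRow = Struct.new(:id, :cancelable)
JobRow = Struct.new(:id, :pipeline_id, :status)

def each_alive_build_batch(pipelines, builds)
  pipelines.select(&:cancelable).each_slice(PIPELINE_BATCH_SIZE) do |pipeline_batch|
    ids = pipeline_batch.map(&:id)
    # Builds of this pipeline batch that are still alive.
    alive = builds.select { |b| ids.include?(b.pipeline_id) && ALIVE_STATUSES.include?(b.status) }
    alive.each_slice(BUILD_BATCH_SIZE) { |build_batch| yield build_batch }
  end
end
```

Bounding both batch sizes keeps each query small even for namespaces with many pipelines, which matches the per-batch timings reported above.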
Does this MR meet the acceptance criteria?
Conformity
- 📋 Does this MR need a changelog?
  - I have included a changelog entry.
  - I have not included a changelog entry because _____.
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team