Redistribute pipelines RSpec jobs parallelization
From draft to ready
- Remove test commit: !133976 (5b7e0036)
- Ensure that we also changed the comments on the artifacts collector
Context
Closes #422702 (closed).
What does this MR do?
Redistribute the RSpec parallel jobs, aiming for a maximum of 40 minutes on average, while spending as little money as possible.
Are we spending a lot more CI/CD money for this?
TL;DR: Only a little bit.
In most cases in this MR, we are redistributing jobs from one "RSpec job class" (e.g. rspec unit pg14 is an RSpec job class with 28 parallel jobs) to another. With this approach, we don't spend extra money (see the section below if you're interested in why).
The RSpec job classes we're redistributing are:
- 4 jobs from rspec unit pg14 to rspec-ee unit pg14
- 2 jobs from rspec system pg14 to rspec-ee system pg14
- 2 jobs from rspec-ee integration pg14 to rspec integration pg14
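As a quick check on the moves listed above, the sketch below applies them to the parallel counts quoted in the data section (28/18 for unit, 28/10 for system, 6/12 for integration). It assumes those sample-pipeline counts are the starting point; the BEFORE/MOVES tables and the redistribute helper are hypothetical illustrations, not part of the CI configuration.

```python
"""Apply this MR's job moves to the parallel counts seen in the sample pipeline."""

# Parallel job counts per RSpec job class, as quoted in the data section
# (assumed here to be the counts before this MR).
BEFORE = {
    "rspec unit pg14": 28,
    "rspec-ee unit pg14": 18,
    "rspec system pg14": 28,
    "rspec-ee system pg14": 10,
    "rspec-ee integration pg14": 6,
    "rspec integration pg14": 12,
}

# (donor class, recipient class, number of parallel jobs moved)
MOVES = [
    ("rspec unit pg14", "rspec-ee unit pg14", 4),
    ("rspec system pg14", "rspec-ee system pg14", 2),
    ("rspec-ee integration pg14", "rspec integration pg14", 2),
]


def redistribute(counts: dict[str, int], moves: list[tuple[str, str, int]]) -> dict[str, int]:
    """Return the new counts after moving jobs; each donor must keep at least one job."""
    after = dict(counts)
    for donor, recipient, n in moves:
        if after[donor] - n < 1:
            raise ValueError(f"{donor} would be left with fewer than one job")
        after[donor] -= n
        after[recipient] += n
    return after


AFTER = redistribute(BEFORE, MOVES)

for name, count in AFTER.items():
    print(f"{name}: {BEFORE[name]} -> {count}")

# Moving jobs between classes leaves the total number of parallel jobs unchanged.
print("total:", sum(BEFORE.values()), "->", sum(AFTER.values()))
```

The extra migration jobs described next are not part of these moves; they are net additions on top of this total.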
Additionally, the migration jobs were taking longer than most other job classes, so I added extra parallel jobs to the following classes (these could not be taken from other job classes):
- 4 jobs to rspec migration pg14
- 2 jobs to rspec migration pg14-as-if-foss
We have to pay for those out of pocket, BUT these job classes are not executed as often as the ones above, so the cost is relatively lower than if we added, say, 2 parallel jobs to rspec unit pg14.
To illustrate, here is the number of jobs run for each RSpec job class over the last three months:
- rspec-ee unit pg14: 387,548 jobs
- rspec integration pg14: 247,049 jobs
- rspec-ee system pg14: 225,280 jobs
- rspec migration pg14: 73,619 jobs
- rspec migration pg14-as-if-foss: 21,593 jobs
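The gap in run frequency can be put into ratios using the counts above. Note that these are job counts, so they scale with each class's parallel count as well as with how often the class runs; treat the ratios as a rough indicator rather than an exact measure of pipeline frequency. The ratio helper is hypothetical.

```python
"""Compare how many jobs each RSpec job class ran over the last three months."""

# Job counts over the last three months, as listed above.
JOBS_LAST_THREE_MONTHS = {
    "rspec-ee unit pg14": 387_548,
    "rspec integration pg14": 247_049,
    "rspec-ee system pg14": 225_280,
    "rspec migration pg14": 73_619,
    "rspec migration pg14-as-if-foss": 21_593,
}


def ratio(numerator: str, denominator: str) -> float:
    """How many jobs `numerator` ran for every job `denominator` ran."""
    return JOBS_LAST_THREE_MONTHS[numerator] / JOBS_LAST_THREE_MONTHS[denominator]


for migration_class in ("rspec migration pg14", "rspec migration pg14-as-if-foss"):
    r = ratio("rspec-ee unit pg14", migration_class)
    print(f"rspec-ee unit pg14 vs {migration_class}: {r:.1f}x")
```

By this measure, rspec-ee unit pg14 ran roughly 5x as many jobs as rspec migration pg14 and roughly 18x as many as rspec migration pg14-as-if-foss.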
As a general comment about CI costs: we have some high-leverage issues, such as #412717 (closed), that could drastically reduce our CI costs (mainly because we would run the FOSS tests less often than we currently do in gitlab-org/gitlab MR pipelines). Tweaking the file patterns that determine which jobs are triggered in pipelines could also yield big savings.
Why these RSpec job classes specifically?
A few factors:
- The average duration of those RSpec job classes
- The number of times they were run over the last three months
- The two RSpec job classes in each pair (a fast one giving jobs to a slow one) should run in the same pipelines most of the time (otherwise, we could make the fast job class slower in pipelines where the slow one does not run, and it could become the critical path of those pipelines)
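The third factor can be illustrated with a toy model in which a pipeline's RSpec critical path is its slowest job. The 5-minute overhead and the test-minute totals are hypothetical, chosen only so that 28 rspec unit pg14 jobs finish 8 minutes before 18 rspec-ee unit pg14 jobs, as in the data section. The model ignores non-RSpec jobs and assumes tests split perfectly evenly across jobs.

```python
"""Toy model: why a donor/recipient pair should run in the same pipelines."""

OVERHEAD_MIN = 5.0  # hypothetical setup/teardown minutes per job


def job_minutes(test_minutes: float, parallel: int) -> float:
    """Per-job duration, assuming tests split perfectly evenly across jobs."""
    return OVERHEAD_MIN + test_minutes / parallel


def rspec_critical_path(classes: dict[str, tuple[float, int]], running: set[str]) -> float:
    """Slowest per-job duration among the job classes that run in a pipeline."""
    return max(
        job_minutes(test, parallel)
        for name, (test, parallel) in classes.items()
        if name in running
    )


# Hypothetical total test minutes; parallel counts before and after moving 4 jobs.
before = {"rspec unit pg14": (532.0, 28), "rspec-ee unit pg14": (486.0, 18)}
after = {"rspec unit pg14": (532.0, 24), "rspec-ee unit pg14": (486.0, 22)}

both = {"rspec unit pg14", "rspec-ee unit pg14"}
donor_only = {"rspec unit pg14"}

print(f"both classes run: {rspec_critical_path(before, both):.1f} -> "
      f"{rspec_critical_path(after, both):.1f} min")
print(f"donor runs alone: {rspec_critical_path(before, donor_only):.1f} -> "
      f"{rspec_critical_path(after, donor_only):.1f} min")
```

When both classes run, the slowest job gets faster; when only the donor runs, the same move just makes that pipeline slower, which is why the pairs were chosen to run together most of the time.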
The data for these factors is shown in the data section below.
Why are we not spending more money when redistributing a job from one "RSpec job class" to another?
Disclaimer: what follows is my current understanding of our CI costs, which might be wrong.
Let's take rspec unit pg14 tests as an example. The number of tests we have to run is the same whether we run them in one job or in 30 jobs. What makes the cost go up when adding more jobs is the setup/teardown around the RSpec run. When adding a parallel job, we therefore pay for the extra setup/teardown of that new job.
If we remove a parallel job, the exact opposite is true: we save the setup/teardown money, but the tests that this job executed still have to run on other jobs, making those jobs longer or, in other words, more expensive.
If we redistribute jobs, the two effects cancel out: the setup/teardown cost saved on the job we remove is spent on the job we add.
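The cost argument above can be sketched as a toy model: each execution of a job class costs its total test minutes plus a fixed setup/teardown overhead per parallel job. The 5-minute overhead and the test-minute totals are hypothetical; real CI costs also depend on runner type and on how evenly tests are split, which this model ignores.

```python
"""Toy cost model: fixed setup/teardown per job plus the test work itself."""

OVERHEAD_MIN = 5  # hypothetical runner minutes of setup/teardown per job


def class_cost(test_minutes: int, parallel: int) -> int:
    """Runner minutes for one execution of a job class.

    The test work is the same however many jobs share it; each extra job
    only adds another setup/teardown.
    """
    return test_minutes + parallel * OVERHEAD_MIN


def pipeline_cost(classes: dict[str, tuple[int, int]]) -> int:
    """Sum of class costs for the job classes that run in one pipeline."""
    return sum(class_cost(test, parallel) for test, parallel in classes.values())


# Hypothetical test minutes; parallel counts before and after moving 4 jobs.
before = {"rspec unit pg14": (532, 28), "rspec-ee unit pg14": (486, 18)}
after = {"rspec unit pg14": (532, 24), "rspec-ee unit pg14": (486, 22)}

print("redistributing 4 jobs:", pipeline_cost(after) - pipeline_cost(before), "extra minutes")  # 0
print("adding 4 jobs to one class:", class_cost(532, 32) - class_cost(532, 28), "extra minutes")  # 20
```

Under this model, jobs added without taking them from another class (as for the migration classes) cost "added jobs × overhead" every time that class runs, so a class that runs less often accumulates less of that extra cost.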
The data
Based on a sample pipeline and broader job statistics.
rspec unit pg14 and rspec-ee unit pg14
rspec unit pg14 jobs (28 jobs) are much faster (8 min faster) than rspec-ee unit pg14 jobs (18 jobs), so they could be rebalanced.
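One way to pick how many jobs to move is to try every split of the combined job budget and keep the one whose slower class finishes first. This is a sketch under simplifying assumptions (hypothetical 5-minute overhead, hypothetical test-minute totals that reproduce the 8-minute gap at 28/18, perfectly even test splitting), not a description of how this MR's numbers were derived.

```python
"""Find the split of a fixed job budget that minimizes the slower class's job duration."""

OVERHEAD_MIN = 5.0  # hypothetical setup/teardown minutes per job


def job_minutes(test_minutes: float, parallel: int) -> float:
    """Per-job duration, assuming tests split perfectly evenly across jobs."""
    return OVERHEAD_MIN + test_minutes / parallel


def best_split(test_a: float, test_b: float, total_jobs: int) -> tuple[int, int]:
    """Try every split (each class keeps at least one job) and keep the one
    whose slower class finishes first."""
    best_a = min(
        range(1, total_jobs),
        key=lambda a: max(job_minutes(test_a, a), job_minutes(test_b, total_jobs - a)),
    )
    return best_a, total_jobs - best_a


# Hypothetical totals: 28 rspec unit pg14 jobs take 24 min each,
# 18 rspec-ee unit pg14 jobs take 32 min each.
print(best_split(532.0, 486.0, 28 + 18))
```

With these made-up inputs it returns (24, 22). Real job durations are not exactly proportional to 1/parallel, so a split found this way is a starting point to validate with actual pipelines.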
Rules
Both of these job classes run in the same pipelines most of the time in gitlab-org/gitlab:
# rspec unit
.rails:rules:ee-and-foss-unit:
  rules:
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:ee-and-foss-default-rules", rules]
    - <<: *if-default-refs
      changes: *backend-patterns
    - <<: *if-default-refs # This is different
      changes: *backstage-patterns # This is different

# rspec-ee unit
.rails:rules:ee-only-unit:
  rules:
    - <<: *if-not-ee # This is different
      when: never # This is different
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:ee-and-foss-default-rules", rules]
    - <<: *if-default-refs
      changes: *backend-patterns
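To see why these two job classes run together most of the time, the sketch below mimics how GitLab evaluates a job's rules list: rules are checked in order, the first match decides, when: never excludes the job, and a matching rule without when adds it (on_success). The boolean fields standing in for the anchors (*if-not-ee, *if-fork-merge-request, *if-default-refs, and the file patterns) are simplifications, and the referenced default rules, whose contents are not shown here, are collapsed into a single hypothetical default_rules_match flag.

```python
"""Simplified model of first-match rules evaluation for the two unit job classes."""

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pipeline:
    is_ee: bool  # inverse of the *if-not-ee condition (simplified)
    is_fork_mr: bool  # stand-in for *if-fork-merge-request
    is_default_ref: bool  # stand-in for *if-default-refs
    default_rules_match: bool  # hypothetical stand-in for the !reference'd default rules
    backend_changed: bool  # a changed file matches *backend-patterns
    backstage_changed: bool  # a changed file matches *backstage-patterns


# Each rule: (condition, when). A YAML rule without `when` behaves as on_success.
Rule = tuple[Callable[[Pipeline], bool], str]

RSPEC_UNIT: list[Rule] = [
    (lambda p: p.is_fork_mr, "never"),
    (lambda p: p.default_rules_match, "on_success"),
    (lambda p: p.is_default_ref and p.backend_changed, "on_success"),
    (lambda p: p.is_default_ref and p.backstage_changed, "on_success"),
]

RSPEC_EE_UNIT: list[Rule] = [
    (lambda p: not p.is_ee, "never"),
    (lambda p: p.is_fork_mr, "never"),
    (lambda p: p.default_rules_match, "on_success"),
    (lambda p: p.is_default_ref and p.backend_changed, "on_success"),
]


def job_runs(rules: list[Rule], pipeline: Pipeline) -> bool:
    """The first matching rule decides; if no rule matches, the job is not added."""
    for condition, when in rules:
        if condition(pipeline):
            return when != "never"
    return False


backstage_only = Pipeline(is_ee=True, is_fork_mr=False, is_default_ref=True,
                          default_rules_match=False, backend_changed=False,
                          backstage_changed=True)
print(job_runs(RSPEC_UNIT, backstage_only), job_runs(RSPEC_EE_UNIT, backstage_only))  # True False
```

In this model, a backend change runs both classes, while a change matching only the backstage patterns runs rspec unit pg14 without rspec-ee unit pg14, which is the "most of the time" caveat above.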
rspec system pg14 and rspec-ee system pg14
rspec system pg14 jobs (28 jobs) are faster (3.80 min faster) than rspec-ee system pg14 jobs (10 jobs), so they could be rebalanced.
Rules
Both of these job classes run in the same pipelines most of the time in gitlab-org/gitlab:
# rspec system
.rails:rules:ee-and-foss-system:
  rules:
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:system-default-rules", rules]
    - <<: *if-default-refs
      changes: *code-backstage-patterns

# rspec-ee system
.rails:rules:ee-only-system:
  rules:
    - <<: *if-not-ee # This is different
      when: never # This is different
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:system-default-rules", rules]
    - <<: *if-default-refs
      changes: *code-backstage-patterns
rspec-ee integration pg14 and rspec integration pg14
rspec-ee integration pg14 jobs (6 jobs) are faster (3.38 min faster) than rspec integration pg14 jobs (12 jobs), so they could be rebalanced.
Rules
Both of these job classes run in the same pipelines most of the time in gitlab-org/gitlab:
# rspec-ee integration pg14
.rails:rules:ee-only-integration:
  rules:
    - <<: *if-not-ee # This is different
      when: never # This is different
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:ee-and-foss-default-rules", rules]
    - <<: *if-default-refs
      changes: *backend-patterns

# rspec integration pg14
.rails:rules:ee-and-foss-integration:
  rules:
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:ee-and-foss-default-rules", rules]
    - <<: *if-default-refs
      changes: *backend-patterns
Screenshots or screen recordings
I created a separate MR to run pipelines without these changes:
- Control MR: !134105 (closed)
- Control pipeline: https://gitlab.com/gitlab-org/gitlab/-/pipelines/1035894857 (67min50s)
- Control Pipeline Trace: https://observe.gitlab.com/v1/jaeger/16947798/trace/e346b52d2c958b4f8eedbf42787ecce0
Trace screenshots: migration pg14-as-if-foss, migration pg14, rspec-ee
- New pipeline: https://gitlab.com/gitlab-org/gitlab/-/pipelines/1035931648 (54min50s)
- New Pipeline Trace: https://observe.gitlab.com/v1/jaeger/16947798/trace/c00678dbe17a5f5b15250cc0837b258d
Trace screenshots: migration pg14-as-if-foss, migration pg14, rspec-ee
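The two pipeline durations above (67min50s for the control pipeline, 54min50s with this MR) work out as follows. This compares a single pair of pipelines, so it is one data point rather than an average; pipeline durations vary between runs.

```python
"""Wall-clock difference between the control pipeline and the pipeline with this MR."""


def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds


control = to_seconds(67, 50)  # control pipeline 1035894857
new = to_seconds(54, 50)  # new pipeline 1035931648

saved = control - new
print(f"saved {saved // 60} min {saved % 60} s ({saved / control:.1%} shorter)")  # saved 13 min 0 s (19.2% shorter)
```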
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.