Skip to content

GitLab Next

Why GitLab
Pricing
Contact Sales
Explore

Sign in
Get free trial

Draft: Enforce concurrency limit for Shared Runners queue

Review changes
Download
Patches
Plain diff

Fabio Pitino requested to merge fp-enforce-concurrency-limit-shared-runners-queue into master Apr 28, 2021

Overview 12
Commits 1
Pipelines 5
Changes 6

What does this MR do?

Related to https://gitlab.com/gitlab-org/gitlab/-/issues/329123

This change enforces a concurrency limit as the number of jobs that can be running on Shared Runners. The limit is enforced at root namespace level.

When a Runner requests a new job to pick, before we assign the job to the Runner we check if the concurrency limit is exceeded, if so we ignore the build and move on to the next job.

The concurrency value is tracked in Redis and is:

incremented any time a job is being running by a shared runner
decremented any time a job completes after being executed by a shared runner

Trade-offs considered

Ideally we should not have jobs in pending state if they can't be picked by the runner. The complex aspect of the queue implementation is that a job could be picked by either shared or specific runners is both are configured for the given project. Blocking a job just because it could potentially be picked by shared runners is not possible, at least until we have a separate queue for shared runners.

Pros: simpler approach. Leave job in pending until the number of running jobs goes below the concurrency limit
Cons: Jobs still remain in the queue and could potentially be picked again by RegisterJobService. Although with an increasing number of jobs in the queue, the fair scheduling should drop the priority of those builds overtime.

idea 2

Another idea I have considered was to change Ci::ProcessBuildService to enqueue the build only if the concurrency limit allowed that. Otherwise the job would have remained in created state.

Pros: Jobs don't become visible to the runners
Cons: We need to match builds with any specific runners to see if they could potentially be picked by specific runners and ignore the concurrency limit. This was getting complex when considering that, in case of both specific and shared runners, a job could be moved to the queue and still be picked by shared runners.

idea 3

Similarly to the concept of resource_group we could either leverage the existing status waiting_for_resource or introduce a new status which gates when the build can go to the pending state.

Pros: This seems to be probably the most semantically correct approach
Cons: It's more complex and would require quite some refactoring to implement. Given that we are limiting concurrency of jobs as the MVC, where we could change strategy in the future, it carries a considerable risk if we switch strategy.

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

📋 Does this MR need a changelog?
- I have included a changelog entry.
- I have not included a changelog entry because _____.
Documentation (if required)
Code review guidelines
Merge request performance guidelines
Style guides
Database guides
Separation of EE specific content

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited Apr 28, 2021 by Fabio Pitino

Merge request reports

Assignee Loading

Reviewers Loading

Request review from

Loading

Time tracking Loading

Loading