Add dynamic concurrency limit for create pipeline worker
What does this MR do and why?
This MR is related to epic &13997.
Our goal is to provide a way for instance admins to manage the number of jobs executed on behalf of a scheduled scan execution policy, so that pipelines are distributed and do not overburden the runners.
This MR adds a custom concurrency limit for CreatePipelineWorker to improve on our previous solution, which used the concurrency_limit attribute alone.
Our concerns about using concurrency_limit alone for our case are:
- It limits the worker's concurrency, but we want to restrict the CI builds' concurrency. One worker creates one pipeline, but that pipeline can contain multiple CI build jobs.
- Sidekiq jobs appear to be much faster than the pipeline jobs executed by the runners, so limiting the number of workers with concurrency_limit might not be enough to reduce the pressure on the runners.
This solution has some limitations, but it is an improvement over using the worker's concurrency_limit attribute alone.
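To illustrate the idea, here is a minimal sketch of what such a check could look like. The class and method names are hypothetical; only the scope chain is taken from the verification snippet in the validation steps below, and the real behaviour is defined by the code in this MR.
# Hypothetical sketch; illustrative names, not the classes introduced by this MR.
class ScanExecutionPolicyConcurrencyCheck
  LOOKBACK = 1.hour

  # limit: the "Security policy scheduled scans maximum concurrency" admin setting
  def initialize(limit)
    @limit = limit
  end

  def over_limit?
    active_builds_count >= @limit
  end

  private

  # Builds belonging to security_orchestration_policy pipelines that are still
  # in an alive status; this mirrors the query in the Database query section,
  # which also caps the scan at 100 rows.
  def active_builds_count
    ::Ci::Build
      .with_pipeline_source_type('security_orchestration_policy')
      .with_status(*::Ci::HasStatus::ALIVE_STATUSES)
      .created_after(LOOKBACK.ago)
      .updated_after(LOOKBACK.ago)
      .limit(100)
      .count
  end
end
With a check like this, CreatePipelineWorker can defer pipeline creation (for example, by rescheduling itself) while over_limit? returns true, so the cap applies to the in-flight policy builds rather than to the number of Sidekiq workers.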
Database query
SELECT
"p_ci_builds"."status",
"p_ci_builds"."finished_at",
"p_ci_builds"."created_at",
"p_ci_builds"."updated_at",
"p_ci_builds"."started_at",
"p_ci_builds"."coverage",
"p_ci_builds"."name",
"p_ci_builds"."options",
"p_ci_builds"."allow_failure",
"p_ci_builds"."stage",
"p_ci_builds"."stage_idx",
"p_ci_builds"."tag",
"p_ci_builds"."ref",
"p_ci_builds"."type",
"p_ci_builds"."target_url",
"p_ci_builds"."description",
"p_ci_builds"."erased_at",
"p_ci_builds"."artifacts_expire_at",
"p_ci_builds"."environment",
"p_ci_builds"."when",
"p_ci_builds"."yaml_variables",
"p_ci_builds"."queued_at",
"p_ci_builds"."lock_version",
"p_ci_builds"."coverage_regex",
"p_ci_builds"."retried",
"p_ci_builds"."protected",
"p_ci_builds"."failure_reason",
"p_ci_builds"."scheduled_at",
"p_ci_builds"."token_encrypted",
"p_ci_builds"."resource_group_id",
"p_ci_builds"."waiting_for_resource_at",
"p_ci_builds"."processed",
"p_ci_builds"."scheduling_type",
"p_ci_builds"."id",
"p_ci_builds"."stage_id",
"p_ci_builds"."partition_id",
"p_ci_builds"."auto_canceled_by_partition_id",
"p_ci_builds"."auto_canceled_by_id",
"p_ci_builds"."commit_id",
"p_ci_builds"."erased_by_id",
"p_ci_builds"."project_id",
"p_ci_builds"."runner_id",
"p_ci_builds"."trigger_request_id",
"p_ci_builds"."upstream_pipeline_id",
"p_ci_builds"."user_id",
"p_ci_builds"."execution_config_id"
FROM
"p_ci_builds"
INNER JOIN "ci_pipelines" "pipeline" ON "pipeline"."partition_id" IS NOT NULL
AND "pipeline"."id" = "p_ci_builds"."commit_id"
AND "pipeline"."partition_id" = "p_ci_builds"."partition_id"
WHERE
"p_ci_builds"."type" = 'Ci::Build'
AND "pipeline"."source" = 15
AND ("p_ci_builds"."status" IN ('preparing', 'pending', 'running', 'waiting_for_callback', 'waiting_for_resource', 'canceling', 'created'))
AND "p_ci_builds"."created_at" > '2024-07-15 15:07:29.351683'
AND "p_ci_builds"."updated_at" > '2024-07-15 15:07:29.351826'
LIMIT 100
https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/29883/commands/92878
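For reference, the SQL above roughly corresponds to the ActiveRecord scope chain used in the verification snippet further down (the exact call site in the worker may differ):
# Same relation expressed with the scopes from the validation steps below.
::Ci::Build
  .with_pipeline_source_type('security_orchestration_policy') # "pipeline"."source" = 15
  .with_status(*::Ci::HasStatus::ALIVE_STATUSES)              # the alive statuses listed in the WHERE clause
  .created_after(1.hour.ago)                                  # example timestamps; the SQL shows concrete values
  .updated_after(1.hour.ago)
  .limit(100)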
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
- Create a new group
- Create some projects using the following script in the Rails console:
user = User.first
namespace_id = Group.last.id

5.times do
  project_params = {
    namespace_id: namespace_id,
    name: "Test-#{FFaker::Lorem.characters(15)}"
  }

  project = ::Projects::CreateService.new(user, project_params).execute
  project.save!

  # Add files so the scanners configured in the policy have something to scan
  project.repository.create_file(user, 'Gemfile.lock', '', branch_name: Gitlab::DefaultBranch.value,
    message: 'Add Gemfile.lock file')
  project.repository.create_file(user, 'test.rb', 'puts "hello world"', branch_name: Gitlab::DefaultBranch.value,
    message: 'Add test.rb file')

  # Create extra branches so the scheduled scan (branch_type: all) creates more pipelines
  5.times do
    branch_name = "branch-#{FFaker::Lorem.characters(15)}"
    ::Branches::CreateService.new(project, user).execute(branch_name, project.default_branch)
  end
end
- Go to the Group page
- Go to Secure > Policies
- Click New policy
- Select Scan Execution Policy
- Switch to the .yaml mode
- Copy the policy content below
type: scan_execution_policy
name: policy
description: ''
enabled: true
policy_scope:
  projects:
    excluding: []
rules:
  - type: schedule
    cadence: '0 0 * * *'
    timezone: Etc/UTC
    branch_type: all
actions:
  - scan: secret_detection
  - scan: sast
  - scan: sast_iac
  - scan: container_scanning
  - scan: dependency_scanning
- Merge the policy
- Enable the feature flags in the Rails console:
Feature.enable(:scan_execution_pipeline_worker)
Feature.enable(:scan_execution_pipeline_concurrency_control)
- Go to the Admin Area
- Go to Settings > CI/CD > Continuous Integration and Deployment
- Update the Security policy scheduled scans maximum concurrency value to 50
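If you prefer the Rails console, the same limit can presumably be set on the application settings record. The attribute name below is an assumption derived from the setting's label, not a verified column name:
# Assumed attribute name (derived from the admin setting's label)
ApplicationSetting.current.update!(security_policy_scheduled_scans_max_concurrency: 50)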
- Trigger the scheduled scans:
Get the schedule id in the Rails console:
rule_schedule_id = Security::OrchestrationPolicyRuleSchedule.last.id
Update the schedule's next_run_at to a time in the past using gdk psql:
UPDATE security_orchestration_policy_rule_schedules SET next_run_at = '2024-05-28 00:15:00+00' WHERE id = <rule_schedule_id>;
Trigger the schedule in the Rails console:
Security::OrchestrationPolicyRuleScheduleNamespaceWorker.new.perform(rule_schedule_id)
- Verify the number of active ::Ci::Build jobs in the Rails console. You can use the pause strategy query to check it:
while true
  puts ::Ci::Build.with_pipeline_source_type('security_orchestration_policy')
    .with_status(*::Ci::HasStatus::ALIVE_STATUSES)
    .created_after(1.hour.ago)
    .updated_after(1.hour.ago)
    .count

  sleep 3
end
It might take some time, but you should see the CI jobs count stop increasing once the limit is reached.