Add dynamic concurrency limit for create pipeline worker

What does this MR do and why?

This MR is related to epic &13997.

Our goal is to give instance admins a way to manage the number of jobs executed on behalf of a scheduled scan execution policy, so that pipelines are spread out over time and do not overburden the runners.

This MR adds a custom concurrency limit for CreatePipelineWorker to improve our previous solution that used the concurrency_limit alone.

Our concerns about using the concurrency_limit for our case are:

  • It limits the worker's concurrency, but we want to restrict the concurrency of CI builds. One worker creates one pipeline, but a pipeline can contain multiple CI build jobs.

  • Sidekiq jobs seem to finish much faster than the pipeline jobs executed by the runners, so limiting the number of workers with concurrency_limit might not be enough to reduce the pressure on the runners.

This solution has some limitations, but it is an improvement compared to using the worker concurrency_limit attribute alone.
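
The difference between limiting workers and limiting builds can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this MR: `BuildConcurrencyGate`, `schedule_pipelines`, the limit of 12, and the figure of 5 builds per pipeline are all hypothetical.

```ruby
# Hypothetical sketch; class and method names are illustrative, not GitLab's.
class BuildConcurrencyGate
  def initialize(limit:, active_build_counter:)
    @limit = limit
    @active_build_counter = active_build_counter # callable returning an Integer
  end

  # :run while the active build count is below the limit, :defer otherwise.
  def decide
    @active_build_counter.call >= @limit ? :defer : :run
  end
end

# Creates pipelines while the gate is open; returns the projects left over.
def schedule_pipelines(projects, gate, create_pipeline)
  deferred = []
  projects.each do |project|
    if gate.decide == :run
      create_pipeline.call(project)
    else
      deferred << project
    end
  end
  deferred
end

# Assumption for the demo: every pipeline adds 5 active builds.
active_builds = 0
gate = BuildConcurrencyGate.new(limit: 12, active_build_counter: -> { active_builds })
create_pipeline = ->(_project) { active_builds += 5 }

deferred = schedule_pipelines(%w[a b c d], gate, create_pipeline)
puts "deferred=#{deferred.inspect} active_builds=#{active_builds}"
# prints: deferred=["d"] active_builds=15
```

In this sketch the count is checked before each pipeline is created, so the total can overshoot the limit by up to one pipeline's worth of builds (15 against a limit of 12 here). That is one kind of limitation a check of this shape would have.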

Database query

SELECT
    "p_ci_builds"."status",
    "p_ci_builds"."finished_at",
    "p_ci_builds"."created_at",
    "p_ci_builds"."updated_at",
    "p_ci_builds"."started_at",
    "p_ci_builds"."coverage",
    "p_ci_builds"."name",
    "p_ci_builds"."options",
    "p_ci_builds"."allow_failure",
    "p_ci_builds"."stage",
    "p_ci_builds"."stage_idx",
    "p_ci_builds"."tag",
    "p_ci_builds"."ref",
    "p_ci_builds"."type",
    "p_ci_builds"."target_url",
    "p_ci_builds"."description",
    "p_ci_builds"."erased_at",
    "p_ci_builds"."artifacts_expire_at",
    "p_ci_builds"."environment",
    "p_ci_builds"."when",
    "p_ci_builds"."yaml_variables",
    "p_ci_builds"."queued_at",
    "p_ci_builds"."lock_version",
    "p_ci_builds"."coverage_regex",
    "p_ci_builds"."retried",
    "p_ci_builds"."protected",
    "p_ci_builds"."failure_reason",
    "p_ci_builds"."scheduled_at",
    "p_ci_builds"."token_encrypted",
    "p_ci_builds"."resource_group_id",
    "p_ci_builds"."waiting_for_resource_at",
    "p_ci_builds"."processed",
    "p_ci_builds"."scheduling_type",
    "p_ci_builds"."id",
    "p_ci_builds"."stage_id",
    "p_ci_builds"."partition_id",
    "p_ci_builds"."auto_canceled_by_partition_id",
    "p_ci_builds"."auto_canceled_by_id",
    "p_ci_builds"."commit_id",
    "p_ci_builds"."erased_by_id",
    "p_ci_builds"."project_id",
    "p_ci_builds"."runner_id",
    "p_ci_builds"."trigger_request_id",
    "p_ci_builds"."upstream_pipeline_id",
    "p_ci_builds"."user_id",
    "p_ci_builds"."execution_config_id"
FROM
    "p_ci_builds"
    INNER JOIN "ci_pipelines" "pipeline" ON "pipeline"."partition_id" IS NOT NULL
        AND "pipeline"."id" = "p_ci_builds"."commit_id"
        AND "pipeline"."partition_id" = "p_ci_builds"."partition_id"
WHERE
    "p_ci_builds"."type" = 'Ci::Build'
    AND "pipeline"."source" = 15
    AND ("p_ci_builds"."status" IN ('preparing', 'pending', 'running', 'waiting_for_callback', 'waiting_for_resource', 'canceling', 'created'))
    AND "p_ci_builds"."created_at" > '2024-07-15 15:07:29.351683'
    AND "p_ci_builds"."updated_at" > '2024-07-15 15:07:29.351826'
LIMIT 100

https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/29883/commands/92878

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

  1. Create a new group
  2. Create some projects using the script
user = User.first
namespace_id = Group.last.id

5.times do
  project_params = {
    namespace_id: namespace_id,
    name: "Test-#{FFaker::Lorem.characters(15)}"
  }

  project = ::Projects::CreateService.new(user, project_params).execute
  project.save!

  project.repository.create_file(user, 'Gemfile.lock', '', branch_name: Gitlab::DefaultBranch.value,
    message: 'Add Gemfile.lock file')
  project.repository.create_file(user, 'test.rb', 'puts "hello world"', branch_name: Gitlab::DefaultBranch.value,
    message: 'Add test.rb file')

  5.times do
    branch_name = "branch-#{FFaker::Lorem.characters(15)}"
    ::Branches::CreateService.new(project, user).execute(branch_name, project.default_branch)
  end
end
  1. Go to the Group page
  2. Go to Secure > Policies
  3. Click New policy
  4. Select Scan Execution Policy
  5. Change to the .yaml mode
  6. Copy the policy content below
type: scan_execution_policy
name: policy
description: ''
enabled: true
policy_scope:
  projects:
    excluding: []
rules:
  - type: schedule
    cadence: 0 0 * * *
    timezone: Etc/UTC
    branch_type: all
actions:
  - scan: secret_detection
  - scan: sast
  - scan: sast_iac
  - scan: container_scanning
  - scan: dependency_scanning
  1. Merge the policy

  2. Enable the feature flags

Feature.enable(:scan_execution_pipeline_worker)
Feature.enable(:scan_execution_pipeline_concurrency_control)
  1. Go to the Admin Area
  2. Go to Settings > CI/CD > Continuous Integration and Deployment
  3. Update the Security policy scheduled scans maximum concurrency value to 50
  4. Trigger the scheduled scans

Get the schedule ID in the rails console:

rule_schedule_id = Security::OrchestrationPolicyRuleSchedule.last.id

Update the schedule's next_run_at to a time in the past using gdk psql:

UPDATE security_orchestration_policy_rule_schedules SET next_run_at = '2024-05-28 00:15:00+00' WHERE id = <rule_schedule_id>;

Trigger the schedule in the rails console:

Security::OrchestrationPolicyRuleScheduleNamespaceWorker.new.perform(rule_schedule_id)
  1. Verify in the rails console that the number of active ::Ci::Build jobs stops growing once the limit is reached.

You can use the pause strategy query to check the number of active ::Ci::Build jobs:

# Print the active build count every 3 seconds; stop with Ctrl+C.
loop do
  puts ::Ci::Build.with_pipeline_source_type('security_orchestration_policy')
    .with_status(*::Ci::HasStatus::ALIVE_STATUSES)
    .created_after(1.hour.ago)
    .updated_after(1.hour.ago).count

  sleep 3
end

It might take some time, but you should see the CI job count stop increasing once the limit is reached.
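
If you prefer a check that ends on its own, the polling loop can be bounded. The helper below is a hypothetical sketch (`watch_active_builds` is not part of GitLab): it samples any callable a fixed number of times and reports the peak and whether the last three samples were equal. The demo feeds it made-up counts. In the rails console you would instead pass a lambda wrapping the ::Ci::Build query shown above.

```ruby
# Hypothetical helper: samples `counter` a fixed number of times instead of forever.
def watch_active_builds(counter, max_samples:, interval: 3)
  samples = []
  max_samples.times do
    samples << counter.call
    sleep interval
  end

  {
    peak: samples.max,
    # Treat three identical trailing samples as "the count stopped increasing".
    plateaued: samples.size >= 3 && samples.last(3).uniq.size == 1
  }
end

# Demo with made-up counts; the real counter would run the ::Ci::Build query.
fake_counts = [10, 30, 48, 50, 50, 50].each
result = watch_active_builds(-> { fake_counts.next }, max_samples: 6, interval: 0)
puts result.inspect
```

With the made-up counts, the peak is 50 and the last three samples match, so `plateaued` is true.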

Edited by Marcos Rocha
