Add dynamic concurrency limit for create pipeline worker
What does this MR do and why?
This MR is related to epic &13997.
Our goal is to provide a way for instance admins to manage the number of jobs executed on behalf of a scheduled scan execution policy, so that pipelines are distributed and do not overburden the runners.
This MR adds a custom concurrency limit for CreatePipelineWorker to improve on our previous solution, which used the concurrency_limit attribute alone.
Our concerns about using concurrency_limit alone for our case are:
- It limits the worker's concurrency, but we want to restrict the CI builds' concurrency. One worker creates one pipeline, but that pipeline can contain multiple CI build jobs.
- Sidekiq jobs appear to be much faster than the pipeline jobs executed by the runners, so limiting the number of workers with concurrency_limit might not be enough to reduce the pressure on the runners.
This solution has some limitations, but it is an improvement over using the worker's concurrency_limit attribute alone.
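To illustrate the idea, here is a minimal sketch of what such a check could look like. The class and method names are hypothetical; only the scope chain is taken from the verification snippet in the validation steps below, and the real behaviour is defined by the code in this MR.
# Hypothetical sketch; illustrative names, not the classes introduced by this MR.
class ScanExecutionPolicyConcurrencyCheck
  LOOKBACK = 1.hour

  # limit: the "Security policy scheduled scans maximum concurrency" admin setting
  def initialize(limit)
    @limit = limit
  end

  def over_limit?
    active_builds_count >= @limit
  end

  private

  # Builds belonging to security_orchestration_policy pipelines that are still
  # in an alive status; this mirrors the query in the Database query section,
  # which also caps the scan at 100 rows.
  def active_builds_count
    ::Ci::Build
      .with_pipeline_source_type('security_orchestration_policy')
      .with_status(*::Ci::HasStatus::ALIVE_STATUSES)
      .created_after(LOOKBACK.ago)
      .updated_after(LOOKBACK.ago)
      .limit(100)
      .count
  end
end
With a check like this, CreatePipelineWorker can defer pipeline creation (for example, by rescheduling itself) while over_limit? returns true, so the cap applies to the in-flight policy builds rather than to the number of Sidekiq workers.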
Database query
SELECT
"p_ci_builds"."status",
"p_ci_builds"."finished_at",
"p_ci_builds"."created_at",
"p_ci_builds"."updated_at",
"p_ci_builds"."started_at",
"p_ci_builds"."coverage",
"p_ci_builds"."name",
"p_ci_builds"."options",
"p_ci_builds"."allow_failure",
"p_ci_builds"."stage",
"p_ci_builds"."stage_idx",
"p_ci_builds"."tag",
"p_ci_builds"."ref",
"p_ci_builds"."type",
"p_ci_builds"."target_url",
"p_ci_builds"."description",
"p_ci_builds"."erased_at",
"p_ci_builds"."artifacts_expire_at",
"p_ci_builds"."environment",
"p_ci_builds"."when",
"p_ci_builds"."yaml_variables",
"p_ci_builds"."queued_at",
"p_ci_builds"."lock_version",
"p_ci_builds"."coverage_regex",
"p_ci_builds"."retried",
"p_ci_builds"."protected",
"p_ci_builds"."failure_reason",
"p_ci_builds"."scheduled_at",
"p_ci_builds"."token_encrypted",
"p_ci_builds"."resource_group_id",
"p_ci_builds"."waiting_for_resource_at",
"p_ci_builds"."processed",
"p_ci_builds"."scheduling_type",
"p_ci_builds"."id",
"p_ci_builds"."stage_id",
"p_ci_builds"."partition_id",
"p_ci_builds"."auto_canceled_by_partition_id",
"p_ci_builds"."auto_canceled_by_id",
"p_ci_builds"."commit_id",
"p_ci_builds"."erased_by_id",
"p_ci_builds"."project_id",
"p_ci_builds"."runner_id",
"p_ci_builds"."trigger_request_id",
"p_ci_builds"."upstream_pipeline_id",
"p_ci_builds"."user_id",
"p_ci_builds"."execution_config_id"
FROM
"p_ci_builds"
INNER JOIN "ci_pipelines" "pipeline" ON "pipeline"."partition_id" IS NOT NULL
AND "pipeline"."id" = "p_ci_builds"."commit_id"
AND "pipeline"."partition_id" = "p_ci_builds"."partition_id"
WHERE
"p_ci_builds"."type" = 'Ci::Build'
AND "pipeline"."source" = 15
AND ("p_ci_builds"."status" IN ('preparing', 'pending', 'running', 'waiting_for_callback', 'waiting_for_resource', 'canceling', 'created'))
AND "p_ci_builds"."created_at" > '2024-07-15 15:07:29.351683'
AND "p_ci_builds"."updated_at" > '2024-07-15 15:07:29.351826'
LIMIT 100
https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/29883/commands/92878
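For reference, the SQL above roughly corresponds to the ActiveRecord scope chain used in the verification snippet further down (the exact call site in the worker may differ):
# Same relation expressed with the scopes from the validation steps below.
::Ci::Build
  .with_pipeline_source_type('security_orchestration_policy') # "pipeline"."source" = 15
  .with_status(*::Ci::HasStatus::ALIVE_STATUSES)              # the alive statuses listed in the WHERE clause
  .created_after(1.hour.ago)                                  # example timestamps; the SQL shows concrete values
  .updated_after(1.hour.ago)
  .limit(100)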
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
- Create a new group
- Create some projects using the following script in the Rails console:
user = User.first
namespace_id = Group.last.id

5.times do
  project_params = {
    namespace_id: namespace_id,
    name: "Test-#{FFaker::Lorem.characters(15)}"
  }

  project = ::Projects::CreateService.new(user, project_params).execute
  project.save!

  # Add files so the scanners configured in the policy have something to scan
  project.repository.create_file(user, 'Gemfile.lock', '', branch_name: Gitlab::DefaultBranch.value,
    message: 'Add Gemfile.lock file')
  project.repository.create_file(user, 'test.rb', 'puts "hello world"', branch_name: Gitlab::DefaultBranch.value,
    message: 'Add test.rb file')

  # Create extra branches so the scheduled scan (branch_type: all) creates more pipelines
  5.times do
    branch_name = "branch-#{FFaker::Lorem.characters(15)}"
    ::Branches::CreateService.new(project, user).execute(branch_name, project.default_branch)
  end
end
- Go to the Group page
- Go to Secure > Policies
- Click New policy
- Select Scan Execution Policy
- Switch to the .yaml mode
- Copy the policy content below
type: scan_execution_policy
name: policy
description: ''
enabled: true
policy_scope:
  projects:
    excluding: []
rules:
  - type: schedule
    cadence: '0 0 * * *'
    timezone: Etc/UTC
    branch_type: all
actions:
  - scan: secret_detection
  - scan: sast
  - scan: sast_iac
  - scan: container_scanning
  - scan: dependency_scanning
- Merge the policy
- Enable the feature flags in the Rails console:
Feature.enable(:scan_execution_pipeline_worker)
Feature.enable(:scan_execution_pipeline_concurrency_control)
- Go to the Admin Area
- Go to Settings > CI/CD > Continuous Integration and Deployment
- Update the Security policy scheduled scans maximum concurrency value to 50
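If you prefer the Rails console, the same limit can presumably be set on the application settings record. The attribute name below is an assumption derived from the setting's label, not a verified column name:
# Assumed attribute name (derived from the admin setting's label)
ApplicationSetting.current.update!(security_policy_scheduled_scans_max_concurrency: 50)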
- Trigger the scheduled scans:
Get the schedule id in the Rails console:
rule_schedule_id = Security::OrchestrationPolicyRuleSchedule.last.id
Update the schedule's next_run_at to a time in the past using gdk psql:
UPDATE security_orchestration_policy_rule_schedules SET next_run_at = '2024-05-28 00:15:00+00' WHERE id = <rule_schedule_id>;
Trigger the schedule in the Rails console:
Security::OrchestrationPolicyRuleScheduleNamespaceWorker.new.perform(rule_schedule_id)
- Verify the number of active ::Ci::Build jobs in the Rails console. You can use the pause strategy query to check it:
while true
  puts ::Ci::Build.with_pipeline_source_type('security_orchestration_policy')
    .with_status(*::Ci::HasStatus::ALIVE_STATUSES)
    .created_after(1.hour.ago)
    .updated_after(1.hour.ago)
    .count

  sleep 3
end
It might take some time, but you should see the CI jobs count stop increasing once the limit is reached.