Expand sidekiq queue_groups to default

Gregorius Marco requested to merge mg-override-default-sidekiq-queues into master

What does this MR do and why?

Part of gitlab-com/gl-infra/scalability#2720 (closed)

This MR consists of two changes. They could be split into two MRs, but we decided to combine them so that we don't need to coordinate the merges and can be sure both are included in the same release.

1. Expand sidekiq queue_groups to default

When routing rules are not specified (which is the default), all jobs go to the default and mailers queues (Ref: 1, 2).
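A minimal Ruby sketch of how a job's destination queue could be resolved from routing rules. It assumes a simplified rule format: only the '*' selector and exact worker class names are matched (GitLab's real selectors, such as tags=needs_own_queue, are richer), and the worker-named queue comes from a hypothetical worker_queue_name helper. Mailer deliveries are not modelled; they presumably reach the mailers queue through Rails' own mailer queue setting rather than through routing rules.

```ruby
# Sketch only: a simplified routing-rule resolver, not GitLab's own router.

# Applied when sidekiq.routing_rules is not set (the %16.0+ default).
DEFAULT_ROUTING_RULES = [['*', 'default']].freeze

# Hypothetical helper: "ProjectExportWorker" -> "project_export".
# Real GitLab queue names can also carry namespaces; this ignores them.
def worker_queue_name(worker_class)
  worker_class.sub(/Worker\z/, '')
              .gsub(/([a-z\d])([A-Z])/, '\1_\2')
              .downcase
end

# Returns the queue a job of +worker_class+ would be pushed to.
def route(worker_class, routing_rules)
  rules = (routing_rules.nil? || routing_rules.empty?) ? DEFAULT_ROUTING_RULES : routing_rules

  rules.each do |selector, destination|
    next unless selector == '*' || selector == worker_class

    # A nil destination means "use the worker-named queue" (pre-16.0 behaviour).
    return destination || worker_queue_name(worker_class)
  end

  # Assumption: when no rule matches, fall back to the worker-named queue.
  worker_queue_name(worker_class)
end
```

With these assumptions, route('ProjectExportWorker', nil) returns 'default', while route('ProjectExportWorker', [['*', nil]]) returns 'project_export', i.e. the pre-16.0 worker-named queue.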

On the other hand, the default sidekiq['queue_groups'] setting is ['*'], and the Reference Architectures recommend sidekiq['queue_groups'] = ['*'] * 4, so Sidekiq listens to all 600+ queues (the * character expands to every worker-named queue). Listening to all of these queues is wasteful, because every fetch makes Redis BRPOP across all of them (more reading: https://about.gitlab.com/blog/2021/09/02/specialized-sidekiq-configuration-lessons-from-gitlab-dot-com/).
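To put the overhead in numbers, the sketch below builds the key list that a single BRPOP would carry. It assumes Sidekiq's convention of storing each queue as a Redis list named queue:<name>; the 2-second timeout and the 600 queue names are illustrative assumptions, not values taken from this MR.

```ruby
# Sketch: the arguments one blocking fetch would send to Redis.
# Assumption: each queue is a Redis list named "queue:<name>".
FETCH_TIMEOUT_SECONDS = 2 # assumed; illustrative only

def brpop_args(queue_names)
  queue_names.map { |name| "queue:#{name}" } + [FETCH_TIMEOUT_SECONDS]
end

# Hypothetical stand-in for the 600+ worker-named queues that '*' expands to.
all_worker_queues = Array.new(600) { |i| "worker_queue_#{i}" }

queue_groups = ['*'] * 4 # one Sidekiq process per group

keys_per_fetch = brpop_args(all_worker_queues).length - 1 # minus the timeout arg
keys_across_processes = keys_per_fetch * queue_groups.length

overridden_keys_per_fetch = brpop_args(%w[default mailers]).length - 1

puts "'*' groups: #{keys_per_fetch} keys per BRPOP, #{keys_across_processes} across #{queue_groups.length} processes"
puts "default,mailers: #{overridden_keys_per_fetch} keys per BRPOP"
```

With these assumed inputs it reports 600 keys per BRPOP (2,400 across four processes) versus 2 keys for default,mailers.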

Thus, if routing rules are not specified, we can safely override whatever is defined in sidekiq['queue_groups'] with the default,mailers queues only, so that Redis doesn't incur the overhead of BRPOPing 600+ queues.
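A minimal sketch of that override rule. The method name effective_queue_groups is hypothetical, and two behaviours are assumptions rather than details confirmed by this description: an empty routing_rules list is treated the same as an unset one, and the number of queue groups (and therefore Sidekiq processes) is preserved.

```ruby
# Sketch of the override: hypothetical helper, not the code in this MR.
DEFAULT_QUEUES = 'default,mailers'

# queue_groups:  sidekiq['queue_groups'], one entry per Sidekiq process.
# routing_rules: sidekiq.routing_rules from gitlab.yml (nil when commented out).
def effective_queue_groups(queue_groups, routing_rules)
  # Explicit routing rules: leave the operator's queue groups untouched.
  return queue_groups unless routing_rules.nil? || routing_rules.empty?

  # Assumption: keep one group per configured process, each listening only
  # to the queues that the default routing actually uses.
  Array.new(queue_groups.length, DEFAULT_QUEUES)
end
```

With the Reference Architecture setting, effective_queue_groups(['*'] * 4, nil) yields four 'default,mailers' groups, while any explicit routing rules leave ['*'] * 4 as configured.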

These changes primarily target self-managed (SM) instances, as GitLab.com specifies its routing rules in https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab/values/gprd.yaml.gotmpl#L957-967.

2. PDM (post-deployment migration) to migrate queued and future Sidekiq jobs

Originally from !142676 (closed), with a backend pre-approval.

Migrate jobs that sit in a queue not targeted by any routing rule to their correct queue.
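For already-queued jobs, the idea can be sketched as follows. This is an in-memory stand-in, not the Rake task or the PDM code: a Hash of queue name to JSON payloads plays the role of the Redis queue:<name> lists, and destination_for is a hypothetical callable that maps a worker class to its routed queue.

```ruby
require 'json'

# Sketch only. queues: { "queue_name" => [json_payload, ...] }, standing in
# for the Redis lists. destination_for: hypothetical callable, class -> queue.
# Returns the number of jobs moved.
def migrate_queued_jobs(queues, destination_for)
  moved = 0

  queues.keys.each do |source|
    kept = []

    queues[source].each do |raw|
      job = JSON.parse(raw)
      target = destination_for.call(job['class'])

      if target == source
        kept << raw # already in the right queue
      else
        job['queue'] = target # keep the payload consistent with its new list
        (queues[target] ||= []) << JSON.generate(job)
        moved += 1
      end
    end

    queues[source] = kept
  end

  moved
end
```

Every job ends up in the queue its routing rules name, so nothing is stranded in a worker-named queue that Sidekiq no longer listens to.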

Why is the migration needed?

  • Before %16.0, jobs were pushed to their worker-named queue by default, equivalent to [['*', null]] in routing rules terms.
  • In %16.0, routing rules default to [['*', 'default']], which means jobs from all worker classes are pushed to the default queue. However, the Sidekiq server still listens to all queues by default (i.e. sidekiq['queue_groups'] = ['*']).
  • In !142577 (merged) %16.9, we're changing queue_groups to be default,mailers if routing rules aren't set.
  • Scheduled and retried jobs might still be left in their worker queues. For example, jobs scheduled or retried before %16.0 could be set to run in the distant future, and these will remain targeted at their individual queues. This migration ensures no jobs are lost when upgrading to %16.9.
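Future jobs need different handling because they are not in a queue list yet. Sidekiq keeps scheduled and retrying jobs in the schedule and retry sorted sets and, when an entry comes due, pushes it to the queue named in its payload. A sketch therefore only needs to rewrite that queue field while keeping each entry's score (its run-at time). The sorted set is modelled in memory as [score, json] pairs, and destination_for is a hypothetical callable mapping a worker class to its routed queue.

```ruby
require 'json'

# Sketch only. sorted_set: array of [run_at_score, json_payload] pairs, standing
# in for Sidekiq's "schedule" or "retry" sorted set. Returns entries rewritten.
def migrate_future_jobs(sorted_set, destination_for)
  rewritten = 0

  sorted_set.map! do |score, raw|
    job = JSON.parse(raw)
    target = destination_for.call(job['class'])

    if job['queue'] == target
      [score, raw] # already routed correctly
    else
      rewritten += 1
      # Keep the score so the job still runs at its originally scheduled time.
      [score, JSON.generate(job.merge('queue' => target))]
    end
  end

  rewritten
end
```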

Note:

  1. The code is mostly copy-pasted from https://gitlab.com/gitlab-org/gitlab/-/blob/7066e106511295f69e380eef89c7e9a805a29330/lib/gitlab/sidekiq_migrate_jobs.rb which is a Rake task designed to migrate the jobs whenever routing rules are updated.
  2. We tried to introduce this PDM before, but it was reverted/cancelled because a different approach was taken back then: (a) queued jobs: !100392 (merged); (b) future jobs: !103001 (closed).

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

In gitlab.yml, comment out the sidekiq.routing_rules:

#    routing_rules:
#      - ["tags=needs_own_queue", null]
#      - ["*", "default"]

On master branch:

  1. Run the following command:
❯ bin/sidekiq-cluster --dryrun 'foo'
bundle exec sidekiq -c4 -edevelopment -t25 -gqueues:foo,default,mailers -r/Users/gregoriusmarco/Documents/workspace/gdk-10-22/gitlab -qfoo,1 -qdefault,1 -qmailers,1
  2. The queues consist of foo, default and mailers.
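The -c4 above, together with the -c20 that master produces for '*', is consistent with concurrency being the number of listened queues plus one, capped at 20. The sketch below encodes that inference; the formula and the cap are read off these dry-run outputs, not taken from the sidekiq-cluster source.

```ruby
# Inferred from dry-run output only: 3 queues -> -c4, 600+ queues -> -c20.
ASSUMED_MAX_CONCURRENCY = 20

def inferred_concurrency(queues, max_concurrency = ASSUMED_MAX_CONCURRENCY)
  [queues.length + 1, max_concurrency].min
end

puts inferred_concurrency(%w[foo default mailers]) # 4
```

The same formula would give 3 for default,mailers alone, whereas the dry run on this branch shows -c20, so it does not describe the branch's behaviour.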

On this branch:

  1. Run the same command; foo no longer appears in the list of queues:
❯ bin/sidekiq-cluster --dryrun 'foo'
bundle exec sidekiq -c20 -edevelopment -t25 -gqueues:default,mailers -r/Users/gregoriusmarco/Documents/workspace/gdk-10-22/gitlab -qdefault,1 -qmailers,1
  2. Concurrency is set to 20 (-c20), which is what the master branch uses when running bin/sidekiq-cluster --dryrun '*'. Note that * is the default queue_groups value in Omnibus (https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/7a3ab2a322a15a4ffc62a6a7fa4e458155302156/files/gitlab-cookbooks/gitlab/attributes/default.rb#L719), so most installations run with this setting.


Edited by Gregorius Marco
