Skip to content

Fix ScheduledEnq to use job class's queue for push

Sylvester Chin requested to merge sc1-fix-sidekiq-routing-transition into master

What does this MR do and why?

Fix ScheduledEnq to use job class's queue for push

This addresses misrouted jobs when changing a worker attribute. During the deployment period where both new and old versions are running, jobs scheduled by the new processes could be picked up by an older process and routed with a wrong combination of store and queue.

See gitlab-com/gl-infra/scalability#3497 (comment 2007088595)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Set up

  1. GDK but with sidekiq disabled (gdk stop rails-background-jobs)
  2. Another redis: docker run -p 6378:6379 -d redis:6.2-alpine and configure config/redis.yml with:
➜  cat config/redis.yml
---
development:
  queues_shard_catchall_a:
    url: "redis://localhost:6378"
  1. Prepare config/gitlab.yml with the following sidekiq routing rules
production: &base
  sidekiq:
    log_format: json # (default is also supported)
    routing_rules:
      - ["tags=needs_own_queue", null]
      - ["*", "default"]

For each branch (master and this):

  1. Start up a Sidekiq process with the OLD config
SIDEKIQ_SHARD_NAME=queues_shard_catchall_a bundle exec ruby bin/sidekiq-cluster default,chaos -e development --timeout 10
  1. Update config/gitlab.yml to NEW with
production: &base
  sidekiq:
    log_format: json # (default is also supported)
    routing_rules:
      - ["tags=needs_own_queue", null]
      - ["worker_name=Chaos::CpuSpinWorker", "chaos", "queues_shard_catchall_a"]
      - ["*", "default"]
  1. Open a gdk console and schedule a CpuSpinWorker
> gdk rails c
[1] pry(main)> Chaos::CpuSpinWorker.perform_in(1, 1)
=> "06c944d35e8382cffffb47da"
  1. Check redis-cli. Note: you have to run gdk stop rails-background-jobs, or the jobs will be consumed.

On the master branch, the queue:chaos will have 1 job enqueued since the "old" process pushes to the new queue and old store.

➜  gitlab git:(master) ✗ gdk redis-cli -n 1 lrange queue:chaos 0 -1
1) "{\"retry\":3,\"queue\":\"chaos\",\"backtrace\":true,\"backtrace\":true,\"version\":0,\"store\":\"queues_shard_catchall_a\",\"queue_namespace\":\"chaos\",\"class\":\"Chaos::CpuSpinWorker\",\"args\":[1],\"jid\":\"384c47e1ea3274d1427df170\",\"created_at\":1721798278.708093,\"trace_propagation_headers\":{\"sentry-trace\":\"ff8e1d78c8504f2eaa6ff627fef033ac-c816f994e58e424b\",\"baggage\":\"sentry-trace_id=ff8e1d78c8504f2eaa6ff627fef033ac,sentry-environment=,sentry-release=18266f1dccb\"},\"meta.sidekiq_destination_shard_redis\":\"queues_shard_catchall_a\",\"correlation_id\":\"727ef2bc380d45cd5090d26550ad6c1d\",\"worker_data_consistency\":\"always\",\"size_limiter\":\"validated\",\"scheduled_at\":1721798279.708069,\"idempotency_key\":\"resque:gitlab:duplicate:chaos:22a5e8e9d30eac0fa7a7ff37a2624dcb6b65842c19798591fa0cc57d26805043\",\"enqueued_at\":1721798281.151815}"

On the feature branch, the queue:default will have 1 job enqueued since the "old" process pushes to the old queue and old store.

➜  gitlab git:(sc1-fix-sidekiq-routing-transition) ✗ gdk redis-cli -n 1 lrange queue:default 0 -1
1) "
{\"retry\":3,\"queue\":\"default\",\"backtrace\":true,\"backtrace\":true,\"version\":0,\"store\":\"queues_shard_catchall_a\",\"queue_namespace\":\"chaos\",\"class\":\"Chaos::CpuSpinWorker\",\"args\":[1],\"jid\":\"06c944d35e8382cffffb47da\",\"created_at\":1721798508.2754,\"trace_propagation_headers\":{\"sentry-trace\":\"df43a6f0ba6a4536b9a2b03e259ef662-ea4c0a62e2734b50\",\"baggage\":\"sentry-trace_id=df43a6f0ba6a4536b9a2b03e259ef662,sentry-environment=,sentry-release=cec6f423b76\"},\"meta.sidekiq_destination_shard_redis\":\"queues_shard_catchall_a\",\"correlation_id\":\"0522bc67336e917ff8917652d5e3064d\",\"worker_data_consistency\":\"always\",\"size_limiter\":\"validated\",\"scheduled_at\":1721798509.275357,\"idempotency_key\":\"resque:gitlab:duplicate:default:22a5e8e9d30eac0fa7a7ff37a2624dcb6b65842c19798591fa0cc57d26805043\",\"enqueued_at\":1721798511.479434}"
Edited by Sylvester Chin

Merge request reports

Loading