Update job coordinator for compatibility with Sidekiq sharding
What does this MR do and why?
This MR wraps the pending_jobs and steal methods in the job coordinator so that they are shard-aware.
Shard-awareness is a GitLab.com-only concern. Sharding is a horizontal scaling approach for Sidekiq that GitLab.com needs because of its scale; it is not a feature that we are rolling out to everyone.
See gitlab-com/gl-infra/scalability#2817 (closed)
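As a rough illustration of what "shard-aware" means here, the sketch below routes every read and delete through a router that picks the Redis instance holding the worker's jobs. It is a minimal in-memory model, not the code in this MR: Shard, ShardRouter, Job, and this simplified JobCoordinator are hypothetical stand-ins, and no real Sidekiq or Redis calls are made.

```ruby
# Hypothetical in-memory stand-in for a job stored in one Redis instance.
Job = Struct.new(:migration_class, :args)

# Hypothetical stand-in for one Redis instance (shard) holding jobs.
class Shard
  attr_reader :name, :jobs

  def initialize(name)
    @name = name
    @jobs = []
  end
end

# Hypothetical router: maps a worker name to a shard and yields that shard.
class ShardRouter
  def initialize(default:, routes: {})
    @default = default
    @routes = routes
  end

  def route(worker_name)
    yield @routes.fetch(worker_name, @default)
  end
end

# Simplified coordinator: every read or delete goes through the router,
# so it always acts on the shard that holds the worker's jobs.
class JobCoordinator
  def initialize(worker_name, router)
    @worker_name = worker_name
    @router = router
  end

  # Returns a copy of the jobs currently stored on the worker's shard.
  def pending_jobs
    @router.route(@worker_name) { |shard| shard.jobs.dup }
  end

  # Removes and returns the jobs for the given migration class.
  def steal(migration_class)
    @router.route(@worker_name) do |shard|
      stolen, remaining = shard.jobs.partition { |job| job.migration_class == migration_class }
      shard.jobs.replace(remaining)
      stolen
    end
  end
end

main = Shard.new("main")
shard_01 = Shard.new("queues_shard_01")
router = ShardRouter.new(default: main, routes: { "BackgroundMigrationWorker" => shard_01 })

shard_01.jobs << Job.new("Foo", ["hello"])
coordinator = JobCoordinator.new("BackgroundMigrationWorker", router)

puts coordinator.pending_jobs.size # prints 1
coordinator.steal("Foo")
puts coordinator.pending_jobs.size # prints 0
```

Without the routing step, the coordinator in this model would look only at the default instance and miss the job stored on queues_shard_01.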
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
Set-up
- Run Docker to create an extra Redis instance
docker run -p 6378:6379 -d redis:6.0-alpine
- Update gitlab.yml
## Sidekiq
sidekiq:
  log_format: json # (default is also supported)
  routing_rules:
    - ["tags=needs_own_queue", null]
    - ["worker_name=BackgroundMigrationWorker", "default", "queues_shard_01"]
    - ["*", "default"]
- Update config/redis.yml
➜ gitlab git:(sc1-sidekiq-shard-routing-compat-job-coordinator) ✗ cat config/redis.yml
---
development:
  queues_shard_01:
    url: "redis://localhost:6378"
- Create a dummy feature flag config file
➜ gitlab git:(sc1-sidekiq-shard-routing-compat-job-coordinator) ✗ cat config/feature_flags/ops/sidekiq_route_to_queues_shard_01.yml
---
name: sidekiq_route_to_queues_shard_01
feature_issue_url:
introduced_by_url:
rollout_issue_url:
milestone: '16.9'
group: group::scalability
type: ops
default_enabled: false
- Apply this diff to allow dummy jobs to pass gracefully (optional)
diff --git a/lib/gitlab/background_migration/job_coordinator.rb b/lib/gitlab/background_migration/job_coordinator.rb
index 09e2b2a32197..09c55624f93b 100644
--- a/lib/gitlab/background_migration/job_coordinator.rb
+++ b/lib/gitlab/background_migration/job_coordinator.rb
@@ -88,6 +88,7 @@ def steal(steal_class, retry_dead_jobs: false)
begin
perform(migration_class, migration_args) if job.delete
+ puts "performed"
rescue Exception # rubocop:disable Lint/RescueException
worker_class # enqueue this migration again
.perform_async(migration_class, migration_args)
@@ -101,7 +102,7 @@ def steal(steal_class, retry_dead_jobs: false)
def perform(class_name, arguments)
with_shared_connection do
- migration_instance_for(class_name).perform(*arguments)
+ # migration_instance_for(class_name).perform(*arguments)
end
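To make the routing_rules entry from the gitlab.yml step easier to follow, here is a small sketch of first-match routing over rules shaped like [query, queue, shard]. It is a simplified model under stated assumptions: the RoutingSketch module, the "main" default shard name, and the fallback to the worker name when the queue is null are all hypothetical, and only the "*", "worker_name=", and "tags=" query forms used in the example are handled.

```ruby
# Hypothetical, simplified model of first-match routing rules.
module RoutingSketch
  # Rules in the same shape as the gitlab.yml example: [query, queue, shard].
  # A missing third element is treated here as the default shard (an assumption).
  ROUTING_RULES = [
    ["tags=needs_own_queue", nil],
    ["worker_name=BackgroundMigrationWorker", "default", "queues_shard_01"],
    ["*", "default"]
  ].freeze

  DEFAULT_SHARD = "main" # hypothetical name for the default Redis instance

  module_function

  # Supports only "*", "worker_name=X", and "tags=X" queries.
  def rule_matches?(query, worker_name:, tags:)
    return true if query == "*"

    key, value = query.split("=", 2)
    case key
    when "worker_name" then worker_name == value
    when "tags" then tags.include?(value)
    else false
    end
  end

  # Returns [queue, shard] taken from the first rule that matches.
  def route(worker_name, tags: [])
    _query, queue, shard = ROUTING_RULES.find do |query, *_rest|
      rule_matches?(query, worker_name: worker_name, tags: tags)
    end
    # A null queue falls back to the worker name here; the real
    # per-worker queue naming is not modeled in this sketch.
    [queue || worker_name, shard || DEFAULT_SHARD]
  end
end

p RoutingSketch.route("BackgroundMigrationWorker") # prints ["default", "queues_shard_01"]
p RoutingSketch.route("SomeOtherWorker")           # prints ["default", "main"]
```

In this model only BackgroundMigrationWorker is sent to queues_shard_01; every other worker falls through to the catch-all rule and stays on the default instance.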
Testing
- Open a gdk rails console and enable the sharding feature flags
Feature.enable(:enable_sidekiq_shard_router)
Feature.enable(:sidekiq_route_to_queues_shard_01)
- Schedule a job to steal
Loading development environment (Rails 7.0.8.1)
[1] pry(main)> BackgroundMigrationWorker.perform_in(1.hour,'Foo', 'hello')
=> "24d63d35571d23d59f9d1bb4"
- Verify in the new Redis instance that the job has been scheduled
➜ gitlab git:(sc1-sidekiq-shard-routing-compat-job-coordinator) ✗ redis-cli -p 6378 zcard schedule
(integer) 1
- Steal the job. Because the job coordinator is now shard-aware, the job is fetched from the new Redis instance instead of gdk's default one.
[1] pry(main)> coor = Gitlab::BackgroundMigration::JobCoordinator.for_tracking_database('main')
=> #<Gitlab::BackgroundMigration::JobCoordinator:0x0000000164932eb0 @worker_class=BackgroundMigrationWorker>
[2] pry(main)> out = coor.steal('Foo')
performed
=> [#<Sidekiq::ScheduledSet:0x000000015fa1fc38 @_size=0, @name="schedule">,
#<Sidekiq::Queue:0x000000015f9f76c0 @name="default", @rname="queue:default">]
- Verify that the scheduled job has been stolen
➜ gitlab git:(sc1-sidekiq-shard-routing-compat-job-coordinator) ✗ redis-cli -p 6378 zcard schedule
(integer) 0
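The walkthrough above enables two flags before scheduling the job. One plausible reading, which is an assumption and not something this MR states, is that jobs go to the extra shard only when both the router flag and the per-shard flag are on, and stay on the default instance otherwise. The sketch below models that reading with a hypothetical FlagStore and FlagGatingSketch; it does not call the real Feature API.

```ruby
# Hypothetical flag store standing in for Feature.enable / Feature.enabled?.
class FlagStore
  def initialize
    @enabled = {}
  end

  def enable(name)
    @enabled[name.to_sym] = true
  end

  def enabled?(name)
    @enabled.fetch(name.to_sym, false)
  end
end

# Assumed gating logic: both flags must be on to use the extra shard.
module FlagGatingSketch
  module_function

  # Returns the shard name when both flags are on, else the default instance.
  def target_instance(flags, shard_name, default: "main")
    router_on = flags.enabled?(:enable_sidekiq_shard_router)
    shard_on = flags.enabled?("sidekiq_route_to_#{shard_name}")
    router_on && shard_on ? shard_name : default
  end
end

flags = FlagStore.new
puts FlagGatingSketch.target_instance(flags, "queues_shard_01") # prints main

flags.enable(:enable_sidekiq_shard_router)
flags.enable(:sidekiq_route_to_queues_shard_01)
puts FlagGatingSketch.target_instance(flags, "queues_shard_01") # prints queues_shard_01
```

If this reading holds, skipping either Feature.enable call in the Testing steps would leave the scheduled job on gdk's default Redis, and the zcard check on port 6378 would return 0 from the start.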
Edited by Sylvester Chin