Allow migrating scheduled and to-be-retried Sidekiq jobs
We now allow Sidekiq worker routing to be configured by administrators. For example, they can say 'all jobs go to the default queue', or 'project export and import workers share a queue'. Right now, the only really useful case is to re-route jobs to the default queue, but we will support other options in future.
Migrating this sounds simple: listen to the old and new queues, update
the worker routing, wait for the old queue to be empty, and stop
listening to the old queue. But there's a catch: Sidekiq maintains two
sorted sets with jobs that are to be run in the future. There is the
scheduled set (for jobs that use perform_in, perform_at, or similar,
where we choose to run a job in the future) and the retry
set (after failing, a job will get retried with some back-off).
Both of those sets are 'global' - there isn't one for each possible destination queue. That means that the set entries themselves contain information about their destination queue. And in the migration case above, the destination queue might be the old queue, which is no longer being listened to.
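As a rough illustration of why the queue travels with the entry: each member of these sorted sets is the job's JSON payload, and its score is the time at which the job should run. The field values below are made up, and the payload is trimmed to the fields relevant here; real Sidekiq payloads carry more keys.

```ruby
require 'json'

# Illustrative payload only: the values are hypothetical, and real
# payloads contain additional keys (created_at, retry settings, ...).
entry = JSON.generate(
  'class' => 'PostReceive',
  'queue' => 'post_receive', # destination queue is stored inside the entry
  'args'  => [0],
  'jid'   => 'hypothetical-job-id'
)

# The sorted-set score is the epoch time at which the job becomes due.
score = Time.now.to_f + 60

job = JSON.parse(entry)
puts "#{job['class']} is destined for the #{job['queue']} queue"
```

Changing the routing configuration does not touch entries like this one, so the stored queue name stays as it was when the job was scheduled.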
This adds two Rake tasks (one for the retry set and one for the scheduled set) to allow administrators to rewrite the job data in those sorted sets.
It uses these Redis commands:
- ZSCAN to iterate over the sets. This is O(1) per call, and provides useful guarantees about iterating over a set that may be changing as it's operated on.
- ZREM to remove the old job hash. This is O(log(N)) per call, where N is the number of elements in the set.
- ZADD to add the new job hash with the new queue name. This is also O(log(N)) per call.
ZREM and ZADD will each be called once per item to be migrated, so there may be many invocations of these commands during this task's run.
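The per-item flow described above could be sketched roughly as follows. This is not the code added by this change: migrate_set and FakeSortedSet are hypothetical names, and the in-memory stand-in exists only so the sketch runs without a Redis server. Its zscan, zrem, and zadd are written to mimic the return shapes of the redis-rb client methods of the same names.

```ruby
require 'json'

# Hypothetical in-memory stand-in for Redis sorted sets. It models only the
# three commands used below, and returns a whole set in a single ZSCAN page.
class FakeSortedSet
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = {} }
  end

  # Returns true when the member was newly added.
  def zadd(key, score, member)
    added = !@sets[key].key?(member)
    @sets[key][member] = score.to_f
    added
  end

  # Returns true when the member existed and was removed.
  def zrem(key, member)
    !@sets[key].delete(member).nil?
  end

  # Returns [next_cursor, [[member, score], ...]]; '0' means iteration is done.
  def zscan(key, _cursor, **_options)
    ['0', @sets[key].to_a]
  end

  def members(key)
    @sets[key].keys
  end
end

# Sketch: rewrite the queue of every entry whose worker class appears in
# `routing` (a Hash of worker class name => new queue name).
def migrate_set(redis, set_name, routing)
  scanned = 0
  migrated = 0
  cursor = '0'

  loop do
    cursor, entries = redis.zscan(set_name, cursor)

    entries.each do |payload, score|
      scanned += 1
      job = JSON.parse(payload)
      new_queue = routing[job['class']]
      next if new_queue.nil? || job['queue'] == new_queue

      # Only re-add what we actually removed, so an entry returned twice by
      # ZSCAN (or removed by another process) is not duplicated.
      next unless redis.zrem(set_name, payload)

      redis.zadd(set_name, score, JSON.generate(job.merge('queue' => new_queue)))
      migrated += 1
    end

    break if cursor == '0'
  end

  { scanned: scanned, migrated: migrated }
end

redis = FakeSortedSet.new
redis.zadd('schedule', 10.0, JSON.generate('class' => 'PostReceive', 'queue' => 'default', 'args' => [0]))
puts migrate_set(redis, 'schedule', { 'PostReceive' => 'post_receive' }).inspect
```

With a real client, the same method would be given a Redis connection instead of the stand-in. Because ZSCAN may return an element more than once while the set is changing, the scanned count can exceed the set's size.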
Testing
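A simple way to test this without any local config changes is to run the following in a Rails console:
# Schedule 10,000 jobs of each worker class, one minute apart.
10000.times { |i| AuthorizedProjectsWorker.perform_in(i.minutes, 0) }
10000.times { |i| PostReceive.perform_in(i.minutes, 0) }
# Move the scheduled PostReceive jobs to the default queue, so that the
# Rake task has something to migrate back.
Gitlab::SidekiqMigrateJobs.new('schedule').execute('PostReceive' => 'default')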
(You don't need to be running Sidekiq.)
Then run the task, which will migrate PostReceive jobs back from the default queue to the post_receive queue:
$ bundle exec rake gitlab:sidekiq:migrate_jobs:schedule
I, [2021-05-10T19:25:41.330799 #64971] INFO -- : Processing schedule set. Estimated size: 19998.
I, [2021-05-10T19:25:41.424507 #64971] INFO -- : In progress. Scanned records: 1000. Migrated records: 485.
I, [2021-05-10T19:25:41.502785 #64971] INFO -- : In progress. Scanned records: 2000. Migrated records: 977.
I, [2021-05-10T19:25:41.612464 #64971] INFO -- : In progress. Scanned records: 3000. Migrated records: 1449.
I, [2021-05-10T19:25:41.694336 #64971] INFO -- : In progress. Scanned records: 4000. Migrated records: 1888.
I, [2021-05-10T19:25:41.842944 #64971] INFO -- : In progress. Scanned records: 5000. Migrated records: 2365.
I, [2021-05-10T19:25:42.017017 #64971] INFO -- : In progress. Scanned records: 6000. Migrated records: 2792.
I, [2021-05-10T19:25:42.229430 #64971] INFO -- : In progress. Scanned records: 7000. Migrated records: 3223.
I, [2021-05-10T19:25:42.352093 #64971] INFO -- : In progress. Scanned records: 8000. Migrated records: 3667.
I, [2021-05-10T19:25:42.429180 #64971] INFO -- : In progress. Scanned records: 9000. Migrated records: 4101.
I, [2021-05-10T19:25:42.505926 #64971] INFO -- : In progress. Scanned records: 10000. Migrated records: 4503.
I, [2021-05-10T19:25:42.592300 #64971] INFO -- : In progress. Scanned records: 11000. Migrated records: 4902.
I, [2021-05-10T19:25:42.662101 #64971] INFO -- : In progress. Scanned records: 12000. Migrated records: 5299.
I, [2021-05-10T19:25:42.734463 #64971] INFO -- : In progress. Scanned records: 13000. Migrated records: 5712.
I, [2021-05-10T19:25:42.822835 #64971] INFO -- : In progress. Scanned records: 14000. Migrated records: 6130.
I, [2021-05-10T19:25:42.971456 #64971] INFO -- : In progress. Scanned records: 15000. Migrated records: 6530.
I, [2021-05-10T19:25:43.034188 #64971] INFO -- : In progress. Scanned records: 16000. Migrated records: 6911.
I, [2021-05-10T19:25:43.099864 #64971] INFO -- : In progress. Scanned records: 17000. Migrated records: 7298.
I, [2021-05-10T19:25:43.177007 #64971] INFO -- : In progress. Scanned records: 18000. Migrated records: 7666.
I, [2021-05-10T19:25:43.243486 #64971] INFO -- : In progress. Scanned records: 19000. Migrated records: 8026.
I, [2021-05-10T19:25:43.305267 #64971] INFO -- : In progress. Scanned records: 20000. Migrated records: 8353.
I, [2021-05-10T19:25:43.365154 #64971] INFO -- : In progress. Scanned records: 21000. Migrated records: 8680.
I, [2021-05-10T19:25:43.432905 #64971] INFO -- : In progress. Scanned records: 22000. Migrated records: 9028.
I, [2021-05-10T19:25:43.509522 #64971] INFO -- : In progress. Scanned records: 23000. Migrated records: 9349.
I, [2021-05-10T19:25:43.614153 #64971] INFO -- : In progress. Scanned records: 24000. Migrated records: 9685.
I, [2021-05-10T19:25:43.679748 #64971] INFO -- : In progress. Scanned records: 25000. Migrated records: 9993.
I, [2021-05-10T19:25:43.681029 #64971] INFO -- : Done. Scanned records: 25025. Migrated records: 9999.