Track worker concurrency using Redis hash
What does this MR do and why?
This MR implements an alternative approach to track worker concurrency.
1. Why do we care about counting the concurrency?
The concurrency limiter depends on the concurrency tally to schedule workers. Any inaccuracies would result in either:
- worker count < reality: too many jobs being scheduled
- worker count > reality: the job buffer being cleared too slowly
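To make the two failure modes concrete, here is a minimal sketch. It assumes the limiter resumes at most `limit - tally` buffered jobs. The helper `resumable_jobs`, the limit of 10, and the counts are hypothetical and not taken from the GitLab codebase.

```ruby
# Hypothetical helper: number of buffered jobs the limiter would resume,
# assuming it fills the gap between the configured limit and the tally.
def resumable_jobs(limit:, tally:)
  [limit - tally, 0].max
end

limit = 10            # hypothetical concurrency limit
actually_running = 8  # hypothetical number of jobs really running

# Tally below reality: 5 more jobs are resumed, so 13 would run against a limit of 10.
undercount = resumable_jobs(limit: limit, tally: 5)

# Tally above reality: nothing is resumed although 2 slots are free.
overcount = resumable_jobs(limit: limit, tally: 10)

puts "undercount resumes #{undercount}, total running #{actually_running + undercount}"
puts "overcount resumes #{overcount}, free slots #{limit - actually_running}"
```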
2. How does it work today? Both at a high level (e.g. loop over all Sidekiq processes, then loop over each thread and count a list or something) and a low level (the specific Redis calls)
We currently use the Sidekiq `WorkSet` API, which loops over all registered Sidekiq processes, reads each thread's running jobs, and tallies a frequency count per worker. This is also CPU-intensive, as it deserializes JSON into a hash both in Sidekiq and in GitLab Rails.
In Redis terms, we periodically perform `<nbr_process>/<batch_size>` `sscan` calls and `<nbr_process>` `hgetall` calls per Sidekiq Redis.
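As a rough worked example of that formula, the sketch below estimates the number of Redis commands for one tally pass. The helper name `tally_commands` and the sample values are hypothetical, and rounding the `sscan` count up to a whole call is an assumption.

```ruby
# Hypothetical helper: estimated Redis commands for one tally pass against a
# single Sidekiq Redis, following the <nbr_process>/<batch_size> formula.
# Rounding the sscan count up to a whole call is an assumption.
def tally_commands(nbr_process:, batch_size:)
  sscan = (nbr_process + batch_size - 1) / batch_size # integer ceiling division
  hgetall = nbr_process
  { sscan: sscan, hgetall: hgetall, total: sscan + hgetall }
end

p tally_commands(nbr_process: 250, batch_size: 100)
```

With 250 processes and a batch size of 100, this comes to 3 `sscan` and 250 `hgetall` calls per pass, so the cost grows linearly with the number of processes.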
The state read through the `WorkSet` API is updated on every Sidekiq process heartbeat, which is configured to a 10s interval. This means the snapshot we are using does not change for 10s. Given the speed of job processing, 10s is a fairly long window.
3. How this MR changes things
The application tracks worker counts using one Redis hash per Sidekiq worker. The hash fields contain the Sidekiq process ID and thread ID. The application also performs a periodic hash cleanup during `ConcurrencyLimit::ResumeWorker` cron runs.
Instead of checking a cached concurrency tally to decide whether a worker should be queued, the process can perform an `hlen` on the worker-specific key to get the latest count. `hlen` (https://redis.io/docs/latest/commands/hlen/) is considered a `@fast` `@read` command.
See #490936 (comment 2120978564)
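The approach described in this section can be sketched as follows. This is not the code in this MR: `FakeRedis` is an in-memory stand-in so the example runs without a Redis server, and `ConcurrencyTracker`, the `concurrency:<worker>` key, the `<process_id>:<thread_id>` field format, and the rule of cleaning up entries of dead processes are all assumptions made for illustration.

```ruby
# In-memory stand-in for the few Redis hash commands the sketch needs.
# A real implementation would send HSET, HDEL, HLEN and HKEYS to Sidekiq's Redis.
class FakeRedis
  def initialize
    @hashes = Hash.new { |store, key| store[key] = {} }
  end

  def hset(key, field, value)
    @hashes[key][field] = value
  end

  def hdel(key, field)
    @hashes[key].delete(field)
  end

  def hlen(key)
    @hashes[key].size
  end

  def hkeys(key)
    @hashes[key].keys
  end
end

# Hypothetical tracker; class, key and field names are illustrative only.
class ConcurrencyTracker
  def initialize(redis)
    @redis = redis
  end

  # One hash per worker class (hypothetical key format).
  def key_for(worker_class)
    "concurrency:#{worker_class}"
  end

  # Registers the running job under "<process_id>:<thread_id>" and removes
  # the field again when the block finishes, even if it raises.
  def track(worker_class, process_id:, thread_id:)
    field = "#{process_id}:#{thread_id}"
    @redis.hset(key_for(worker_class), field, Time.now.to_i)
    yield
  ensure
    @redis.hdel(key_for(worker_class), field)
  end

  # Current concurrency is simply the number of fields in the hash.
  def count(worker_class)
    @redis.hlen(key_for(worker_class))
  end

  # Assumed clean-up rule: drop entries whose process is no longer alive.
  # The thread ID is taken to be the part after the last colon.
  def cleanup(worker_class, live_process_ids)
    key = key_for(worker_class)
    @redis.hkeys(key).each do |field|
      process_id = field.rpartition(":").first
      @redis.hdel(key, field) unless live_process_ids.include?(process_id)
    end
  end
end
```

Because each running job owns exactly one field, the count is available with a single `hlen` instead of a scan over all processes. The periodic cleanup covers the case where a process dies before it can remove its own fields.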
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.