
Track worker concurrency using Redis hash

Sylvester Chin requested to merge sc1-optimise-concurrency-tracking into master

What does this MR do and why?

This MR implements an alternative approach to tracking worker concurrency.

1. Why do we care about counting the concurrency?

The concurrency limiter depends on the concurrency tally to decide when to schedule jobs. Any inaccuracy leads to one of two problems:

  • tally lower than reality: too many jobs are scheduled, exceeding the limit
  • tally higher than reality: the buffer of deferred jobs is drained too slowly
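A minimal sketch of why both error directions matter, assuming a simplified limiter that releases `limit - tally` buffered jobs per pass. `slots_to_resume` and the numbers below are hypothetical, not GitLab's actual implementation.

```python
def slots_to_resume(tally: int, limit: int) -> int:
    """Number of buffered jobs a simplified limiter would release, given its view of the tally."""
    return max(limit - tally, 0)


LIMIT = 10
actual_running = 8  # true number of jobs executing right now (hypothetical)

# Under-count: the limiter sees 5, releases 5 jobs, and 13 end up running (> LIMIT).
under = slots_to_resume(tally=5, limit=LIMIT)
print(actual_running + under)  # 13

# Over-count: the limiter sees 12 and releases nothing, although 2 slots are free.
over = slots_to_resume(tally=12, limit=LIMIT)
print(over, LIMIT - actual_running)  # 0 2
```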

2. How does it work today, both at a high level (e.g. loop over all Sidekiq processes, then loop over each thread and count) and at a low level (the specific Redis calls)?

We currently use the Sidekiq WorkSet API, which loops over all registered Sidekiq processes, reads each thread's running job, and tallies a frequency count per worker. This is also CPU-intensive because the JSON entries are deserialized into hashes both in Sidekiq and in GitLab Rails.
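The nested loop described above can be sketched against an in-memory stand-in for the data WorkSet reads. This is an illustration only: the process identities, thread IDs, worker class names, and the simplified JSON shape are all hypothetical, and the real Sidekiq entries carry more fields.

```python
import json
from collections import Counter

# Hypothetical stand-in for what WorkSet reads from Redis:
# process identity -> {thread id -> JSON-encoded job description}.
# The JSON shape is simplified for illustration.
busy_threads = {
    "host-a:1": {
        "tid-1": json.dumps({"payload": {"class": "MergeWorker"}}),
        "tid-2": json.dumps({"payload": {"class": "PipelineWorker"}}),
    },
    "host-b:2": {
        "tid-9": json.dumps({"payload": {"class": "MergeWorker"}}),
    },
}


def tally_running_workers(processes: dict[str, dict[str, str]]) -> Counter:
    """Loop over every process, then every busy thread, decoding each JSON entry."""
    counts: Counter = Counter()
    for threads in processes.values():
        for raw_job in threads.values():
            job = json.loads(raw_job)  # one JSON parse per running job: the CPU cost
            counts[job["payload"]["class"]] += 1
    return counts


print(tally_running_workers(busy_threads))
```

The cost grows with the total number of busy threads across all processes, even when only one worker's count is needed.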

In Redis terms, we periodically issue roughly <nbr_process>/<batch_size> SSCAN calls and <nbr_process> HGETALL calls against each Sidekiq Redis instance.
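A back-of-the-envelope helper makes the command count concrete. `workset_redis_calls` is hypothetical, and it treats the SSCAN batch size as exact, although Redis only uses COUNT as a hint, so real page counts can differ.

```python
import math


def workset_redis_calls(nbr_process: int, batch_size: int) -> int:
    """Approximate Redis commands for one WorkSet read of a single Sidekiq Redis.

    SSCAN pages through the process set about batch_size entries at a time,
    then one HGETALL is issued per process.
    """
    sscan_calls = math.ceil(nbr_process / batch_size)
    hgetall_calls = nbr_process
    return sscan_calls + hgetall_calls


print(workset_redis_calls(1000, 100))  # 1010
```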

The state read through the WorkSet API is only refreshed on each Sidekiq process heartbeat, which is configured at a 10s interval. The snapshot we use can therefore lag reality by up to 10s. Given how quickly jobs are processed, 10s is a long window.

3. How this MR changes things

The application tracks worker counts using one Redis hash per Sidekiq worker. Each hash field identifies a running job by its Sidekiq process ID and thread ID. The application also periodically cleans up the hashes during the ConcurrencyLimit::ResumeWorker cron runs.
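A minimal sketch of the hash-per-worker bookkeeping, run against an in-memory stand-in rather than a real Redis. The key format, storing the job ID as the field value, and the cleanup rule (dropping fields whose process is no longer alive) are assumptions for illustration, not the MR's actual code.

```python
from collections import defaultdict


class FakeRedis:
    """Tiny in-memory stand-in for the Redis hash commands used below."""

    def __init__(self) -> None:
        self._hashes: dict[str, dict[str, str]] = defaultdict(dict)

    def hset(self, key: str, field: str, value: str) -> None:
        self._hashes[key][field] = value

    def hdel(self, key: str, *fields: str) -> int:
        return sum(1 for f in fields if self._hashes[key].pop(f, None) is not None)

    def hkeys(self, key: str) -> list[str]:
        return list(self._hashes[key])

    def hlen(self, key: str) -> int:
        return len(self._hashes[key])


def key_for(worker: str) -> str:
    return f"worker_concurrency:{worker}"  # hypothetical key naming scheme


def field_for(pid: str, tid: str) -> str:
    return f"{pid}:{tid}"  # one field per (Sidekiq process, thread)


def track_start(r: FakeRedis, worker: str, pid: str, tid: str, job_id: str) -> None:
    r.hset(key_for(worker), field_for(pid, tid), job_id)


def track_finish(r: FakeRedis, worker: str, pid: str, tid: str) -> None:
    r.hdel(key_for(worker), field_for(pid, tid))


def cleanup(r: FakeRedis, worker: str, alive_pids: set[str]) -> int:
    """Drop fields left behind by processes that are no longer alive (e.g. crashed)."""
    # rsplit assumes thread IDs contain no ':'
    stale = [f for f in r.hkeys(key_for(worker)) if f.rsplit(":", 1)[0] not in alive_pids]
    if stale:
        r.hdel(key_for(worker), *stale)
    return len(stale)


r = FakeRedis()
track_start(r, "MergeWorker", "pid1", "t1", "jid-a")
track_start(r, "MergeWorker", "pid1", "t2", "jid-b")
track_start(r, "MergeWorker", "pid2", "t1", "jid-c")
track_finish(r, "MergeWorker", "pid1", "t2")
removed = cleanup(r, "MergeWorker", alive_pids={"pid1"})  # pid2 died mid-job
print(removed, r.hlen(key_for("MergeWorker")))  # 1 1
```

Keying each field by process and thread means a finished job removes exactly its own entry, and a periodic sweep can reclaim entries from processes that exited without cleaning up.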

Instead of checking a cached concurrency tally to decide whether a worker's job should be queued, the process can run HLEN on the worker-specific key to get the latest count. HLEN (https://redis.io/docs/latest/commands/hlen/) is O(1) and belongs to the @fast and @read command categories.
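The queueing decision then reduces to one HLEN per check. A minimal sketch, assuming the limiter defers a job once the live count has reached the worker's limit. `should_defer` and the key name are hypothetical; the `hlen` callable could be backed by a real client such as redis-py's `Redis.hlen`, or, as in the usage below, by a plain dict.

```python
from typing import Callable


def should_defer(hlen: Callable[[str], int], worker_key: str, limit: int) -> bool:
    """Defer (buffer) the job when the live count already meets the limit."""
    return hlen(worker_key) >= limit


# Usage with an in-memory dict standing in for Redis.
running = {"worker_concurrency:MergeWorker": {"pid1:t1": "jid-a", "pid1:t2": "jid-b"}}


def fake_hlen(key: str) -> int:
    return len(running.get(key, {}))


print(should_defer(fake_hlen, "worker_concurrency:MergeWorker", limit=2))  # True
print(should_defer(fake_hlen, "worker_concurrency:MergeWorker", limit=3))  # False
```

Because HLEN reads a single key, the check avoids scanning every process and parsing JSON, and it sees the count as of the latest start or finish instead of the last heartbeat.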

See #490936 (comment 2120978564)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.


How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Edited by Sylvester Chin
