Implement Gitlab mirror scheduling tracker to make UpdateAllMirrorsWorker independent from Sidekiq queue sizes
What does this MR do and why?
For #340630 (closed)
At the time of writing, UpdateAllMirrorsWorker checks the queue size of ProjectImportScheduleWorker before possibly rescheduling itself. This dependency prevents us from moving to queue-per-shard completely. As part of the initiative to deprecate the queue selector, this MR makes UpdateAllMirrorsWorker independent of the other worker's queue size.
We already made two attempts to fix this issue:
- !79097 (merged). This MR failed while rolling out the feature flag. The detailed debugging logs can be found in the second MR's description.
- !80711 (closed). This MR was to fix the above issue. However, the usage of JobTracker here doesn't make much sense. It's out of scope of LimitedCapacity::JobTracker.
This MR is the third attempt, with a new approach: implement a dedicated scheduling tracker for mirroring. The tracker tracks projects when they start scheduling and untracks them when their import state transitions to scheduled. In ProjectImportState's state machine, scheduled is the mandatory state a project enters when it starts its mirroring pipeline. Untracking the project before that state transition keeps the scheduling counter correct.
flowchart LR
none
scheduled
started
finished
failed
none-- schedule -->scheduled
finished-- schedule -->scheduled
failed-- schedule -->scheduled
none-- force_start --> started
finished-- force_start --> started
failed-- force_start --> started
scheduled-- start --> started
started-- start --> finished
scheduled-- fail_op --> failed
started-- fail_op --> failed
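The tracking idea above can be illustrated with a minimal sketch. It is not the MR's implementation: the class name MirrorSchedulingTracker, the schedule! helper, and the in-memory Set are all hypothetical stand-ins, and a real tracker would presumably need shared storage so that every Sidekiq process sees the same set. Only the source states of the schedule event are taken from the flowchart.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the scheduling tracker (assumption:
# the real tracker uses shared storage rather than a per-process Set).
class MirrorSchedulingTracker
  def initialize
    @project_ids = Set.new
    @mutex = Mutex.new
  end

  # Called when a project starts its scheduling.
  def track(project_id)
    @mutex.synchronize { @project_ids.add(project_id) }
  end

  # Called right before the import state transitions to :scheduled.
  def untrack(project_id)
    @mutex.synchronize { @project_ids.delete(project_id) }
  end

  # Number of projects currently being scheduled.
  def current_scheduling
    @mutex.synchronize { @project_ids.size }
  end
end

# Source states of the `schedule` event, as listed in the flowchart.
SCHEDULABLE_STATES = %i[none finished failed].freeze

# Hypothetical helper: untracks the project, then returns the new state.
def schedule!(tracker, project_id, state)
  raise ArgumentError, "cannot schedule from #{state}" unless SCHEDULABLE_STATES.include?(state)

  tracker.untrack(project_id)
  :scheduled
end

tracker = MirrorSchedulingTracker.new
[1, 2, 3].each { |id| tracker.track(id) }
puts tracker.current_scheduling
new_state = schedule!(tracker, 2, :none)
puts "#{new_state}, remaining: #{tracker.current_scheduling}"
```

Because untracking happens on the only path into scheduled, the counter cannot keep counting a project that has already entered the mirroring pipeline through that event.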
This MR also modifies the set of feature flags:
- Remove the project_import_schedule_job_tracker flag (introduced in !79097 (merged)). This flag was never turned on by default.
- Remove the update_all_mirrors_job_tracker flag (introduced in !79097 (merged)). This flag was never turned on by default.
- Introduce the mirror_scheduling_tracking flag.
Screenshots or screen recordings
N/A. This change should not affect any users.
How to set up and validate locally
- Start one Rails console session (Console A):
loop do
puts "current_scheduling: #{Gitlab::Mirror.current_scheduling}"
puts "queue_size: #{ProjectImportScheduleWorker.queue_size}"
sleep 0.5
end
- Start another Rails console session (Console B). Run the following command to start an UpdateAllMirrorsWorker:
UpdateAllMirrorsWorker.perform_async
- Console A prints lines like the following (the tracker value appears as job_tracker in this output). They indicate that Gitlab::Mirror.current_scheduling and the actual queue size stay very close, at most a few seconds apart. According to the logs, the UpdateAllMirrorsWorker job is blocked until the number drops to 0.
job_tracker: 19
queue_size: 19
job_tracker: 19 <=== Lagged behind
queue_size: 12
job_tracker: 19
queue_size: 12
job_tracker: 18
queue_size: 12
job_tracker: 14
queue_size: 12 <=== In-synced, 2 seconds later
job_tracker: 12
queue_size: 12
job_tracker: 12
queue_size: 10
job_tracker: 10
queue_size: 9
job_tracker: 9 <=== Lagged behind
queue_size: 5
job_tracker: 8
queue_size: 3
job_tracker: 5
queue_size: 3
job_tracker: 3 <=== In-synced, 1.5 seconds later
queue_size: 3
job_tracker: 3
queue_size: 3
job_tracker: 3
queue_size: 3
job_tracker: 2
queue_size: 2
job_tracker: 2
queue_size: 2
job_tracker: 1
queue_size: 1
job_tracker: 1
queue_size: 1
job_tracker: 1
queue_size: 0
job_tracker: 0
job_tracker: 19 <==== At this point, UpdateAllMirrorsWorker is rescheduled
queue_size: 19
...
- Check the Sidekiq admin dashboard and logs to confirm that UpdateAllMirrorsWorker is rescheduled.
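The blocking behaviour observed in these steps, where the worker waits until the tracked count drops to 0, can be sketched as a small polling loop. This is an illustration only: wait_until_drained is a hypothetical helper, and the block stands in for a call such as Gitlab::Mirror.current_scheduling. The timeout and interval values are assumptions.

```ruby
# Hypothetical helper: polls a counter (supplied by the block) until it
# reaches zero or the timeout elapses. Returns true when drained.
def wait_until_drained(timeout:, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    return true if yield.zero?
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end

# Simulated counter readings standing in for the tracker value.
readings = [3, 2, 1, 0]
drained = wait_until_drained(timeout: 1, interval: 0.01) { readings.shift || 0 }
puts "drained: #{drained}"
```

Polling a dedicated counter rather than a Sidekiq queue size is what removes the dependency on how queues are laid out across shards.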
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.