Defer sidekiq jobs based on database health status indicators
What does this MR do and why?
This is part III of #404898 (closed). !121261 (merged) exposed the variables needed to define the health check context. This MR performs the database health check and defers (re-queues) the Sidekiq job if the database signals it to stop.
Approach:
We extend the existing Sidekiq DeferJobs middleware to perform a check on database health status as well.
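To illustrate the idea only, here is a minimal sketch of a Sidekiq-style server middleware that re-queues a job instead of running it when a health check says to stop. Every name in it (`DeferOnDatabaseHealthSketch`, `FakeSleepWorker`, `defer_delay`, the `health_check:` callable) is hypothetical and chosen for the example. It is not the actual DeferJobs implementation, and it runs without Sidekiq loaded.

```ruby
# Hypothetical sketch, not GitLab's DeferJobs middleware.
class DeferOnDatabaseHealthSketch
  # health_check: any callable that receives the worker class and returns
  # true when the database signals that the job should not run now
  # (an assumed interface for this example).
  def initialize(health_check:)
    @health_check = health_check
  end

  # Same shape as a Sidekiq server middleware: worker instance, job hash,
  # queue name, and a block that actually performs the job.
  def call(worker, job, _queue)
    worker_class = worker.class

    if defer?(worker_class)
      # Re-queue with the delay the worker declared instead of running it.
      worker_class.perform_in(worker_class.defer_delay, *job['args'])
      return
    end

    yield
  end

  private

  # Only workers that opted in (here: respond to defer_delay) are checked.
  def defer?(worker_class)
    worker_class.respond_to?(:defer_delay) && @health_check.call(worker_class)
  end
end

# Stand-in worker that records scheduling calls instead of using Redis.
class FakeSleepWorker
  class << self
    def scheduled
      @scheduled ||= []
    end

    # Delay in seconds, mirroring the 1.minute used in this MR's example.
    def defer_delay
      60
    end

    def perform_in(delay, *args)
      scheduled << { delay: delay, args: args }
    end
  end
end

# Usage: an unhealthy signal defers the job, so the block never runs.
executed = false
unhealthy = DeferOnDatabaseHealthSketch.new(health_check: ->(_klass) { true })
unhealthy.call(FakeSleepWorker.new, { 'args' => [1] }, 'default') { executed = true }
```

Because the deferred job is enqueued again rather than resumed, it is a new job from the queue's point of view, which is consistent with the re-queued job getting a different 'jid' in the validation steps.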
Feature flag:
This feature is behind a FF (defer_sidekiq_workers_on_database_health_signal), rollout issue: #412990 (closed)
P.S.: The initial effort was made in MR#119187, which contained the entire changeset and grew too large, so it was split into smaller MRs for more control.
How to set up and validate locally
- Enable the 'defer_sidekiq_workers_on_database_health_signal' FF from the rails console:
Feature.enable(:defer_sidekiq_workers_on_database_health_signal)
- Create a new worker or choose an existing one; for this I am using 'Chaos::SleepWorker'.
- Let's set 'database_health_check_attrs' for the worker, e.g.:
# frozen_string_literal: true

module Chaos
  class SleepWorker # rubocop:disable Scalability/IdempotentWorker
    ...
    ...
    defer_on_database_health_signal :gitlab_main, 1.minute, [:users]

    def perform(duration_s)
      Gitlab::Chaos.sleep(duration_s)
    end
  end
end
- Chaos::SleepWorker.defer_on_database_health_signal? should now return true.
- Make defer_job_by_database_health_signal? return true locally (so that we don't have to prepare an actual db health status evaluation).
- Run reload! in the local rails console if you are using the same console.
- Tail the job logs:
gdk tail rails-background-jobs | grep Chaos::SleepWorker
- If needed, please restart rails-background-jobs locally:
gdk restart rails-background-jobs
- Performing the job should schedule it a minute later instead of executing it immediately:
pry(main)> Chaos::SleepWorker.perform_async(1)
=> "9e402fc977d7bcf32597fe91"
pry(main)> queue = Sidekiq::ScheduledSet.new
pry(main)> queue.map { |job| job }
=> [#<Sidekiq::SortedEntry:0x00000001495b7788
  @args=nil,
  @item=
   {"retry"=>3,
    "queue"=>"default",
    "backtrace"=>true,
    "version"=>0,
    "queue_namespace"=>"chaos",
    "class"=>"Chaos::SleepWorker",
    "args"=>[1],
    "jid"=>"d51303502cb5f6849488961b",
    "created_at"=>1685034663.062152,
    "correlation_id"=>"b8b450db286639352dd5195d6c85ce13",
    "meta.caller_id"=>"Chaos::SleepWorker",
    "meta.feature_category"=>"not_owned",
    "meta.root_caller_id"=>"Chaos::SleepWorker",
    "worker_data_consistency"=>"always",
    "size_limiter"=>"validated",
    "scheduled_at"=>1685034723.062093},
  @parent=#<Sidekiq::ScheduledSet:0x000000014950ccc0 @_size=1, @name="schedule">,
  @queue="default",
  @score=1685034723.062093,
  @value=
   "{\"retry\":3,\"queue\":\"default\",\"backtrace\":true,\"version\":0,\"queue_namespace\":\"chaos\",\"class\":\"Chaos::SleepWorker\",\"args\":[1],\"jid\":\"d51303502cb5f6849488961b\",\"created_at\":1685034663.062152,\"correlation_id\":\"b8b450db286639352dd5195d6c85ce13\",\"meta.caller_id\":\"Chaos::SleepWorker\",\"meta.feature_category\":\"not_owned\",\"meta.root_caller_id\":\"Chaos::SleepWorker\",\"worker_data_consistency\":\"always\",\"size_limiter\":\"validated\",\"scheduled_at\":1685034723.062093}">]
- Note the 'jid' for later; 'scheduled_at' should be one minute after 'created_at'.
- After a minute, we should see logs coming in from the rails-background-jobs tail started earlier.
- When the job executes now, it is re-queued again (because defer_job_by_database_health_signal? was made to return true), but with a different 'jid' than the previous one.
pry(main)> queue = Sidekiq::ScheduledSet.new
pry(main)> queue.map { |job| job }
# We should be able to see another job scheduled a minute after the previous job's execution, with a new 'jid'
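As a quick way to confirm the one-minute deferral without reading the raw output by eye, the delay can be computed from a job's item hash. The helper name `deferral_delay` is hypothetical, and the hash below copies only the relevant fields from the scheduled job shown in the steps above.

```ruby
# Hypothetical helper: seconds between enqueueing and the scheduled run,
# computed from the 'created_at' and 'scheduled_at' fields of a job item.
def deferral_delay(item)
  item.fetch('scheduled_at') - item.fetch('created_at')
end

# Values copied from the scheduled job in the console output above.
item = {
  'jid' => 'd51303502cb5f6849488961b',
  'created_at' => 1685034663.062152,
  'scheduled_at' => 1685034723.062093
}

delay = deferral_delay(item)
puts delay.round # prints 60

# In a rails console, the same check could be applied to live entries,
# assuming each scheduled entry exposes its payload via #item:
#   Sidekiq::ScheduledSet.new.map { |job| deferral_delay(job.item) }
```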
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #404898 (closed)