Reverse defer_sidekiq_jobs FF to run_sidekiq_jobs
Resolves gitlab-com/gl-infra/scalability#2381 (closed)
What does this MR do and why?
Reverse defer_sidekiq_jobs FF to run_sidekiq_jobs
The defer_sidekiq_jobs FF is disabled by default. During an
incident where we want to start deferring jobs, we'd set it to true.
However, when we later want to gradually resume running the worker, we
can't switch the FF to a percentage of time rollout. This is blocked by
the Flipper gem implementation, which doesn't let us set a percentage of
time on a feature flag that is already fully enabled.
Changes:
- Reverses the FF convention so that the run_sidekiq_jobs FF is enabled by default and disabled when we want jobs to be deferred.
- We can enable percentage of time from a disabled FF, as with normal FF usage. For example, enabling the run_sidekiq_jobs FF for 10% of the time will run roughly 10% of the jobs and defer the remaining 90%.
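To illustrate the reversed convention, here is a minimal, self-contained sketch of how a percentage of time gate maps to run/defer decisions. PercentageOfTimeGate, job_action, and simulate are hypothetical names for illustration only; this is not the Flipper or GitLab implementation, and it assumes the gate is simply a random draw compared against the configured percentage.

```ruby
# Hypothetical stand-in for a percentage-of-time gate.
# Assumption: the gate is open when a random draw falls below the percentage.
class PercentageOfTimeGate
  def initialize(percentage, rng: Random.new)
    unless (0..100).cover?(percentage)
      raise ArgumentError, "percentage must be between 0 and 100"
    end

    @percentage = percentage
    @rng = rng
  end

  # Returns true for roughly `percentage`% of calls.
  def enabled?
    @rng.rand * 100 < @percentage
  end
end

# Reversed convention: flag enabled => run the job, flag disabled => defer it.
def job_action(gate)
  gate.enabled? ? :run : :defer
end

# Simulates `jobs` decisions and counts how many were run vs. deferred.
def simulate(percentage, jobs: 10_000, seed: 42)
  gate = PercentageOfTimeGate.new(percentage, rng: Random.new(seed))
  Array.new(jobs) { job_action(gate) }.tally
end
```

Calling simulate(10) should report roughly one in ten jobs as :run and the rest as :defer, while simulate(0) defers everything and simulate(100) runs everything. The exact counts depend on the random seed.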
How to set up and validate locally
- Disable the feature flag in Rails console:
Feature.disable(:"run_sidekiq_jobs_Chaos::SleepWorker")
- Wait ~1 minute for the thread-local FF cache to expire
- Run jobs from Rails console:
Chaos::SleepWorker.perform_async(2)
- Check the Sidekiq logs with gdk tail rails-background-jobs; the output should include "job_status": "deferred":
2023-06-13_12:43:04.40825 rails-background-jobs :
{
"severity": "INFO",
"time": "2023-06-13T12:43:04.408Z",
"retry": 3,
"queue": "default",
"backtrace": true,
"version": 0,
"queue_namespace": "chaos",
"args":
[
"2"
],
"class": "Chaos::SleepWorker",
"jid": "4f591be2e6010d96d7d2781e",
"created_at": "2023-06-13T12:43:04.369Z",
"correlation_id": "bc8d5393a4db8bbcec46b47c0f0f0b38",
"worker_data_consistency": "always",
"idempotency_key": "resque:gitlab:duplicate:default:24808edb6838c48efcad29c8e4b7b5b1c8243aa7a545bb43a333e8284b18e49e",
"size_limiter": "validated",
"enqueued_at": "2023-06-13T12:43:04.401Z",
"job_size_bytes": 3,
"pid": 36284,
"message": "Chaos::SleepWorker JID-4f591be2e6010d96d7d2781e: deferred: 0.006011 sec",
"job_status": "deferred",
"scheduling_latency_s": 0.000515,
"redis_calls": 3,
"redis_duration_s": 0.000301,
"redis_read_bytes": 3,
"redis_write_bytes": 682,
"redis_queues_calls": 3,
"redis_queues_duration_s": 0.000301,
"redis_queues_read_bytes": 3,
"redis_queues_write_bytes": 682,
"db_count": 0,
"db_write_count": 0,
"db_cached_count": 0,
"db_replica_count": 0,
"db_primary_count": 0,
"db_main_count": 0,
"db_ci_count": 0,
"db_main_replica_count": 0,
"db_ci_replica_count": 0,
"db_replica_cached_count": 0,
"db_primary_cached_count": 0,
"db_main_cached_count": 0,
"db_ci_cached_count": 0,
"db_main_replica_cached_count": 0,
"db_ci_replica_cached_count": 0,
"db_replica_wal_count": 0,
"db_primary_wal_count": 0,
"db_main_wal_count": 0,
"db_ci_wal_count": 0,
"db_main_replica_wal_count": 0,
"db_ci_replica_wal_count": 0,
"db_replica_wal_cached_count": 0,
"db_primary_wal_cached_count": 0,
"db_main_wal_cached_count": 0,
"db_ci_wal_cached_count": 0,
"db_main_replica_wal_cached_count": 0,
"db_ci_replica_wal_cached_count": 0,
"db_replica_duration_s": 0.0,
"db_primary_duration_s": 0.0,
"db_main_duration_s": 0.0,
"db_ci_duration_s": 0.0,
"db_main_replica_duration_s": 0.0,
"db_ci_replica_duration_s": 0.0,
"cpu_s": 0.00355,
"worker_id": "sidekiq_0",
"rate_limiting_gates":
[],
"duration_s": 0.006011,
"completed_at": "2023-06-13T12:43:04.408Z",
"load_balancing_strategy": "primary",
"job_deferred_by": "feature_flag",
"db_duration_s": 0.0
}
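When validating many jobs, checking each log entry by eye gets tedious. The following is a rough sketch of a helper that checks a single log line for the two fields shown above, "job_status" and "job_deferred_by". The helper name deferred_by_feature_flag? is hypothetical, and the sketch assumes each log entry arrives on one line with the JSON payload following the gdk tail prefix.

```ruby
require "json"

# Hypothetical helper: returns true when a log line describes a job
# that was deferred by the feature flag.
# Assumption: the JSON payload starts at the first "{" on the line.
def deferred_by_feature_flag?(log_line)
  json_start = log_line.index("{")
  return false unless json_start

  entry = JSON.parse(log_line[json_start..])
  entry["job_status"] == "deferred" && entry["job_deferred_by"] == "feature_flag"
rescue JSON::ParserError
  # Lines that are not valid JSON are treated as non-matching.
  false
end
```

The helper could be fed lines captured from gdk tail rails-background-jobs, for example by piping the output to a file and reading it with File.foreach.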
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Gregorius Marco