Reverse defer_sidekiq_jobs FF to run_sidekiq_jobs

Gregorius Marco requested to merge mg-reverse-defer-jobs-ff into master

Resolves gitlab-com/gl-infra/scalability#2381 (closed)

What does this MR do and why?

Reverse defer_sidekiq_jobs FF to run_sidekiq_jobs

The defer_sidekiq_jobs FF is disabled by default. During an incident where we want to start deferring jobs, we'd enable it. When we later want to gradually resume running the worker, we can't use a percentage of time rollout, because the Flipper gem implementation doesn't let us set a percentage of time on a feature flag that is already fully enabled.

Changes:

  • Reverses the FF convention so that the run_sidekiq_jobs FF is enabled by default and disabled when we want jobs to be deferred.
  • We can now enable a percentage of time starting from a disabled FF, as with normal FF usage. For example, enabling run_sidekiq_jobs for 10% of the time runs about 10% of the jobs and defers the remaining 90%.
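
To illustrate the 10% example above, here is a minimal plain-Ruby sketch of how a percentage of time gate maps to run/deferred decisions. PercentageOfTimeGate and job_status are hypothetical names made up for this illustration. They are not the Flipper or GitLab implementation and only model the semantics described in this MR (flag enabled means run, flag disabled means defer).

```ruby
# Hypothetical stand-in for a percentage of time gate.
# This is NOT the Flipper or GitLab implementation, only an illustration.
class PercentageOfTimeGate
  def initialize(percentage, rng: Random.new)
    unless (0..100).cover?(percentage)
      raise ArgumentError, "percentage must be between 0 and 100"
    end

    @percentage = percentage
    @rng = rng
  end

  # Returns true for roughly `percentage` percent of the calls.
  def enabled?
    @rng.rand * 100 < @percentage
  end
end

# Mirrors the run_sidekiq_jobs convention from this MR:
# gate enabled -> the job runs, gate disabled -> the job is deferred.
def job_status(gate)
  gate.enabled? ? :run : :deferred
end

# Simulate 10,000 jobs with the flag enabled 10% of the time.
gate = PercentageOfTimeGate.new(10, rng: Random.new(42))
counts = Array.new(10_000) { job_status(gate) }.tally
puts counts
```

With the gate at 0 every job is deferred and at 100 every job runs, which matches the fully disabled and fully enabled states of the flag.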

How to set up and validate locally

  1. Disable the feature flag in the Rails console:
Feature.disable(:"run_sidekiq_jobs_Chaos::SleepWorker")
  2. Wait ~1 minute for the thread-local FF cache to expire.
  3. Run a job from the Rails console:
Chaos::SleepWorker.perform_async(2)
  4. Check the Sidekiq logs with gdk tail rails-background-jobs. The output should include "job_status": "deferred":
2023-06-13_12:43:04.40825 rails-background-jobs : 
{
    "severity": "INFO",
    "time": "2023-06-13T12:43:04.408Z",
    "retry": 3,
    "queue": "default",
    "backtrace": true,
    "version": 0,
    "queue_namespace": "chaos",
    "args":
    [
        "2"
    ],
    "class": "Chaos::SleepWorker",
    "jid": "4f591be2e6010d96d7d2781e",
    "created_at": "2023-06-13T12:43:04.369Z",
    "correlation_id": "bc8d5393a4db8bbcec46b47c0f0f0b38",
    "worker_data_consistency": "always",
    "idempotency_key": "resque:gitlab:duplicate:default:24808edb6838c48efcad29c8e4b7b5b1c8243aa7a545bb43a333e8284b18e49e",
    "size_limiter": "validated",
    "enqueued_at": "2023-06-13T12:43:04.401Z",
    "job_size_bytes": 3,
    "pid": 36284,
    "message": "Chaos::SleepWorker JID-4f591be2e6010d96d7d2781e: deferred: 0.006011 sec",
    "job_status": "deferred",
    "scheduling_latency_s": 0.000515,
    "redis_calls": 3,
    "redis_duration_s": 0.000301,
    "redis_read_bytes": 3,
    "redis_write_bytes": 682,
    "redis_queues_calls": 3,
    "redis_queues_duration_s": 0.000301,
    "redis_queues_read_bytes": 3,
    "redis_queues_write_bytes": 682,
    "db_count": 0,
    "db_write_count": 0,
    "db_cached_count": 0,
    "db_replica_count": 0,
    "db_primary_count": 0,
    "db_main_count": 0,
    "db_ci_count": 0,
    "db_main_replica_count": 0,
    "db_ci_replica_count": 0,
    "db_replica_cached_count": 0,
    "db_primary_cached_count": 0,
    "db_main_cached_count": 0,
    "db_ci_cached_count": 0,
    "db_main_replica_cached_count": 0,
    "db_ci_replica_cached_count": 0,
    "db_replica_wal_count": 0,
    "db_primary_wal_count": 0,
    "db_main_wal_count": 0,
    "db_ci_wal_count": 0,
    "db_main_replica_wal_count": 0,
    "db_ci_replica_wal_count": 0,
    "db_replica_wal_cached_count": 0,
    "db_primary_wal_cached_count": 0,
    "db_main_wal_cached_count": 0,
    "db_ci_wal_cached_count": 0,
    "db_main_replica_wal_cached_count": 0,
    "db_ci_replica_wal_cached_count": 0,
    "db_replica_duration_s": 0.0,
    "db_primary_duration_s": 0.0,
    "db_main_duration_s": 0.0,
    "db_ci_duration_s": 0.0,
    "db_main_replica_duration_s": 0.0,
    "db_ci_replica_duration_s": 0.0,
    "cpu_s": 0.00355,
    "worker_id": "sidekiq_0",
    "rate_limiting_gates":
    [],
    "duration_s": 0.006011,
    "completed_at": "2023-06-13T12:43:04.408Z",
    "load_balancing_strategy": "primary",
    "job_deferred_by": "feature_flag",
    "db_duration_s": 0.0
}
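
Instead of checking the log entry by eye, the two relevant fields can be checked with a short script. The following is a minimal sketch, assuming the JSON part of a log entry is available as a single string. deferred_by_feature_flag? is a hypothetical helper name, and the sample is trimmed to the fields that matter from the entry above.

```ruby
require "json"

# Hypothetical helper: returns true when a structured Sidekiq log entry
# reports that the job was deferred because of the feature flag.
def deferred_by_feature_flag?(log_json)
  entry = JSON.parse(log_json)
  entry["job_status"] == "deferred" && entry["job_deferred_by"] == "feature_flag"
end

# Trimmed-down sample using field names and values from the log entry above.
sample = <<~LOG
  {
    "class": "Chaos::SleepWorker",
    "job_status": "deferred",
    "job_deferred_by": "feature_flag"
  }
LOG

puts deferred_by_feature_flag?(sample)
```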

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Gregorius Marco
