
Defer Sidekiq jobs from worker type feature flags

Gregorius Marco requested to merge mg-defer-sidekiq-jobs into master

What does this MR do and why?

Resolves gitlab-com/gl-infra/scalability#2346 (closed)

Check out gitlab-com/gl-infra&1004 (closed) for more context


The new server middleware checks a feature flag of type worker, named defer_sidekiq_jobs:<worker_name>, to determine whether a job of that worker class should be deferred.
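For illustration, the deferral decision can be sketched as a plain Ruby class with the Sidekiq server-middleware shape, `call(worker, job, queue)` plus a block. This is a minimal sketch under stated assumptions, not the code in lib/gitlab/sidekiq_middleware/defer_jobs.rb: the class name `DeferJobsSketch` and the injected `flag_enabled` and `reschedule` callables are hypothetical stand-ins for the feature flag check and for re-enqueueing the job, so the example runs without GitLab or Sidekiq loaded.

```ruby
# Hypothetical sketch of a deferring server middleware.
# Only DELAY and the flag name prefix come from the MR description;
# every other name here is an assumption made for illustration.
class DeferJobsSketch
  DELAY = 5 * 60 # seconds; the description mentions a 5 minute wait
  FLAG_PREFIX = 'defer_sidekiq_jobs'

  # flag_enabled: callable taking a flag name, returning true or false
  # reschedule:   callable taking (worker_class, delay_in_seconds, args)
  def initialize(flag_enabled:, reschedule:)
    @flag_enabled = flag_enabled
    @reschedule = reschedule
  end

  # Same shape as a Sidekiq server middleware: the block runs the job.
  def call(worker, job, _queue)
    flag_name = "#{FLAG_PREFIX}:#{worker.class.name}"

    if @flag_enabled.call(flag_name)
      # Deferred: schedule a new job for later and skip this execution.
      @reschedule.call(worker.class, DELAY, job['args'])
      return
    end

    yield # Flag is off: perform the job as usual.
  end
end
```

Because the deferred path schedules a fresh job instead of yielding, the rescheduled job gets a new jid, which matches what step 7 of the validation steps observes.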

Changelog: added

Note: This MR is based off !120590 (merged)

How to set up and validate locally

  1. Change the DELAY constant in lib/gitlab/sidekiq_middleware/defer_jobs.rb to 1 minute (or shorter) so we don't have to wait for 5 minutes.

  2. Restart Sidekiq:

    $ gdk restart rails-background-jobs
    ok: down: /Users/gregoriusmarco/Documents/workspace/gdk-10-22/services/rails-background-jobs: 0s
    ok: run: /Users/gregoriusmarco/Documents/workspace/gdk-10-22/services/rails-background-jobs: (pid 73634) 1s, normally down
  3. In Rails console, start a job:

    [6] pry(main)> Chaos::SleepWorker.perform_async(1)
    => "51b6c3e837b643eaddb442a7"
    [7] pry(main)> Chaos::SleepWorker.queue
    => "default"
  4. Check in Redis that the job was performed immediately:

    redis /Users/gregoriusmarco/Documents/workspace/gdk-10-22/redis/redis.socket[1]> llen resque:gitlab:queue:default
    (integer) 0
  5. Turn on the feature flag defer_sidekiq_jobs:Chaos::SleepWorker via the API (which requires setting up a PAT), or by inserting a row directly into the feature_gates table in the DB:

    $ curl -H "PRIVATE-TOKEN: $PERSONAL_ACCESS_TOKEN" http://gdk.test:3000/api/v4/features/defer_sidekiq_jobs:Chaos::SleepWorker --data "value=true" | jq
    {
      "name": "defer_sidekiq_jobs:Chaos::SleepWorker",
      "state": "on",
      "gates": [
        {
          "key": "boolean",
          "value": true
        }
      ],
      "definition": null
    }
  6. Try running the job again:

    # Clear the current ScheduledSet jobs
    [11] pry(main)> Sidekiq::ScheduledSet.new.clear
    => true
    [9] pry(main)> Chaos::SleepWorker.perform_async(1)
    => "d8142280ffcb4cb0ba55ed62"
  7. Check for the scheduled job in Redis and confirm that scheduled_at is about 1 minute in the future. (The jid differs from the one returned in step 6, because the middleware effectively enqueues a new job.) After another minute, the job is deferred again and scheduled_at moves forward by a further minute.

    redis /Users/gregoriusmarco/Documents/workspace/gdk-10-22/redis/redis.socket[1]> zrevrange resque:gitlab:schedule 0 -1
    1) "{\"retry\":3,\"queue\":\"default\",\"backtrace\":true,\"version\":0,\"queue_namespace\":\"chaos\",\"class\":\"Chaos::SleepWorker\",\"args\":[1],\"jid\":\"d40984ff09e83eae15ebdd53\",\"created_at\":1684149507.791332,\"correlation_id\":\"a722a2114851f18f7d501e83eab9742a\",\"meta.caller_id\":\"Chaos::SleepWorker\",\"meta.feature_category\":\"not_owned\",\"meta.root_caller_id\":\"Chaos::SleepWorker\",\"worker_data_consistency\":\"always\",\"size_limiter\":\"validated\",\"scheduled_at\":1684149567.791281}"
  8. Turn off the feature flag:

    $ curl -H "PRIVATE-TOKEN: $PERSONAL_ACCESS_TOKEN" http://gdk.test:3000/api/v4/features/defer_sidekiq_jobs:Chaos::SleepWorker --data "value=false" | jq
    {
      "name": "defer_sidekiq_jobs:Chaos::SleepWorker",
      "state": "off",
      "gates": [
        {
          "key": "boolean",
          "value": false
        }
      ],
      "definition": null
    }
  9. Wait for the feature flag's thread-local cache to expire (this should take less than a minute), then check the ScheduledSet in Redis again. It should be empty:

    redis /Users/gregoriusmarco/Documents/workspace/gdk-10-22/redis/redis.socket[1]> zrevrange resque:gitlab:schedule 0 -1
    (empty array)
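As a quick sanity check on step 7, the deferral delay can be read off the scheduled payload by subtracting created_at from scheduled_at. The snippet below is a small sketch that uses only Ruby's standard JSON library; the helper name `deferral_delay_seconds` is hypothetical, and the payload is a trimmed copy of the one shown in step 7, keeping only the fields used here.

```ruby
require 'json'

# Returns the number of seconds between a scheduled job's creation and its
# planned execution, based on the created_at and scheduled_at payload fields.
def deferral_delay_seconds(payload_json)
  payload = JSON.parse(payload_json)
  payload.fetch('scheduled_at') - payload.fetch('created_at')
end

# Trimmed copy of the step 7 payload (only the fields needed for the check).
payload = '{"class":"Chaos::SleepWorker","args":[1],' \
          '"created_at":1684149507.791332,"scheduled_at":1684149567.791281}'

puts deferral_delay_seconds(payload).round(2) # prints 60.0
```

A result of about 60 seconds is consistent with DELAY having been lowered to 1 minute in step 1.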

To test percentage of time

  1. Turn on the feature flag with an integer value (here 10, which sets the percentage_of_time gate to 10%):

    $ curl -H "PRIVATE-TOKEN: $PERSONAL_ACCESS_TOKEN" http://gdk.test:3000/api/v4/features/defer_sidekiq_jobs:Chaos::SleepWorker --data "value=10" | jq
    {
      "name": "defer_sidekiq_jobs:Chaos::SleepWorker",
      "state": "conditional",
      "gates": [
        {
          "key": "boolean",
          "value": false
        },
        {
          "key": "percentage_of_time",
          "value": 10
        }
      ],
      "definition": null
    }
  2. Call Feature.enabled? repeatedly and check that it returns true only some of the time:

    [15] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => false
    [16] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => false
    [17] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => false
    [18] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => true
    [19] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => false
    [20] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => true
    [21] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => false
    [22] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => false
    [23] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => false
    [24] pry(main)> Feature.enabled?(:"defer_sidekiq_jobs:Chaos::SleepWorker", type: :worker, default_enabled_if_undefined: false)
    => false
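The mixed results above are what a percentage-of-time gate is expected to produce: each check is an independent random draw. The following sketch models that behaviour in plain Ruby. It is an assumption about how such a gate works in principle, not the feature flag library's actual implementation, and `percentage_of_time_open?` is a hypothetical helper name.

```ruby
# Hypothetical model of a percentage-of-time gate: each call is an
# independent draw that returns true with probability percentage / 100.
def percentage_of_time_open?(percentage, rng: Random.new)
  rng.rand < (percentage / 100.0)
end

# Fixed seed only so that repeated runs draw the same sequence.
rng = Random.new(42)
checks = 10_000
hits = Array.new(checks) { percentage_of_time_open?(10, rng: rng) }.count(true)

puts "enabled on #{hits} of #{checks} checks"
```

With a value of 10, roughly one check in ten should come back true over a large sample, so a short run of ten calls can easily show zero, one, or a few true results.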

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Gregorius Marco
