
Add Sidekiq execution SLI as apdex

Gregorius Marco requested to merge add-sidekiq-sli-metrics into master

What does this MR do and why?

This MR adds Sidekiq execution metrics as apdex and error-rate counters, which are then aggregated into Application SLIs as described in https://docs.gitlab.com/ee/development/application_slis/.

Resolves Add Sidekiq execution Application SLIs to the R... (gitlab-com/gl-infra/scalability#1638 - closed). See the issue for the detailed metric requirements.
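The counters in the validation steps below follow a simple pattern. Every execution increments the total. A failed job increments only the error counter. A successful job increments apdex_total and, if it finished within the threshold, apdex_success_total. The following is a minimal Ruby sketch of that bookkeeping, not GitLab's implementation. It assumes a single 300 s threshold (the value used in step 8 for the low-urgency Chaos::SleepWorker), treats a run of exactly 300 s as satisfied, and ignores the worker, urgency, and feature_category labels. The class name SidekiqExecutionSliSketch is hypothetical.

```ruby
# Hypothetical sketch of how one finished Sidekiq job could map onto the four
# gitlab_sli_sidekiq_execution_* counters. Names are illustrative only and
# labels (worker, urgency, feature_category) are deliberately ignored.
class SidekiqExecutionSliSketch
  # Assumed apdex threshold in seconds (the 300 s used in the MR's test steps)
  APDEX_THRESHOLD_S = 300

  attr_reader :counters

  def initialize
    # Counter values keyed by metric suffix; unknown keys start at 0
    @counters = Hash.new(0)
  end

  # Record a single job execution.
  # duration_s: run time in seconds; failed: whether the job raised
  def record(duration_s:, failed:)
    # Every execution counts toward the error-rate denominator
    @counters[:total] += 1

    if failed
      # Failed jobs count as errors and are left out of apdex entirely
      @counters[:error_total] += 1
    else
      # Successful jobs are apdex candidates...
      @counters[:apdex_total] += 1
      # ...and count as satisfied only if they finished within the threshold
      @counters[:apdex_success_total] += 1 if duration_s <= APDEX_THRESHOLD_S
    end
  end
end

# Replaying the three jobs from the validation steps:
sli = SidekiqExecutionSliSketch.new
sli.record(duration_s: 1, failed: false)   # perform_async(1)
sli.record(duration_s: 0, failed: true)    # perform_async(1, 2, 3)
sli.record(duration_s: 301, failed: false) # perform_async(301)
sli.counters[:total]               # => 3
sli.counters[:error_total]         # => 1
sli.counters[:apdex_total]         # => 2
sli.counters[:apdex_success_total] # => 1
```

Excluding failures from apdex keeps the two SLIs independent. A slow job and a broken job are reported through different counters, which matches the counter values shown in step 7.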

How to set up and validate locally

  1. Start GDK.

  2. Check that gitlab_sli_sidekiq_execution_* metrics are present:

    ❯ curl 'gdk.test:3807/metrics' -s | grep gitlab_sli_sidekiq_execution_
    ...
  3. Open a Rails console with gdk rails c.

  4. Enqueue a job for any worker, e.g. Chaos::SleepWorker.perform_async(1).

  5. The counters are incremented:

    ❯ curl 'gdk.test:3807/metrics' -s | grep gitlab_sli | grep SleepWorker
    gitlab_sli_sidekiq_execution_apdex_success_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
    gitlab_sli_sidekiq_execution_apdex_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
    gitlab_sli_sidekiq_execution_error_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 0
    gitlab_sli_sidekiq_execution_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
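  6. To emulate an error, run Chaos::SleepWorker.perform_async(1, 2, 3). The worker's perform method takes a single duration argument, so the extra arguments make the job fail.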

  7. The error counter is incremented, while the apdex counters are unchanged:

    ❯ curl 'gdk.test:3807/metrics' -s | grep gitlab_sli | grep SleepWorker
    gitlab_sli_sidekiq_execution_apdex_success_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
    gitlab_sli_sidekiq_execution_apdex_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
    gitlab_sli_sidekiq_execution_error_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
    gitlab_sli_sidekiq_execution_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 2
  8. To test an apdex failure, run Chaos::SleepWorker.perform_async(301). The job sleeps for 301 s, which exceeds the 300 s apdex threshold.

  9. gitlab_sli_sidekiq_execution_apdex_total is incremented, but gitlab_sli_sidekiq_execution_apdex_success_total is not.
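Apdex is the fraction of counted executions that met the threshold (apdex_success_total / apdex_total). The error rate is error_total / total. The following is a minimal Ruby sketch that parses the gitlab_sli_sidekiq_execution_* lines from a Prometheus text scrape, such as the curl output above, and computes both ratios per worker. The helper names parse_sli_samples and sli_ratios are hypothetical. The sketch assumes one sample per line and labels that contain no } characters.

```ruby
# Hypothetical helpers: group gitlab_sli_sidekiq_execution_* samples by worker
# and compute apdex / error ratios. Assumes lines of the form `name{labels} value`.
SLI_PREFIX = "gitlab_sli_sidekiq_execution_"
SAMPLE_RE = /\A(?<name>#{SLI_PREFIX}\w+)\{(?<labels>[^}]*)\}\s+(?<value>\S+)\z/

# Returns { worker => { metric_name => Float } }
def parse_sli_samples(text)
  samples = Hash.new { |hash, worker| hash[worker] = {} }
  text.each_line do |line|
    match = SAMPLE_RE.match(line.strip)
    next unless match # skips # HELP / # TYPE lines and unrelated metrics

    worker = match[:labels][/worker="([^"]*)"/, 1]
    next unless worker

    samples[worker][match[:name]] = Float(match[:value])
  end
  samples
end

# Returns { apdex: Float or nil, error_rate: Float or nil }; nil when the
# denominator is zero (no executions counted yet)
def sli_ratios(counters)
  apdex_total = counters.fetch("#{SLI_PREFIX}apdex_total", 0.0)
  total = counters.fetch("#{SLI_PREFIX}total", 0.0)
  {
    apdex: apdex_total.zero? ? nil : counters.fetch("#{SLI_PREFIX}apdex_success_total", 0.0) / apdex_total,
    error_rate: total.zero? ? nil : counters.fetch("#{SLI_PREFIX}error_total", 0.0) / total
  }
end

# Usage with the scrape from step 7:
scrape = <<~TEXT
  gitlab_sli_sidekiq_execution_apdex_success_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
  gitlab_sli_sidekiq_execution_apdex_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
  gitlab_sli_sidekiq_execution_error_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 1
  gitlab_sli_sidekiq_execution_total{feature_category="not_owned",urgency="low",worker="Chaos::SleepWorker"} 2
TEXT
ratios = sli_ratios(parse_sli_samples(scrape)["Chaos::SleepWorker"])
ratios[:apdex]      # => 1.0
ratios[:error_rate] # => 0.5
```

These are ratios of cumulative counts since the process started. In Prometheus, you would normally compute them from rate() or increase() over a time window instead, so that old executions do not dominate the result.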

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Gregorius Marco
