Initialise sidekiq_jobs_completion_seconds before running jobs

Sean McGivern requested to merge initialise-sidekiq-jobs-completion-metrics into master

When a worker runs for the first time, its sidekiq_jobs_completion_seconds_count is set to 1 (because the job has just completed). In Prometheus terms this is a problem: the series has no previous value of 0, so the first run does not register as an increase and the rate of change is reported as 0.

To work around this, we initialise this metric for all workers that can be run by the current Sidekiq process before processing the first job. This way we can calculate rates correctly for infrequent workers.
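
As a rough illustration of why the initial 0 matters, here is a minimal Ruby sketch. It does not use the real Prometheus client or the code in this MR: FakeCounter, observed_increase and simulate are hypothetical names, and the "increase between two scrapes" logic is a simplified stand-in for how a rate-style query treats a series that was absent from the earlier scrape.

```ruby
# Hypothetical in-memory stand-in for a labelled Prometheus counter.
# This is NOT GitLab's implementation, only an illustration.
class FakeCounter
  def initialize
    @values = {}
  end

  # Create the series at 0 if it does not exist yet.
  def init(worker)
    @values[worker] ||= 0
  end

  # Record one completed job; creates the series at 1 if it was missing.
  def increment(worker)
    @values[worker] = @values.fetch(worker, 0) + 1
  end

  # A scrape only sees series that already exist.
  def scrape
    @values.dup
  end
end

# Simplified assumption: an increase can only be measured for a series
# that is present in both scrapes.
def observed_increase(before, after, worker)
  return nil unless before.key?(worker) && after.key?(worker)

  after[worker] - before[worker]
end

# Scrape, run the worker once, scrape again, and report the increase.
def simulate(workers_to_init)
  counter = FakeCounter.new
  workers_to_init.each { |worker| counter.init(worker) }

  before = counter.scrape
  counter.increment('TrendingProjectsWorker')
  after = counter.scrape

  observed_increase(before, after, 'TrendingProjectsWorker')
end

WITHOUT_INIT = simulate([])
WITH_INIT = simulate(['TrendingProjectsWorker'])

puts "without initialisation: #{WITHOUT_INIT.inspect}"
puts "with initialisation: #{WITH_INIT.inspect}"
```

In this sketch the first run is invisible unless the series already existed at 0, which is the situation the initialisation is meant to avoid.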

Testing

We need Prometheus in the GDK (which has some issues: gitlab-development-kit#1061). Then we can run everything in the GDK except rails-background-jobs, and start that manually with:

bin/sidekiq-cluster --queue-selector 'feature_category=source_code_management'

We can then force the TrendingProjectsWorker - which normally runs once per day - to run:

bundle exec rails r 'TrendingProjectsWorker.perform_async'

With the feature flag in this MR disabled (sidekiq_job_completion_metric_initialize), we'll see this in our Prometheus metrics. Note how it goes straight to 1 without passing through 0:

image

With the feature flag enabled (I also deleted my Prometheus data volume to be sure this change worked):

image

Also note that the metrics are initialised for all other group::source code workers, but not for every possible worker.

For gitlab-com/gl-infra/scalability#1133 (closed).

Edited by Sean McGivern