Add FF to stop emitting Sidekiq histogram metrics
What does this MR do and why?
When the ops feature flag (FF) emit_sidekiq_histogram_metrics is disabled (it is enabled by default), Sidekiq stops emitting these metrics:
sidekiq_jobs_completion_seconds
sidekiq_jobs_queue_duration_seconds
sidekiq_jobs_failed_total
When the FF is disabled, sidekiq_jobs_completion_seconds_sum is emitted as a counter instead, because this sum is still used in dashboards.gitlab.net. Context: gitlab-com/runbooks!6096 (comment 1498347517)
The sidekiq_jobs_completion_seconds_sum counter is emitted only when the FF is disabled. Emitting the same metric name from both the histogram sidekiq_jobs_completion_seconds (which implicitly produces the _sum counter) and the raw counter sidekiq_jobs_completion_seconds_sum could crash the Sidekiq application silently (tested locally).
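The name collision described above can be illustrated with a minimal Ruby sketch. Everything here is hypothetical: SketchRegistry and register_completion_metrics are made-up names for illustration, not the GitLab or Prometheus client API, and the sketch raises on a duplicate series purely to make the conflict visible (the real failure was observed to be silent).

```ruby
# Hypothetical stand-in for a metrics registry (an assumption for illustration;
# this is not the GitLab or Prometheus client API).
class SketchRegistry
  class DuplicateSeries < StandardError; end

  def initialize
    @series = {} # exported series name => metric that owns it
  end

  # A histogram implicitly exports <name>_bucket, <name>_sum and <name>_count.
  def histogram(name)
    claim(name, ["#{name}_bucket", "#{name}_sum", "#{name}_count"])
  end

  # A counter exports exactly one series, under its own name.
  def counter(name)
    claim(name, [name])
  end

  def series_names
    @series.keys.sort
  end

  private

  # Raising here is an illustrative choice to surface the collision.
  def claim(metric, names)
    names.each do |n|
      raise DuplicateSeries, "#{n} is already exported by #{@series[n]}" if @series.key?(n)
    end
    names.each { |n| @series[n] = metric }
    metric
  end
end

# Register either the histogram or the raw _sum counter, never both,
# so the _sum series always has exactly one owner.
def register_completion_metrics(registry, histograms_enabled:)
  if histograms_enabled
    registry.histogram("sidekiq_jobs_completion_seconds")
  else
    registry.counter("sidekiq_jobs_completion_seconds_sum")
  end
  registry.series_names
end
```

In this sketch the _sum series is present in both flag states, which is the property the dashboards rely on; only its owner changes.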
This change is meant only for GitLab.com for now, as self-managed instances might still use these histograms.
Part of an effort to remove some metrics emitted by Sidekiq: gitlab-com/gl-infra/scalability#2297 (closed)
How to set up and validate locally
- Ensure sidekiq_exporter is enabled in gdk.yml:
gitlab:
  rails_background_jobs:
    sidekiq_exporter_enabled: true
- With the FF still enabled, we can still see the bucket metrics:
❯ curl -s 'gdk.test:3807/metrics' | rg sidekiq_jobs_completion_seconds_bucket | head
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="+Inf",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="+Inf",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="10",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="10",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="300",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="300",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="fail",le="+Inf",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="fail",le="+Inf",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="fail",le="10",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="fail",le="10",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
- Disable the FF in the Rails console:
Feature.disable(:emit_sidekiq_histogram_metrics)
- Restart Sidekiq:
gdk restart rails-background-jobs
- Check that only sidekiq_jobs_completion_seconds_sum exists:
❯ curl -s 'gdk.test:3807/metrics' | rg sidekiq_jobs_completion_seconds | head
# HELP sidekiq_jobs_completion_seconds_sum Multiprocess metric
# TYPE sidekiq_jobs_completion_seconds_sum counter
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="build_artifacts",queue="default",urgency="low",worker="Projects::RefreshBuildArtifactsSizeStatisticsWorker"} 0.04362599999876693
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="build_artifacts",queue="default",urgency="low",worker="Projects::ScheduleRefreshBuildArtifactsSizeStatisticsWorker"} 0.28387600000132807
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="cell",queue="default",urgency="low",worker="LooseForeignKeys::CleanupWorker"} 0.391459999998915
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="code_review_workflow",queue="default",urgency="low",worker="ScheduleMergeRequestCleanupRefsWorker"} 0.032480000001669396
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="database",queue="default",urgency="low",worker="Database::BatchedBackgroundMigration::CiDatabaseWorker"} 0.43211000000155764
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="database",queue="default",urgency="low",worker="Database::BatchedBackgroundMigrationWorker"} 0.2683130000004894
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="gitaly",queue="default",urgency="low",worker="BatchedGitRefUpdates::CleanupSchedulerWorker"} 0.020981000001484063
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="global_search",queue="default",urgency="low",worker="ElasticIndexInitialBulkCronWorker"} 0.00874900000053458
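The manual check above could also be scripted. The following is a rough sketch that assumes the exporter returns standard Prometheus text exposition output; metric_names and only_sum_emitted? are hypothetical helper names, not part of GitLab or GDK.

```ruby
# Return the distinct metric names found in Prometheus text exposition output,
# skipping "# HELP" / "# TYPE" comment lines and blank lines.
def metric_names(exposition_text)
  exposition_text.each_line.filter_map do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    # The metric name is everything before the label set or the value.
    line[/\A[a-zA-Z_:][a-zA-Z0-9_:]*/]
  end.uniq
end

# True when the only series starting with the base name is <base>_sum,
# that is, no _bucket or _count series from the histogram remain.
def only_sum_emitted?(exposition_text, base = "sidekiq_jobs_completion_seconds")
  names = metric_names(exposition_text).select { |n| n.start_with?(base) }
  names == ["#{base}_sum"]
end
```

For example, passing the output of the curl command above to only_sum_emitted? should return true once the FF is disabled and Sidekiq has been restarted, and false while the bucket metrics are still present.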
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.