Run SidekiqExporter only on first worker
What does this MR do and why?
When using the SidekiqExporter via gitlab.monitoring.sidekiq_exporter.enable = true and running more than 1 worker in sidekiq-cluster, there is a race condition where all workers try to bind to the same port to serve metrics and health-checks. We currently use a workaround that lets N-1 of the workers fail into a rescue clause when they cannot allocate that port.
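The failure mode behind the current workaround can be sketched in a few lines of plain Ruby. This is only an illustration, not the exporter's code: try_start_exporter and simulate_race are hypothetical helpers, and a bare TCPServer stands in for the metrics server.

```ruby
require 'socket'

# Hypothetical stand-in for a worker starting its metrics server.
# Returns the listening socket, or nil if the port is already taken.
def try_start_exporter(port)
  TCPServer.new('127.0.0.1', port)
rescue Errno::EADDRINUSE
  # This is the branch the N-1 losing workers end up in today.
  nil
end

# Simulates two workers competing for one port; returns what each of them got.
def simulate_race
  winner = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
  port = winner.addr[1]
  loser = try_start_exporter(port)
  [winner, loser]
ensure
  winner&.close
  loser&.close
end
```

Whichever process binds first wins; every later attempt raises Errno::EADDRINUSE and is swallowed by the rescue, which is why the outcome depends on startup timing.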
We can address this problem by letting sidekiq-cluster elect a "leader" among the workers, for instance sidekiq_0 (the first worker launched), which will take sole responsibility for the above. All other workers should not attempt to bind ports, serve metrics, or do anything of the sort.
In environments where only 1 worker is used, that worker will lead implicitly.
This makes for a more predictable environment where multiple sidekiq workers are present.
Implementation
I went with the simplest thing I could think of:
- When running a single worker via bundle exec, this worker will export metrics
- When running > 1 worker via sidekiq-cluster, only sidekiq_0 will export metrics
This required the least amount of machinery because we already pass around worker IDs through the environment. If there is no worker ID, we know we're not operating in a cluster of processes.
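As a rough sketch of that decision, assuming the worker ID arrives as a numeric index in an environment variable: the variable name SIDEKIQ_WORKER_ID and the method exports_metrics? are assumptions made for illustration, not necessarily what this MR uses.

```ruby
# Assumed variable name; the description only says that worker IDs are
# passed around through the environment.
WORKER_ID_VAR = 'SIDEKIQ_WORKER_ID'

# Hypothetical predicate: should this process start the metrics exporter?
def exports_metrics?(env = ENV)
  worker_id = env[WORKER_ID_VAR]

  # No worker ID means we are not part of a sidekiq-cluster,
  # so the single worker leads implicitly.
  return true if worker_id.nil? || worker_id.empty?

  # Inside a cluster, only the first worker (sidekiq_0) leads.
  worker_id == '0'
end
```

With this shape, exports_metrics?({}) and exports_metrics?('SIDEKIQ_WORKER_ID' => '0') are true, while any other worker ID yields false and that worker never touches the port.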
I decided not to put this behind a feature flag because feature flags have caused problems in the past when checked in initializers that run early in the initializer chain (such as 7_prometheus). I think the change is pretty simple and safe.
How to set up and validate locally
Scenario 1: Single worker, no sidekiq-cluster
- Run bundle exec sidekiq
- curl localhost:<metrics_port>/metrics -- it should serve metrics
Scenario 2: Multiple workers via sidekiq-cluster
- Run bin/background_jobs
- curl localhost:<metrics_port>/metrics -- it should serve metrics
No worker should ever fail with "can't bind <metrics_port> - already allocated"
Note that <metrics_port> depends on your dev env and local settings. For me using the GCK it is 3807 by default.
Test in review app
Find the sidekiq pod and exec into it using -it -- bash, then run:
git@review-345794-sid-mlslqy-sidekiq-all-in-1-v1-75b9b47b6-pz6m2:/$ curl -s localhost:3807/metrics | wc -l
27178
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #345794 (closed)