Use active sidekiq router's queues for sidekiq/queue_metrics API
What does this MR do and why?
As discussed in an internal Slack thread, there is a problem with the GET api/v4/sidekiq/queue_metrics endpoint. That endpoint is supposed to return the list of Sidekiq queues with their backlog and latency numbers. Previously, when we followed the queue-per-worker model, the endpoint returned a curated list of around 500 queues. After we migrated to queue-per-shard and queue routing rules, the number of queues in use dropped to a handful. Unfortunately, we still maintain a list of queue names generated from worker names. That list is persisted in Redis and is accessible through the Sidekiq::Queue API, so the endpoint keeps reporting queues that are no longer used. The redundant queues can only be removed after gitlab-com/gl-infra&596 (closed) is done.
This MR makes the endpoint return data for the actively routed queues only. The list of queues is now generated by passing the list of workers through the global Sidekiq router.
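The idea can be sketched in plain Ruby as follows. This is a minimal illustration, not the code in this MR: `Worker`, `ROUTING_RULES`, `route`, `active_queues`, and the example worker names are all hypothetical stand-ins, and the rules are simplified to ordered predicates where the first match wins.

```ruby
# Hypothetical stand-in for a worker class and its routing attributes.
Worker = Struct.new(:name, :urgency, :resource_boundary, keyword_init: true)

# Hypothetical, simplified routing rules: the first matching predicate
# decides the queue. The last rule is a catch-all.
ROUTING_RULES = [
  [->(w) { w.urgency == :high && w.resource_boundary == :cpu }, 'urgent_cpu_bound'],
  [->(w) { w.resource_boundary == :memory }, 'memory_bound'],
  [->(_w) { true }, 'default']
].freeze

# Route a single worker to a queue name.
def route(worker)
  _predicate, queue = ROUTING_RULES.find { |predicate, _queue| predicate.call(worker) }
  queue
end

# Derive the active queues by routing every worker and de-duplicating,
# instead of reading the stale per-worker queue names.
def active_queues(workers)
  workers.map { |worker| route(worker) }.uniq.sort
end

WORKERS = [
  Worker.new(name: 'ExampleUrgentWorker', urgency: :high, resource_boundary: :cpu),
  Worker.new(name: 'ExampleExportWorker', urgency: :low, resource_boundary: :memory),
  Worker.new(name: 'ExampleMailWorker', urgency: :low, resource_boundary: :unknown),
  Worker.new(name: 'ExampleCleanupWorker', urgency: :low, resource_boundary: :unknown)
].freeze

puts active_queues(WORKERS).inspect
```

Four workers collapse into three queues here, which mirrors how roughly 500 per-worker queue names reduce to a handful of routed queues.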
How to set up and validate locally
- Apply the production routing rules to the local environment.
- Issue the following command against the local web server. The results differ before and after the change:
curl --header "PRIVATE-TOKEN: $TOKEN" "http://localhost:3000/api/v4/sidekiq/queue_metrics"
- Before
{
"queues": {
"adjourned_project_deletion": {
"backlog": 0,
"latency": 0
},
"admin_emails": {
"backlog": 0,
"latency": 0
},
"analytics_code_review_metrics": {
"backlog": 0,
"latency": 0
},
"analytics_devops_adoption_create_snapshot": {
"backlog": 0,
"latency": 0
},
"analytics_usage_trends_counter_job": {
"backlog": 0,
"latency": 0
},
... 500+ more
}
}
- After
{
"queues": {
"database_throttled": {
"backlog": 0,
"latency": 0
},
"default": {
"backlog": 0,
"latency": 0
},
"elasticsearch": {
"backlog": 177,
"latency": 2070042
},
"email_receiver": {
"backlog": 0,
"latency": 0
},
"gitaly_throttled": {
"backlog": 0,
"latency": 0
},
"imports": {
"backlog": 0,
"latency": 0
},
"low_urgency_cpu_bound": {
"backlog": 108,
"latency": 2070042
},
"mailers": {
"backlog": 0,
"latency": 0
},
"memory_bound": {
"backlog": 0,
"latency": 0
},
"quarantine": {
"backlog": 0,
"latency": 0
},
"service_desk_email_receiver": {
"backlog": 0,
"latency": 0
},
"urgent_cpu_bound": {
"backlog": 0,
"latency": 0
},
"urgent_other": {
"backlog": 3,
"latency": 1564815
}
}
}
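To compare the two responses quickly, a small helper can summarize the JSON body. The sketch below is only an assumption about how one might check the output: `queue_count` and `busy_queues` are hypothetical helper names, and `SAMPLE_BODY` is a shortened copy of the "After" response above.

```ruby
require 'json'

# Number of queues reported in the response body.
def queue_count(body)
  JSON.parse(body).fetch('queues').size
end

# Names of the queues that currently have a non-zero backlog.
def busy_queues(body)
  JSON.parse(body)
      .fetch('queues')
      .select { |_name, metrics| metrics['backlog'] > 0 }
      .keys
      .sort
end

# Shortened sample taken from the "After" response.
SAMPLE_BODY = <<~JSON
  {
    "queues": {
      "default": { "backlog": 0, "latency": 0 },
      "elasticsearch": { "backlog": 177, "latency": 2070042 },
      "urgent_other": { "backlog": 3, "latency": 1564815 }
    }
  }
JSON

puts queue_count(SAMPLE_BODY)
puts busy_queues(SAMPLE_BODY).inspect
```

Saving the curl output to a file and passing its contents to `queue_count` should show the drop from 500+ queues to a handful once the change is applied.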
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.