
Use active sidekiq router's queues for sidekiq/queue_metrics API

What does this MR do and why?

As discussed in an internal Slack thread, there is a problem with the GET api/v4/sidekiq/queue_metrics endpoint. The endpoint is supposed to return the list of Sidekiq queues with their backlog and latency numbers. Previously, when we followed the queue-per-worker model, it returned a curated list of around 500 queues. After we migrated to queue-per-shard and queue routing rules, the number of queues in use dropped to a handful. Unfortunately, we still maintain a list of queue names generated from worker names. That list is persisted in Redis and can be accessed with the Sidekiq::Queue API, so the endpoint keeps reporting the old per-worker queues. The redundant queues can only be removed after gitlab-com/gl-infra&596 (closed) is done.

This MR makes the endpoint return data for the active routing queues only. The list of queues is now generated by passing the list of workers through the global Sidekiq router.
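
The idea can be illustrated with a minimal, self-contained sketch. It is not the code in this MR: SketchWorker, SketchRouter, the routing rules, and the worker attributes below are all hypothetical stand-ins, and the real router and worker list live in the GitLab codebase. The sketch only shows the shape of the approach: route every worker, keep the distinct queue names, and report metrics for those names alone.

```ruby
# Hypothetical stand-in for a worker class; the attributes are invented for this sketch.
SketchWorker = Struct.new(:name, :urgency, :resource_boundary)

# Hypothetical router: the first rule whose predicate matches decides the queue,
# and workers that match no rule fall back to 'default'.
class SketchRouter
  def initialize(rules)
    @rules = rules
  end

  def route(worker)
    match = @rules.find { |predicate, _queue| predicate.call(worker) }
    match ? match.last : 'default'
  end
end

# Route every worker and keep the distinct queue names.
def routing_queues(router, workers)
  workers.map { |worker| router.route(worker) }.uniq.sort
end

# Keep metrics for the routed queues only; queues without stored metrics report zeros.
def active_queue_metrics(all_metrics, queue_names)
  empty = { 'backlog' => 0, 'latency' => 0 }
  queue_names.to_h { |name| [name, all_metrics.fetch(name, empty)] }
end

router = SketchRouter.new([
  [->(w) { w.urgency == :high && w.resource_boundary == :cpu }, 'urgent_cpu_bound'],
  [->(w) { w.urgency == :high }, 'urgent_other'],
  [->(w) { w.resource_boundary == :memory }, 'memory_bound']
])

workers = [
  SketchWorker.new('admin_emails', :low, :unknown),
  SketchWorker.new('adjourned_project_deletion', :low, :memory),
  SketchWorker.new('sketch_urgent_cpu', :high, :cpu),
  SketchWorker.new('sketch_urgent_other', :high, :unknown)
]

# Metrics as they might be stored per queue name, including stale per-worker queues.
stored_metrics = {
  'admin_emails' => { 'backlog' => 0, 'latency' => 0 },
  'adjourned_project_deletion' => { 'backlog' => 0, 'latency' => 0 },
  'default' => { 'backlog' => 0, 'latency' => 0 },
  'urgent_other' => { 'backlog' => 3, 'latency' => 1564815 }
}

active = routing_queues(router, workers)
metrics = active_queue_metrics(stored_metrics, active)
puts metrics.keys.inspect
```

In this sketch the stale per-worker names (admin_emails, adjourned_project_deletion) drop out of the result because no worker is routed to a queue with that name.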

How to set up and validate locally

  • Apply the production routing rules to the local environment.

  • Run curl --header "PRIVATE-TOKEN: $TOKEN" "http://localhost:3000/api/v4/sidekiq/queue_metrics" against the local web server. The results differ before and after the change.

  • Before

{
  "queues": {
    "adjourned_project_deletion": {
      "backlog": 0,
      "latency": 0
    },
    "admin_emails": {
      "backlog": 0,
      "latency": 0
    },
    "analytics_code_review_metrics": {
      "backlog": 0,
      "latency": 0
    },
    "analytics_devops_adoption_create_snapshot": {
      "backlog": 0,
      "latency": 0
    },
    "analytics_usage_trends_counter_job": {
      "backlog": 0,
      "latency": 0
    },
    ... 500+ more
  }
}

  • After

{
  "queues": {
    "database_throttled": {
      "backlog": 0,
      "latency": 0
    },
    "default": {
      "backlog": 0,
      "latency": 0
    },
    "elasticsearch": {
      "backlog": 177,
      "latency": 2070042
    },
    "email_receiver": {
      "backlog": 0,
      "latency": 0
    },
    "gitaly_throttled": {
      "backlog": 0,
      "latency": 0
    },
    "imports": {
      "backlog": 0,
      "latency": 0
    },
    "low_urgency_cpu_bound": {
      "backlog": 108,
      "latency": 2070042
    },
    "mailers": {
      "backlog": 0,
      "latency": 0
    },
    "memory_bound": {
      "backlog": 0,
      "latency": 0
    },
    "quarantine": {
      "backlog": 0,
      "latency": 0
    },
    "service_desk_email_receiver": {
      "backlog": 0,
      "latency": 0
    },
    "urgent_cpu_bound": {
      "backlog": 0,
      "latency": 0
    },
    "urgent_other": {
      "backlog": 3,
      "latency": 1564815
    }
  }
}
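
To compare the two responses without reading 500+ entries by eye, a small script can count the queues and list those with a non-zero backlog. This is only a sketch: the helper names fetch_queue_metrics and summarize_queue_metrics are made up here, and the script assumes the response has the "queues" shape shown above.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Fetch the raw response body from the endpoint, sending the same header as the curl command.
def fetch_queue_metrics(base_url, token)
  uri = URI.join(base_url, '/api/v4/sidekiq/queue_metrics')
  request = Net::HTTP::Get.new(uri)
  request['PRIVATE-TOKEN'] = token
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.body
end

# Count the queues in a response body and list the ones with a non-zero backlog.
def summarize_queue_metrics(body)
  queues = JSON.parse(body).fetch('queues')
  backlogged = queues.select { |_name, stats| stats['backlog'].positive? }.keys.sort
  { total: queues.size, backlogged: backlogged }
end
```

For example, summarize_queue_metrics(fetch_queue_metrics('http://localhost:3000', ENV.fetch('TOKEN'))) can be run once before and once after applying the change to compare the queue counts.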

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Quang-Minh Nguyen
