
Alert on high number of overdue tasks for the container registry online GC

João Pereira requested to merge online-gc-queue-size-alert into master

The registry online GC operates on two database tables that serve as task queues: one for blob tasks and one for manifest tasks. Because of how the registry stores images, we expect far more blob tasks than manifest tasks (an image has a single manifest but N blobs). For more details, see https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs-gitlab/db/online-garbage-collection.md.

One of the metrics we have is the number of these tasks that are overdue. We should alert when that number gets too high. Right now it's difficult to say what "too high" means; we'll have to figure that out during the gradual production rollout. Therefore, this MR sets a reasonable initial limit for each queue. These limits should be high enough not to trigger an alert during the initial percentage-based rollout. We'll need to adjust the values as we move on.
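As a rough illustration of the per-queue check described above, here is a minimal Python sketch. It is not the alert rule from this MR: the queue names, the `OVERDUE_LIMITS` values, and the `queues_over_limit` helper are all hypothetical, chosen only to show one limit per queue with a higher limit for blobs.

```python
# Hypothetical per-queue limits. The real values are set in this MR's alert
# rule and are expected to change during the rollout.
OVERDUE_LIMITS = {"blob": 1000, "manifest": 100}


def queues_over_limit(overdue_counts, limits=OVERDUE_LIMITS):
    """Return the sorted names of queues whose overdue count exceeds its limit.

    Queues missing from overdue_counts are treated as having zero overdue tasks.
    """
    return sorted(
        queue
        for queue, limit in limits.items()
        if overdue_counts.get(queue, 0) > limit
    )


if __name__ == "__main__":
    # Example: blobs are within their limit, manifests are not.
    print(queues_over_limit({"blob": 500, "manifest": 150}))
```

Keeping a separate limit per queue reflects the expectation that the blob queue normally holds far more tasks than the manifest queue, so a single shared limit would either be too noisy for blobs or too lax for manifests.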

Related to https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14254.

Edited by João Pereira
