Monitoring and alerting for Elasticsearch sorted sets buffer queue
Follow up from #34086 (closed)
Since we've moved away from Sidekiq as our main buffer, we have lost a bunch of the monitoring and alerting features that exist for all of Sidekiq at GitLab. We will need visibility into this queue, and ideally alerting when something is not right (for example, when the queue keeps growing or stops draining).
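As a rough illustration of the kind of visibility meant here, the sketch below renders a queue length as a gauge in the Prometheus text exposition format. The metric name `elasticsearch_buffer_queue_size` and the `fetch_queue_size` callable are hypothetical; in practice the size would presumably come from something like a Redis `ZCARD` on the sorted set, and GitLab's own metrics helpers would be used rather than hand-built strings.

```python
from typing import Callable


def render_queue_size_metric(
    fetch_queue_size: Callable[[], int],
    metric_name: str = "elasticsearch_buffer_queue_size",  # hypothetical name
) -> str:
    """Render the current queue length as a Prometheus gauge (text format)."""
    size = fetch_queue_size()
    if size < 0:
        raise ValueError("queue size cannot be negative")
    return (
        f"# HELP {metric_name} Number of items waiting in the buffer queue\n"
        f"# TYPE {metric_name} gauge\n"
        f"{metric_name} {size}\n"
    )


def fake_queue_size() -> int:
    # Stand-in for a real lookup, e.g. ZCARD on the sorted set (assumption).
    return 42


if __name__ == "__main__":
    print(render_queue_size_metric(fake_queue_size))
```

An alert could then fire when this gauge stays above some threshold for a sustained period; the threshold itself would need to be chosen from real traffic.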
Suggested by @andrewn in gitlab-com/gl-infra/scalability#164 (comment 290051290):
I would suggest we handle this with the metrics catalog (as part of the web service possibly, or api) and set up declarations for Apdex (queue speed, index speed), Error Rate and Operations Rate. Happy to walk you through this closer to the time.
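To make the three indicators concrete, here is a minimal sketch of how they could be derived from raw counters over a time window. It is not the metrics catalog's actual declaration format. The `WindowCounts` type and its field names are hypothetical, and the Apdex here is simplified to the ratio of operations that met a latency threshold, leaving out the "tolerating" bucket of the standard Apdex formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowCounts:
    """Counters observed over one time window (hypothetical shape)."""
    total: int             # operations attempted (e.g. items indexed)
    satisfied: int         # operations that finished within the latency threshold
    errors: int            # operations that failed
    window_seconds: float  # length of the observation window


def apdex(c: WindowCounts) -> Optional[float]:
    """Simplified Apdex: share of operations that met the latency threshold."""
    return c.satisfied / c.total if c.total else None


def error_rate(c: WindowCounts) -> Optional[float]:
    """Share of operations that failed."""
    return c.errors / c.total if c.total else None


def operations_rate(c: WindowCounts) -> float:
    """Operations per second over the window."""
    return c.total / c.window_seconds


if __name__ == "__main__":
    counts = WindowCounts(total=200, satisfied=180, errors=4, window_seconds=60.0)
    print(apdex(counts), error_rate(counts), operations_rate(counts))
```

Returning `None` for an empty window keeps "no traffic" distinct from "perfect score", which matters if an alert should also fire when the queue stops processing entirely.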
Links
- https://docs.gitlab.com/ee/administration/monitoring/prometheus/index.html
- https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html
- https://gitlab.com/gitlab-org/gitlab/blob/master/config/initializers/7_prometheus_metrics.rb
- https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards#local-development
- https://gitlab.com/gitlab-org/gitlab-exporter
- https://gitlab.com/gitlab-org/gitlab/blob/master/ee/app/services/geo/metrics_update_service.rb#L47
Edited by Dylan Griffith