Unable to measure latency of Sidekiq jobs that take longer than 2.5 seconds to run
Blocker on gitlab-com/runbooks!1347 (closed)
Related:
- https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7618
- gitlab-cookbooks/gitlab-mtail!49 (merged)
- https://gitlab.com/gitlab-com/gl-infra/infrastructure/merge_requests/132
In https://gitlab.com/gitlab-org/gitlab-ce/commit/cfea48dffd04918e4d457ed92ff987b8246ef4ec we moved Sidekiq monitoring metrics into the application, which is much more robust than the current approach of extracting them with mtail.
One problem is that the largest finite bucket of the latency histogram is 2.5 seconds. Many jobs take longer than this to run, and we have no visibility into their durations because every one of them is lumped into the +Inf bucket.
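To illustrate why anything slower than 2.5 seconds is invisible, here is a minimal Python sketch that builds Prometheus-style cumulative bucket counts and estimates a quantile from them. The `quantile` function is a simplified re-implementation of how Prometheus's `histogram_quantile` behaves (linear interpolation inside a bucket, and the upper bound of the highest finite bucket returned when the quantile falls into +Inf); it is not the actual Prometheus code. The sample durations, the lower bucket bounds and the `EXTENDED_BUCKETS` layout are all hypothetical; only the 2.5 second top bucket comes from this issue.

```python
import math

# Hypothetical bucket layouts. Only the 2.5 s upper bound comes from the issue;
# the smaller bounds and the extended layout are illustrative assumptions.
CURRENT_BUCKETS = [0.1, 0.25, 0.5, 1.0, 2.5]
EXTENDED_BUCKETS = [0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 15.0, 60.0, 300.0, 1800.0]

# Hypothetical job durations in seconds: three of six exceed 2.5 s.
SAMPLE_DURATIONS = [0.2, 0.4, 1.2, 3.0, 45.0, 600.0]


def cumulative_counts(durations, bounds):
    """Return (le, count) pairs like Prometheus buckets, with +Inf appended."""
    les = list(bounds) + [math.inf]
    return [(le, sum(1 for d in durations if d <= le)) for le in les]


def quantile(q, buckets):
    """Simplified sketch of Prometheus histogram_quantile behaviour (0 < q <= 1).

    Interpolates linearly inside the bucket containing the requested rank.
    If that bucket is +Inf, the upper bound of the highest finite bucket is
    returned, which is exactly why slow jobs disappear from the estimate.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return buckets[-2][0]
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return buckets[-2][0]


if __name__ == "__main__":
    current = cumulative_counts(SAMPLE_DURATIONS, CURRENT_BUCKETS)
    extended = cumulative_counts(SAMPLE_DURATIONS, EXTENDED_BUCKETS)
    print("current  p99:", quantile(0.99, current))
    print("extended p99:", quantile(0.99, extended))
```

With the current layout, the 3 s, 45 s and 600 s jobs all land in +Inf and the estimated p99 is pinned at 2.5 s; with the extended layout the same data yields a p99 of roughly 1710 s. Adding buckets is not free, since each bucket is exported as a separate time series for every label combination, so any new bounds would need to balance resolution for slow jobs against metric cardinality.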
Edited by Andrew Newdigate