Redis latency monitoring: focus on slower requests
What does this MR do?
This MR refines the Prometheus histogram buckets for gitlab_redis_client_requests_duration_seconds. In the first iteration of this metric, we tried not to add too many buckets, because each bucket adds more data for Prometheus to keep track of. In gitlab-com/runbooks!2542 (merged) we realized that to have useful Redis latency monitoring, we need to focus the histogram buckets on slower requests.
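To make the cost concrete: Prometheus stores every histogram bucket as its own time series. The sketch below assumes the standard histogram exposition, which has one _bucket series per finite bucket, one more for le="+Inf", plus _sum and _count, all repeated for every distinct label combination. The helper name and the figures in it are hypothetical, not measured from GitLab.com.

```python
def histogram_series_count(num_buckets: int, label_combinations: int) -> int:
    """Estimate how many time series one Prometheus histogram exports.

    Per distinct label combination there is one `_bucket` series for each
    finite bucket, one more for the implicit le="+Inf" bucket, plus the
    `_sum` and `_count` series.
    """
    per_label_set = num_buckets + 1 + 2  # finite buckets + "+Inf" + _sum + _count
    return per_label_set * label_combinations


# Hypothetical figure, not measured: 50 distinct label combinations.
LABEL_COMBINATIONS = 50
print(histogram_series_count(3, LABEL_COMBINATIONS))  # 300
print(histogram_series_count(4, LABEL_COMBINATIONS))  # 350
```

Each extra bucket therefore costs one series per label combination, which is why bucket bounds are worth choosing carefully.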
In this MR we remove the 0.001s bucket. While a lot of Redis calls do take less than that, knowing this is not useful for monitoring. At the tail end, we add 0.1s and 0.5s buckets.
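The sketch below shows how cumulative buckets record the same latencies before and after this kind of change. Only "drop 0.001, add 0.1 and 0.5" comes from this MR; the other bucket bounds, the sample latencies and the cumulative_counts helper are hypothetical.

```python
import math

# Hypothetical layouts: only "drop 0.001, add 0.1 and 0.5" comes from this MR.
OLD_BUCKETS = [0.001, 0.005, 0.01]
NEW_BUCKETS = [0.005, 0.01, 0.1, 0.5]

# Hypothetical request latencies in seconds.
SAMPLE = [0.0004, 0.0008, 0.002, 0.003, 0.03, 0.2, 0.7]


def cumulative_counts(observations, buckets):
    """Return {le: count} in the shape a Prometheus histogram exposes.

    Buckets are cumulative: each one counts every observation less than
    or equal to its upper bound, and the implicit "+Inf" bucket counts
    all observations.
    """
    counts = {}
    for bound in sorted(buckets) + [math.inf]:
        label = "+Inf" if math.isinf(bound) else str(bound)
        counts[label] = sum(1 for x in observations if x <= bound)
    return counts


print(cumulative_counts(SAMPLE, OLD_BUCKETS))
# {'0.001': 2, '0.005': 4, '0.01': 4, '+Inf': 7}
print(cumulative_counts(SAMPLE, NEW_BUCKETS))
# {'0.005': 4, '0.01': 4, '0.1': 5, '0.5': 6, '+Inf': 7}
```

With the assumed old layout, the 30ms, 200ms and 700ms requests all land only in +Inf and cannot be told apart. The 0.1s and 0.5s buckets separate them, while the sub-millisecond requests still count toward the 0.005 bucket.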
Removing histogram buckets can cause problems in our monitoring framework because, for example, our apdex queries use specific le="123" selectors. In this case, however, we are modifying a metric we do not rely on yet, so removing the 0.001 bucket should be fine.
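A minimal sketch of why a pinned le selector matters, using a hypothetical apdex_ratio helper that divides one bucket's count by the +Inf count. This is a simplification for illustration, not the actual apdex definition used in our monitoring, and the counts are made up.

```python
from typing import Dict, Optional

# Hypothetical cumulative counts (same shape as a Prometheus histogram).
OLD_COUNTS = {"0.001": 2, "0.005": 4, "0.01": 4, "+Inf": 7}
NEW_COUNTS = {"0.005": 4, "0.01": 4, "0.1": 5, "0.5": 6, "+Inf": 7}


def apdex_ratio(counts: Dict[str, int], threshold_le: str) -> Optional[float]:
    """Hypothetical apdex-style ratio: requests at or below threshold_le
    divided by all requests.

    The threshold is pinned to one exact le value, like a le="..."
    selector. If that bucket does not exist, nothing matches and we
    return None, much as a PromQL selector on a missing label value
    returns an empty result.
    """
    total = counts.get("+Inf", 0)
    if threshold_le not in counts or total == 0:
        return None
    return counts[threshold_le] / total


print(apdex_ratio(OLD_COUNTS, "0.001"))  # 2/7 of requests at or below 1ms
print(apdex_ratio(NEW_COUNTS, "0.001"))  # None: the bucket no longer exists
print(apdex_ratio(NEW_COUNTS, "0.1"))    # 5/7 of requests at or below 100ms
```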
This is part of gitlab-com/gl-infra/scalability#439 (closed)
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team