Don't record duration for errors
What does this MR do?
This makes the request duration histograms consistent across both metrics, and only records a duration when the request was successful.
It also makes sure that it no longer matters how the error reached the middleware. Before this change, an error that bubbled up to the middleware would count towards a group's error budget, but not towards the service's availability. With this change, neither metric includes the durations of requests that resulted in a response with a 5xx status code.
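A minimal sketch of the idea in Python, assuming a handler that returns a `(status, body)` tuple. The names `instrumented`, `record`, `requests_total` and `duration_seconds` are hypothetical, and a plain list stands in for a Prometheus histogram; this is not GitLab's actual middleware code. The point it illustrates is that a returned 5xx and an exception bubbling up go through the same recording path, and only non-5xx requests get a duration observed.

```python
import time
from collections import defaultdict

# Hypothetical stand-ins for the real metrics: a counter of requests by outcome
# and a list playing the role of the request duration histogram.
requests_total = defaultdict(int)
duration_seconds = []


def record(status, elapsed):
    """Count every request, but only observe the duration of non-5xx responses."""
    failed = 500 <= status <= 599
    requests_total["error" if failed else "ok"] += 1
    if not failed:
        duration_seconds.append(elapsed)


def instrumented(handler):
    """Wrap a handler returning (status, body) so both error paths are recorded alike."""
    def wrapper(request):
        start = time.monotonic()
        status = 500  # treated as a server error unless the handler returns normally
        try:
            status, body = handler(request)
            return status, body
        finally:
            # Runs for returned responses and for exceptions that bubble up;
            # the exception itself still propagates to the caller.
            record(status, time.monotonic() - start)
    return wrapper


# Usage: a success, a returned 5xx, and an exception bubbling up.
@instrumented
def ok(request):
    return 200, "ok"


@instrumented
def returned_error(request):
    return 503, "unavailable"


@instrumented
def raised_error(request):
    raise RuntimeError("boom")
```

After calling all three handlers, `requests_total` counts one success and two errors, while `duration_seconds` holds a single duration.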
If a request fails, it matters little to users how fast it failed.
For our metrics, recording those durations could also skew results: a very fast 500 would have less impact on a service's availability or a group's budget spend than a slow one, while the two should weigh the same. Similarly, a slow failure should not count double towards availability (once as an error and once as a slow request).
This results in the following scoring for requests (successful measurements out of total measurements):
| | Fast | Slow |
|---|---|---|
| Success | 2/2 | 1/2 |
| Error | 0/1 | 0/1 |
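The scoring in the table can be reproduced with a small Python sketch, assuming each request contributes one measurement to an error-rate SLI and, when its duration is recorded, one measurement to an apdex SLI. The threshold value and the `record_error_durations` flag are hypothetical; setting the flag to `True` models a metric that still observes durations for failed requests.

```python
SATISFIED_THRESHOLD_S = 1.0  # hypothetical apdex threshold, in seconds


def score(status, duration_s, record_error_durations):
    """Return (successes, measurements) for one request across both SLIs."""
    failed = 500 <= status <= 599

    # Error-rate SLI: every request is one measurement, successful unless 5xx.
    successes = 0 if failed else 1
    measurements = 1

    # Apdex SLI: only contributes when the request's duration is recorded.
    if record_error_durations or not failed:
        measurements += 1
        if duration_s <= SATISFIED_THRESHOLD_S:
            successes += 1

    return successes, measurements


# Reproduce the table (fast = 0.1 s, slow = 5.0 s).
for outcome, status in (("Success", 200), ("Error", 500)):
    fast = score(status, 0.1, record_error_durations=False)
    slow = score(status, 5.0, record_error_durations=False)
    print(f"{outcome}: fast {fast[0]}/{fast[1]}, slow {slow[0]}/{slow[1]}")
# Success: fast 2/2, slow 1/2
# Error: fast 0/1, slow 0/1
```

With `record_error_durations=True`, a fast 500 scores 1/2 and a slow 500 scores 0/2, which is the skew described above; with it set to `False`, both score 0/1 as in the table.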
Part of gitlab-com/gl-infra/scalability#1098
Does this MR meet the acceptance criteria?
Conformity
- I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
- I have added/updated documentation, or it's not needed. (Is documentation required?)
- I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
- I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
- I have self-reviewed this MR per code review guidelines.
- This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
- I have followed the style guides.
- This change is backwards compatible across updates, or this does not apply.
Availability and Testing
- I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
- I have tested this MR in all supported browsers, or it's not needed.
- I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.