Adjust image-scaler latency buckets
Part of gitlab-com/gl-infra/production#5474 (closed)
We are looking to relax the latency SLO for the image scaler so that a small number of latency violations won't trip apdex thresholds as easily, since such violations are often not actionable.
Apdex is based on response latency, in this case the `gitlab_workhorse_image_resize_duration_seconds` histogram (the `_bucket` series).
Currently we trip it when there are too many observations above the 800ms bucket (correct me if I'm misreading this).
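As I understand it (hedging here, since the exact recording rules live in the metrics catalog rather than this issue), the apdex ratio is roughly the share of duration observations that land at or below the 0.8s bucket boundary:

```math
\text{apdex} \approx \frac{\operatorname{rate}\left(\text{observations} \le 0.8\,\text{s}\right)}{\operatorname{rate}\left(\text{all observations}\right)}
```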
The problem is this: the vast majority of image scaling requests never actually invoke the scaler, since they are conditional GETs (i.e. `success-client-cache`). Those are no-ops from the scaler's perspective; Workhorse is merely confirming to the client that their cached image data is still current.
This means we take latency observations at a very low rate, since an observation is only recorded after the scaler has actually scaled something. For example, suppose there are 100 image scaling requests per second, of which 90 are served from client caches and only 10 invoke the scaler. If 2 of those scaler invocations are slow, then 20% of the observed "requests" look slow, but in reality only 0.2 * 0.1 = 2% of all requests were slow, because 90% of them never hit the scaler at all.
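Spelled out with those illustrative numbers:

```math
\underbrace{\frac{2}{10} = 20\%}_{\text{share of scaler invocations}} \qquad \text{vs.} \qquad \underbrace{\frac{2}{100} = 0.2 \times 0.1 = 2\%}_{\text{share of all requests}}
```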
We should therefore add a new latency bucket above 800ms and incorporate it into our apdex thresholds.
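For illustration only, a minimal sketch of what such a change could look like on the Workhorse side, assuming the histogram is declared with `client_golang`; the bucket values and the new 2.5s bound are hypothetical, not the ones actually configured:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// imageResizeDuration records how long the scaler took per invocation.
// Hypothetical bucket layout: adding a 2.5s bound above the existing 0.8s
// bound gives the apdex definition a coarser "tolerable" tier to target.
var imageResizeDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "gitlab_workhorse_image_resize_duration_seconds",
	Help:    "Breakdown of total time spent resizing an image",
	Buckets: []float64{0.025, 0.05, 0.1, 0.25, 0.5, 0.8, 2.5}, // 2.5 is the new bucket
})
```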
Update:
An alternative approach, measuring the cached requests as well, was discussed at #340162 (comment 678112900), so we are going to do that instead.
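For reference, a rough sketch of that alternative, reusing the `imageResizeDuration` histogram from the sketch above: the idea is to observe the (near-zero) time spent answering conditional GETs too, so cached requests count toward the apdex denominator. The handler shape and helper names here are made up for illustration; the real wiring in Workhorse will differ.

```go
package metrics

import (
	"net/http"
	"time"
)

// clientCacheIsCurrent and scaleAndServe are hypothetical stand-ins for the
// real Workhorse logic; they exist only so this sketch compiles.
func clientCacheIsCurrent(r *http.Request) bool            { return r.Header.Get("If-None-Match") != "" }
func scaleAndServe(w http.ResponseWriter, r *http.Request) {}

// handleImageResize records a duration observation for every request,
// including conditional GETs that never invoke the scaler, on the
// imageResizeDuration histogram from the previous sketch.
func handleImageResize(w http.ResponseWriter, r *http.Request) {
	start := time.Now()

	if clientCacheIsCurrent(r) {
		// No scaling work happened, but the near-zero observation lands in
		// the fastest bucket and keeps a handful of slow scaler runs from
		// dominating the apdex ratio.
		w.WriteHeader(http.StatusNotModified)
		imageResizeDuration.Observe(time.Since(start).Seconds())
		return
	}

	scaleAndServe(w, r) // the path that actually resizes the image
	imageResizeDuration.Observe(time.Since(start).Seconds())
}
```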