Fix query to get release status metric values
What does this MR do and why?
The Prometheus query looks further back than I expected, so it returns the maximum value of the metric over that range. For example:
- Monthly release: We had another bug earlier this month where we accidentally created an `rc_tagged` status release metric for the current release, when we had actually tagged an RC for an earlier release. Although the metric was corrected, the history still shows a maximum value of 3 (`rc_tagged`).
- Patch release: We already had a patch release earlier this month for the same versions, so because the query's time range looks back that far, it fetched the maximum value, which is 3.
As a result, we set the metrics to `3` instead of refreshing them with the expected/latest value of `1` (example pipeline job output).
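The effect described above can be reproduced with a small simulation. This is only a sketch, not the release-tools code: the sample history, timestamps, and the helper names `max_in_window` and `latest_in_window` are made up for illustration, and window boundary handling is simplified compared to Prometheus.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, metric value)


def samples_in_window(samples: List[Sample], now: int, window: int) -> List[Sample]:
    """Keep only samples newer than `now - window` (simplified range selection)."""
    return [(ts, v) for ts, v in samples if now - window < ts <= now]


def max_in_window(samples: List[Sample], now: int, window: int) -> Optional[float]:
    """Largest value seen in the window, regardless of when it was recorded."""
    in_window = samples_in_window(samples, now, window)
    return max(v for _, v in in_window) if in_window else None


def latest_in_window(samples: List[Sample], now: int, window: int) -> Optional[float]:
    """Value of the most recent sample in the window."""
    in_window = samples_in_window(samples, now, window)
    return max(in_window, key=lambda s: s[0])[1] if in_window else None


DAY = 24 * 60 * 60
now = 30 * DAY

# Hypothetical history for one version: a status of 3 (rc_tagged) was recorded
# earlier in the month, and the metric currently reports 1.
history: List[Sample] = [
    (now - 20 * DAY, 3.0),
    (now - 600, 1.0),
    (now, 1.0),
]

stale = max_in_window(history, now, 30 * DAY)  # long lookback picks up the old 3
fresh = latest_in_window(history, now, 3600)   # short lookback, most recent sample

print(stale, fresh)
```

With a long lookback, the maximum is dominated by the old sample, while the most recent sample reflects the current status.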
Instead, we should just get the latest value by querying with `last_over_time`, like:
last_over_time(delivery_release_monthly_status{version=<version>}[1h])
The scheduled job to refresh the metrics runs every 10 minutes, so `1h` should be sufficient to get the last emitted metric, even if the Prometheus/Grafana pods restart and flush the metrics.
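A rough way to check the window size: with a 10-minute refresh interval, a 1-hour window normally holds about six samples, so several consecutive refreshes can be missed before the query comes back empty. The sketch below models this under simplified assumptions; `last_over_time_sim` is a hypothetical stand-in for the PromQL function, and the timestamps are arbitrary.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, metric value)

INTERVAL = 10 * 60  # the refresh job runs every 10 minutes
WINDOW = 60 * 60    # the [1h] range used in the query


def in_window(samples: List[Sample], now: int, window: int) -> List[Sample]:
    """Samples newer than `now - window` (boundary handling is simplified)."""
    return [(ts, v) for ts, v in samples if now - window < ts <= now]


def last_over_time_sim(samples: List[Sample], now: int, window: int) -> Optional[float]:
    """Most recent value in the window, or None if the window is empty."""
    selected = in_window(samples, now, window)
    return max(selected, key=lambda s: s[0])[1] if selected else None


now = 100_000

# Steady state: one sample every 10 minutes for the last ~2 hours.
steady: List[Sample] = [(now - i * INTERVAL, 1.0) for i in range(12)]

# Short gap: the three most recent samples are missing (e.g. around a restart).
short_gap = [(ts, v) for ts, v in steady if ts <= now - 3 * INTERVAL]

# Long gap: nothing was recorded during the last 70 minutes.
long_gap = [(ts, v) for ts, v in steady if ts <= now - 7 * INTERVAL]

samples_per_window = len(in_window(steady, now, WINDOW))
value_after_short_gap = last_over_time_sim(short_gap, now, WINDOW)
value_after_long_gap = last_over_time_sim(long_gap, now, WINDOW)

print(samples_per_window, value_after_short_gap, value_after_long_gap)
```

In this model the query only returns nothing once the gap exceeds the full hour, which is the margin the `1h` range is meant to provide.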
`release_status_metric_update` (to keep it separate from the `release_status_metric` FF that creates metrics)
Addresses: gitlab-com/gl-infra/delivery#20181 (closed)
Edited by Jenny Kim