Add last update at gauge
What does this MR do?
We report gauge metrics for batched migrations from Sidekiq jobs. These jobs execute on different hosts over time, and each of those hosts keeps reporting the latest gauge value it saw for a while.
This leads to confusing situations where different hosts report different values for the same gauge.
This change adds a Unix timestamp gauge with the same set of labels as the other gauges. We intend to use it to determine which reported value is the "latest" one we should be looking at. The query looks roughly like this (and might become a recording rule):
batched_migration_job_batch_size{}
* on(job, instance, migration_identifier) group_left()
group by (job, instance, migration_identifier) (topk by (migration_identifier) (1, batched_migration_job_last_update_time_seconds{env="gprd", migration_identifier="CopyColumnUsingBackgroundMigrationJob/ci_builds.id"}))
This follows a suggestion from @andrewn on how to deal with this situation. An alternative we discussed was moving those metrics over to gitlab-exporter, which can easily report the same gauges with a simple database query. We might end up doing that, but I would like to explore the option at hand first.
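To make the intent of the query above easier to review, here is a small Python sketch of the selection it performs: per migration, find the instance with the newest last-update timestamp, then keep only that instance's gauge value. This is an illustration under assumptions, not code from this MR: the `Sample` type, the `latest_values` helper, the host names, and the sample numbers are all hypothetical, and the `job` label is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One reported series sample (hypothetical shape, for illustration only)."""
    instance: str               # host that reported the gauge
    migration_identifier: str   # label shared by all batched migration gauges
    value: float                # gauge value (batch size or Unix timestamp)


def latest_values(batch_sizes, last_updates):
    """Return, per migration, the batch size from the most recently updated instance."""
    # Mirrors `topk by (migration_identifier) (1, ...last_update_time_seconds)`:
    # keep the sample with the newest timestamp for each migration.
    newest = {}
    for sample in last_updates:
        current = newest.get(sample.migration_identifier)
        if current is None or sample.value > current.value:
            newest[sample.migration_identifier] = sample

    # Mirrors the `* on(...) group_left() group by (...)` join: only gauge
    # samples whose (instance, migration_identifier) matched survive, and
    # their value is unchanged because the right-hand side evaluates to 1.
    keep = {(s.instance, s.migration_identifier) for s in newest.values()}
    return {
        s.migration_identifier: s.value
        for s in batch_sizes
        if (s.instance, s.migration_identifier) in keep
    }


# Hypothetical data: two hosts report different batch sizes for one migration.
migration = "CopyColumnUsingBackgroundMigrationJob/ci_builds.id"
batch_sizes = [
    Sample("sidekiq-01", migration, 1000.0),   # stale value
    Sample("sidekiq-02", migration, 2500.0),   # most recent value
]
last_updates = [
    Sample("sidekiq-01", migration, 1_620_000_000.0),
    Sample("sidekiq-02", migration, 1_620_000_600.0),
]

result = latest_values(batch_sizes, last_updates)
```

With this data, only the value reported by the host that updated last is kept, which is the behavior we want from the timestamp gauge.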
Does this MR meet the acceptance criteria?
Conformity
- 📋 Does this MR need a changelog?
  - I have included a changelog entry.
  - I have not included a changelog entry because (feature flag).
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content