Merge or add relation between Alert models
Background
In %13.0 we added Alert Management to GitLab.
This leveraged the existing alerting APIs (including the generic alert endpoint) to save alerts in a consistent format: the `AlertManagement::Alert` model.
However, we are still using other models to save Prometheus alerts:
- Prometheus metrics are saved in `PrometheusMetric`.
- `PrometheusAlert` saves an alert, based on a `PrometheusMetric`, which is sent to Prometheus to wait to be triggered. It contains a relation to a metric, an `Environment` and a `Project`.
- `PrometheusAlertEvent` is an instance of an alert that has occurred. It contains payload information, started at/ended at timestamps and a status.
The Problem
We currently have a fragmented approach. We are using AlertManagement Alerts as the new and improved alert model; however, they have some glaring holes:
- We have no relation to PrometheusAlerts. Because of this we can't easily tell if one alert relates to another.
- We can't relate alerts back to an Environment. Without this, our interaction between stages is compromised (see this issue from the ~"devops::release" stage: #214634 (comment 358445777)), which in turn reduces our ability to dogfood.
- A fragmented data structure is hard to maintain. We already have a technical debt issue, #217407 (closed), which briefly covers some of the problems that come with having multiple data flows for alerts, etc.
Solutions
#### 1. Replace `PrometheusAlertEvents` with `AlertManagement::Alert`
The `PrometheusAlertEvents` data structure is very similar to that of the new Alert model. We could combine the two and add relations from the new alerts to the various models (PrometheusAlert etc.), which would give us access to the relevant Environment, PrometheusAlert and PrometheusMetric from the `AlertManagement::Alert` model.
If we need to, we could create a data migration to move the data in `PrometheusAlertEvents` to `AlertManagement::Alert`.
In my opinion, this would be the best long-term solution.
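
To make option 1 more concrete, here is a minimal sketch that uses plain Ruby `Struct`s as stand-ins for the models. These are not the real ActiveRecord classes: `UnifiedAlert`, `migrate_event` and all attribute names are assumptions derived from the descriptions above, and a real implementation would be a database migration plus `belongs_to` relations.

```ruby
# Plain-Ruby stand-ins for the models discussed above. These are NOT the real
# ActiveRecord classes; attribute names are assumptions for illustration.
Environment = Struct.new(:name, keyword_init: true)
PrometheusMetric = Struct.new(:title, keyword_init: true)
PrometheusAlert = Struct.new(:metric, :environment, :project, keyword_init: true)
PrometheusAlertEvent = Struct.new(
  :prometheus_alert, :payload, :started_at, :ended_at, :status,
  keyword_init: true
)

# Hypothetical shape of the combined alert under option 1: it stores the event
# data itself plus a relation to the PrometheusAlert it originated from.
UnifiedAlert = Struct.new(
  :prometheus_alert, :payload, :started_at, :ended_at, :status,
  keyword_init: true
) do
  # With the relation in place, the environment is reachable from the alert.
  def environment
    prometheus_alert&.environment
  end

  # The metric is reachable through the same relation.
  def metric
    prometheus_alert&.metric
  end
end

# Hypothetical data-migration step: copy one existing event into the combined model.
def migrate_event(event)
  UnifiedAlert.new(
    prometheus_alert: event.prometheus_alert,
    payload: event.payload,
    started_at: event.started_at,
    ended_at: event.ended_at,
    status: event.status
  )
end
```

Alerts that do not come from Prometheus would simply leave `prometheus_alert` empty, in which case `environment` and `metric` return `nil` in this sketch.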
#### 2. Add a relation between `PrometheusAlertEvents` and `AlertManagement::Alert`
We could add a relation between the two, so that the records created in both models by the same payload are linked to each other.
This would allow the same access as option 1, but would require more data to be saved. However, it might be easier to implement in the short term.
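
As a rough illustration of option 2, the sketch below again uses plain Ruby `Struct`s rather than the real models. `ManagedAlert` (a stand-in for `AlertManagement::Alert`), the `prometheus_alert_event` attribute and `process_payload` are hypothetical names; the only point is that one incoming payload produces two linked records.

```ruby
# Plain-Ruby stand-ins; NOT the real ActiveRecord classes.
PrometheusAlert = Struct.new(:environment, keyword_init: true)
PrometheusAlertEvent = Struct.new(:prometheus_alert, :payload, keyword_init: true)

# Hypothetical stand-in for AlertManagement::Alert with an added relation
# pointing at the event created from the same payload.
ManagedAlert = Struct.new(:payload, :prometheus_alert_event, keyword_init: true) do
  # The environment is reached by walking alert -> event -> PrometheusAlert.
  def environment
    prometheus_alert_event&.prometheus_alert&.environment
  end
end

# Hypothetical handler: a single payload creates both records and links them.
def process_payload(payload, prometheus_alert)
  event = PrometheusAlertEvent.new(prometheus_alert: prometheus_alert, payload: payload)
  alert = ManagedAlert.new(payload: payload, prometheus_alert_event: event)
  [event, alert]
end
```

Note that the payload ends up stored on both records here, which is the extra data this option requires compared with option 1.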