Add service to bulk process alerts
What does this MR do and why?
Context:
When a project has an alert integration configured, prometheus servers can be configured to send alerts to GitLab. When we receive a prometheus request, it may contain many unique alerts within the same payload. Currently, we handle each alert individually. However, there's no strict limit on the Prometheus side to the number of alerts a notification can contain. So we've got a pretty hefty N+1.
Related issue: https://gitlab.com/gitlab-org/gitlab/-/issues/348676
Changes in this MR:
- Adds a service to process alerts in bulk (which is not yet utilized)
- The functionality of new
BulkProcessAlertsService
should match the effects ofAlertManagement::AlertProcessing
- The functionality of new
- Adds new/replacement specs for the alert processing flow
- The new specs for
BulkProcessAlertsService
will eventually replace the examples inspec/support/shared_examples/services/alert_management/alert_processing/
- The new specs for
Scope:
-
BulkProcessAlertsService
will be utilized in !95834 (closed) & !95854 (closed) - Removal/cleanup of the old classes/modules/specs will happen in a future MR
Motivation:
- By adding a new service instead of reworking the existing
AlertProcessing
module, we can fully separate the effects of alert processing from auth and from payload interpretation. Isolation of these responsibilities makes it easier to test the code and makes it clearer where tests should be - The new specs group expectations into significantly fewer examples. This is to
- create fewer records
- run the processing logic fewer times
- make the expected behavior more explicit
Holistic overview
Related MRs
All MRs will merge into master, but should merge in the order below
flowchart RL
master
95827["1. Add service (!95827)"]
95834["2.a. Use service for generic alerts (!95834)"]
95854["2.b. Use service for prometheus alerts (!95854)"]
TBD["3. Delete unneeded tests/files (MR TBD)"]
95827-->master
95834-->95827
95854-->95827
TBD-->95834
TBD-->95854
Expected data flow for alert processing
- Alert comes into integration endpoint
- Controller determines which
NotifyService
to use -
NotifyService
checks permissions & validity -
NotifyService
parses input intoGitlab::AlertManagement::
Payload` format -
NotifyService
passes payloads toBulkProcessAlertsService
-
BulkProcessAlertsService
is responsible for finding any existing alerts for the payloads and triggering any side-effects - For each alert,
BulkProcessAlertsService
delegates toUpdateAlertFromPayloadService
to modify the alert itself.- As a note: This is still an N+1, but it will stay for now, since we return the id of each alert in the HTTP response. The expected system notes & record updates will vary by alert, so we still need to run validations on save.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Edited by Sarah Yasonik