Stop syncing alert and incident statuses
What does this MR do and why?
Related issues: #356057 (closed), https://gitlab.com/gitlab-org/gitlab/-/issues/348676
Changes:
- Allows the status attributes of a related incident & alert to be updated independently (removes the sync behavior)
- Clears the escalation policy attribute from any incidents that were created from alerts
Context & motivation:
- Updating behavior to pave the way for new features
  - We're adding two new capabilities for alerts & incidents:
    - ability to link an alert to an incident after the incident has been created (currently only linkable via creating the incident from the alert)
    - ability to link one incident to multiple alerts (currently only allowed 1:1 incident:alert)
  - With the new functionality, it doesn't make sense for the incident status & alert status to automatically match, since different alerts might be resolved at different times for the same incident. And an incident may have been escalated prior to an alert being associated, so we wouldn't want to alter the escalation behavior for that incident.
- Improving endpoint performance
  - The more actions we take for an incoming alert (like the status sync), the longer the request takes. We've been encountering timeout errors and scale issues.
  - Removing the sync behavior reduces the requirements of the alerting endpoints & helps us to improve the request performance.
Scope note: Future MRs will allow an escalation policy to be applied for any incident, and to link incidents to multiple alerts. This MR is constrained to allowing an independent incident status.
Database info:
- Terminal output

  DOWN:

      % bin/rails db:migrate:down:main VERSION=20220629184402
      main: == 20220629184402 UnsetEscalationPoliciesForAlertIncidents: reverting =========
      main: == 20220629184402 UnsetEscalationPoliciesForAlertIncidents: reverted (0.0024s)

  UP:

      % bin/rails db:migrate
      main: == 20220629184402 UnsetEscalationPoliciesForAlertIncidents: migrating =========
      main: == 20220629184402 UnsetEscalationPoliciesForAlertIncidents: migrated (0.0421s)

- SQL queries

      -- Batching
      SELECT "incident_management_issuable_escalation_statuses"."id"
      FROM "incident_management_issuable_escalation_statuses"
      ORDER BY "incident_management_issuable_escalation_statuses"."id" ASC
      LIMIT 1

      SELECT "incident_management_issuable_escalation_statuses"."id"
      FROM "incident_management_issuable_escalation_statuses"
      WHERE "incident_management_issuable_escalation_statuses"."id" >= 1
      ORDER BY "incident_management_issuable_escalation_statuses"."id" ASC
      LIMIT 1 OFFSET 1000

      -- Nullify values for records
      UPDATE "incident_management_issuable_escalation_statuses"
      SET "policy_id" = NULL, "escalations_started_at" = NULL
      WHERE "incident_management_issuable_escalation_statuses"."id" IN (
        SELECT "incident_management_issuable_escalation_statuses"."id"
        FROM "incident_management_issuable_escalation_statuses"
        INNER JOIN alert_management_alerts
          ON alert_management_alerts.issue_id = incident_management_issuable_escalation_statuses.issue_id
        WHERE "incident_management_issuable_escalation_statuses"."id" >= 1
          AND "incident_management_issuable_escalation_statuses"."policy_id" IS NOT NULL
      )
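The queries above can be read as a batch loop over the escalation-status table: walk the ids in batches, and within each batch nullify `policy_id` and `escalations_started_at` for rows that still have a policy and whose issue has an associated alert. The sketch below mirrors that logic in plain Ruby on in-memory data. It is not the migration itself: the hashes stand in for table rows, `unset_policies_for_alert_incidents` and `BATCH_SIZE` are hypothetical names, and the batch size of 1000 is an assumption inferred from the `OFFSET 1000` in the batching query.

```ruby
require 'set'

# Assumed batch size, inferred from "OFFSET 1000" in the batching query.
BATCH_SIZE = 1000

# statuses: hashes standing in for incident_management_issuable_escalation_statuses rows
# alerts:   hashes standing in for alert_management_alerts rows
# Returns the number of status rows that were nullified.
def unset_policies_for_alert_incidents(statuses, alerts, batch_size: BATCH_SIZE)
  # Equivalent of the INNER JOIN: issue ids that have at least one alert.
  alert_issue_ids = alerts.map { |alert| alert[:issue_id] }.compact.to_set
  updated = 0

  # Walk the rows in id order, one batch at a time.
  statuses.sort_by { |status| status[:id] }.each_slice(batch_size) do |batch|
    batch.each do |status|
      next if status[:policy_id].nil?                          # policy_id IS NOT NULL
      next unless alert_issue_ids.include?(status[:issue_id])  # has an associated alert

      status[:policy_id] = nil
      status[:escalations_started_at] = nil
      updated += 1
    end
  end

  updated
end

# Illustrative data only: issue 10 has an alert, issue 11 does not.
example_statuses = [
  { id: 1, issue_id: 10, policy_id: 5, escalations_started_at: Time.now },
  { id: 2, issue_id: 11, policy_id: 5, escalations_started_at: Time.now }
]
example_alerts = [{ id: 100, issue_id: 10 }]
unset_policies_for_alert_incidents(example_statuses, example_alerts)
```

In this example only the row for issue 10 is cleared; the incident without an alert keeps its policy, which matches the `INNER JOIN` condition in the `UPDATE`.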
Screenshots or screen recordings
| Action | Expected behavior | Original behavior, if different |
|---|---|---|
| Changing the status of an alert [WITH associated incident] | Alert status changes. Alert gets a system note. | Alert status changes. Alert gets a system note. Incident status changes. Incident gets a system note which references the alert. |
| Changing the status of an alert [WITHOUT associated incident] | Alert status changes. Alert gets a system note. | |
| Changing the status of an incident [WITH associated alert] | Incident status changes. Incident gets a system note. | Incident status changes. Incident gets a system note. Alert status changes. Alert gets a system note which references the incident. |
| Changing the status of an incident [WITHOUT associated alert] | Incident status changes. Incident gets a system note. | |
| Opening an incident from an alert | Incident status is set to Triggered. | Incident status is set to match the alert. Incident escalation policy is set to match the alert. |
| Receiving a recovery alert [WITH associated incident] | If setting enabled, incident is resolved. | |
| Closing an incident [WITH associated alert] | Alert is resolved. | |
| Setting an escalation policy for an incident [WITHOUT associated alert] | Sets the status to Triggered & starts escalations. | |
| Setting an escalation policy for an incident [WITH associated alert] | Policy is not modifiable. Policy value matches the associated alert. | Policy is not modifiable. Policy is blank. |
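The status-change rows of the table above come down to one rule: a status change touches only the record it was made on, and an incident opened from an alert starts as Triggered regardless of the alert's status. The Ruby sketch below models just that rule with plain objects. `Record`, `Alert`, `Incident`, `update_status`, `create_incident_from`, and the note text are hypothetical stand-ins for illustration, not GitLab's models or services, and the recovery-alert and closing rows are not modeled.

```ruby
# Hypothetical base class: a status plus the system notes recorded for it.
class Record
  attr_reader :status, :system_notes

  def initialize(status)
    @status = status
    @system_notes = []
  end

  # Changes only this record's status and notes; nothing is propagated
  # to any linked record (the sync behavior is gone).
  def update_status(new_status)
    return if new_status == @status

    @status = new_status
    @system_notes << "changed the status to #{new_status}"
  end
end

class Alert < Record
  attr_accessor :incident
end

class Incident < Record
  attr_reader :alerts

  def initialize(status = :triggered)
    super
    @alerts = []
  end
end

# Opening an incident from an alert: the incident starts as Triggered
# instead of copying the alert's status.
def create_incident_from(alert)
  incident = Incident.new(:triggered)
  incident.alerts << alert
  alert.incident = incident
  incident
end

alert = Alert.new(:acknowledged)
incident = create_incident_from(alert)
alert.update_status(:resolved) # incident.status stays :triggered
```

Keeping the link (`incident.alerts`) separate from the status is what would let several alerts, resolved at different times, point at the same incident later on.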
How to set up and validate locally
- Prerequisite: a project with a user who has at least the Maintainer role
- Creating an escalation policy:
  - Navigate to `Monitor > Escalation Policies`
  - Select the `Add an escalation policy` button to create a policy
  - Add a rule to notify a single user & save (a user rule isn't necessary, it's just the fastest)
- Creating an incident:
  - Create a normal issue with the type `incident`.
  - The status & escalation policy fields are in the sidebar.
- Creating an incident with an associated alert:
  - Set up an active alert integration. Skip the custom mapping (skipping the mapping isn't necessary, it's just the fastest).
  - Select the `Send test alert` tab to send a payload like `{ "title": "Sample alert to test incident/alert statuses" }`
  - Navigate to `Monitor > Alerts` to find the new alert
  - Select the `Create incident` button
- Sending a recovery alert:
  - Select the `Send test alert` tab for the alert integration, and send a resolving payload like `{ "title": "Sample alert to test incident/alert statuses", "end_time": "2022-06-30T03:01:53.772Z" }`
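The two test payloads in the steps above differ only in the `end_time` field, which marks the alert as resolved. As a convenience when repeating the validation, they can be generated with a small Ruby helper. This is a sketch: `build_test_alert` is a hypothetical name, and it only produces the JSON text to paste into the `Send test alert` tab. No HTTP call is shown because the integration's endpoint URL and authorization key are not given here.

```ruby
require 'json'
require 'time'

# Builds the JSON body for a test alert. Passing end_time (a Time)
# turns it into a resolving (recovery) payload.
def build_test_alert(title, end_time: nil)
  payload = { 'title' => title }
  # ISO 8601 in UTC with millisecond precision, as in the sample payload.
  payload['end_time'] = end_time.utc.iso8601(3) if end_time
  JSON.generate(payload)
end

title = 'Sample alert to test incident/alert statuses'
firing_body = build_test_alert(title)
resolving_body = build_test_alert(title, end_time: Time.now)
```

Using the same `title` for both payloads follows the sample payloads in the steps above.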
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Sarah Yasonik