Add Pending Alert Escalations table, model, services and worker
What does this MR do?
Note: This is behind the feature flag escalation_policies_mvc and the licensed feature flag escalation_policies.
DB Migration
This adds the pending alert escalations table (incident_management_pending_alert_escalations) as part of #323139 (closed).
incident_management_pending_alert_escalations:

| Column | Type | Null |
|---|---|---|
| id | bigint | not null |
| rule_id | bigint | null |
| alert_id | bigint | not null |
| schedule_id | bigint | not null |
| status | smallint | not null |
| process_at | timestamp with time zone | not null |
| created_at | timestamp with time zone | not null |
| updated_at | timestamp with time zone | not null |
Database commands:
Up
== 20210617022324 CreateIncidentManagementPendingAlertEscalations: migrating ==
CREATE TABLE incident_management_pending_alert_escalations (
id bigserial NOT NULL,
rule_id bigint,
alert_id bigint NOT NULL,
schedule_id bigint NOT NULL,
process_at timestamp with time zone NOT NULL,
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone NOT NULL,
status smallint NOT NULL,
PRIMARY KEY (id, process_at)
) PARTITION BY RANGE (process_at);
CREATE INDEX index_incident_management_pending_alert_escalations_on_alert_id
ON incident_management_pending_alert_escalations USING btree (alert_id);
CREATE INDEX index_incident_management_pending_alert_escalations_on_rule_id
ON incident_management_pending_alert_escalations USING btree (rule_id);
CREATE INDEX index_incident_management_pending_alert_escalations_on_schedule_id
ON incident_management_pending_alert_escalations USING btree (schedule_id);
CREATE INDEX index_incident_management_pending_alert_escalations_on_process_at
ON incident_management_pending_alert_escalations USING btree (process_at);
ALTER TABLE incident_management_pending_alert_escalations ADD CONSTRAINT fk_rails_fcbfd9338b
FOREIGN KEY (schedule_id) REFERENCES incident_management_oncall_schedules(id) ON DELETE CASCADE;
ALTER TABLE incident_management_pending_alert_escalations ADD CONSTRAINT fk_rails_057c1e3d87
FOREIGN KEY (rule_id) REFERENCES incident_management_escalation_rules(id) ON DELETE SET NULL;
ALTER TABLE incident_management_pending_alert_escalations ADD CONSTRAINT fk_rails_8d8de95da9
FOREIGN KEY (alert_id) REFERENCES alert_management_alerts(id) ON DELETE CASCADE;
Down
== 20210617022324 CreateIncidentManagementPendingAlertEscalations: reverting ==
-- drop_table(:incident_management_pending_alert_escalations)
-> 0.0145s
== 20210617022324 CreateIncidentManagementPendingAlertEscalations: reverted (0.0216s)
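The table is range-partitioned on process_at. PostgreSQL requires every unique constraint on a partitioned table, including the primary key, to contain the partition key, which is why the primary key is (id, process_at) rather than id alone. The migration above creates only the parent table. The sketch below shows how a process_at value could map to a partition, assuming one partition per calendar month named <table>_YYYYMM. Both the monthly strategy and the naming scheme are illustrative assumptions, and monthly_partition_for and partition_ddl are hypothetical helpers.

```ruby
require "time"

# Assumption: one range partition per calendar month on process_at,
# named "<table>_YYYYMM". This MR does not state either detail.
TABLE = "incident_management_pending_alert_escalations"

# Returns [partition_name, from_inclusive, to_exclusive] for a process_at value.
def monthly_partition_for(process_at)
  t = process_at.getutc # non-mutating UTC copy
  from = Time.utc(t.year, t.month, 1)
  # Roll over to January of the next year after December.
  to = t.month == 12 ? Time.utc(t.year + 1, 1, 1) : Time.utc(t.year, t.month + 1, 1)
  name = format("%s_%04d%02d", TABLE, t.year, t.month)
  [name, from, to]
end

# Hypothetical DDL for the partition that would hold process_at.
def partition_ddl(process_at)
  name, from, to = monthly_partition_for(process_at)
  "CREATE TABLE #{name} PARTITION OF #{TABLE} " \
    "FOR VALUES FROM ('#{from.iso8601}') TO ('#{to.iso8601}');"
end
```

Range bounds are inclusive at FROM and exclusive at TO, so adjacent monthly partitions never overlap.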
Creation of Pending Alert Escalations
We create an escalation for every incoming alert whose project has an escalation policy (with rules) set up. This is guarded by the feature flag and licensed flag noted above.
The logic for creating the escalations lives in IncidentManagement::PendingEscalations::CreateService, which takes a target (an AlertManagement::Alert, and in the future, an incident issue).
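A minimal sketch of what creation amounts to, assuming each escalation rule carries a status, a delay in seconds (elapsed_time_seconds) and an on-call schedule, and that one pending escalation is built per rule with process_at set to the current time plus that delay. The structs and build_pending_escalations are hypothetical stand-ins, not the real service or models. Copying the rule's status onto the escalation is our reading of the non-null status column sitting next to the nullable rule_id.

```ruby
# Hypothetical stand-ins for the real models; field names are assumptions.
Rule = Struct.new(:id, :status, :elapsed_time_seconds, :schedule_id, keyword_init: true)
Alert = Struct.new(:id, keyword_init: true)
PendingEscalation = Struct.new(:rule_id, :alert_id, :schedule_id, :status, :process_at, keyword_init: true)

# Builds one pending escalation per rule of the project's escalation policy.
def build_pending_escalations(alert, rules, now)
  rules.map do |rule|
    PendingEscalation.new(
      rule_id: rule.id,
      alert_id: alert.id,
      schedule_id: rule.schedule_id,
      # Assumed: status is copied so the check still works if the rule
      # is deleted and rule_id is set to NULL.
      status: rule.status,
      process_at: now + rule.elapsed_time_seconds
    )
  end
end
```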
Deleting / Creating Escalations on status changes
We create or delete escalations as a result of an Alert status change:
| Alert status change | Result |
|---|---|
| triggered/acknowledged -> resolved/ignored | Delete existing alert escalations for the alert |
| resolved/ignored -> triggered/acknowledged | Create a new alert escalation for the alert |
| resolved/ignored -> resolved/ignored | No change |
| triggered/acknowledged -> triggered/acknowledged | No change |
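The status-change table reduces to a small decision function. escalation_action and the open/closed grouping below are hypothetical names used only for illustration; the actual implementation may be structured differently.

```ruby
# Statuses grouped as in the table above (group names are illustrative).
OPEN_STATUSES = %i[triggered acknowledged].freeze
CLOSED_STATUSES = %i[resolved ignored].freeze

# Returns the action to take on pending escalations for a status change.
def escalation_action(from, to)
  if OPEN_STATUSES.include?(from) && CLOSED_STATUSES.include?(to)
    :delete_escalations
  elsif CLOSED_STATUSES.include?(from) && OPEN_STATUSES.include?(to)
    :create_escalation
  else
    :no_change
  end
end
```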
IncidentManagement::PendingEscalations::ProcessService
This service evaluates the rule information stored on each pending escalation. If the criteria are met (the alert does not yet have the status required by the rule, and enough time has passed that process_at is now in the past), we notify the on-call schedule.
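The notify decision can be sketched as a predicate. This assumes alert statuses are ranked triggered < acknowledged < resolved < ignored, so "the alert does not yet have the required status" means its rank is below the rule's. That ranking and notify_schedule? are assumptions, not the service's actual code.

```ruby
# Assumed ranking of alert statuses; the real enum values may differ.
STATUS_RANK = { triggered: 0, acknowledged: 1, resolved: 2, ignored: 3 }.freeze

# True when the escalation should notify the on-call schedule: the alert
# has not reached the rule's required status and process_at has passed.
def notify_schedule?(alert_status, required_status, process_at, now)
  STATUS_RANK.fetch(alert_status) < STATUS_RANK.fetch(required_status) &&
    process_at <= now
end
```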
Workers
To run the service above, we have a cron worker and a job worker.
The cron worker, IncidentManagement::Escalations::ScheduleEscalationCheckCronWorker, iterates over the pending escalations that are ready to process and spawns an IncidentManagement::Escalations::PendingAlertEscalationCheckWorker job for each one.
It does this in batches of 1000 using bulk_perform_async.
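A sketch of the cron worker's batching, assuming it selects escalations whose process_at has passed and enqueues one job per escalation ID. bulk_perform_async here receives an array of argument arrays, the shape Sidekiq::Client.push_bulk expects. FakeCheckWorker and schedule_escalation_checks are hypothetical stand-ins that record batches instead of enqueuing jobs.

```ruby
BATCH_SIZE = 1000

# Hypothetical stand-in for the job worker: records each bulk enqueue.
class FakeCheckWorker
  @batches = []

  class << self
    attr_reader :batches

    # args_list is an array of argument arrays, one per job.
    def bulk_perform_async(args_list)
      @batches << args_list
    end
  end
end

# Selects escalations whose process_at has passed and enqueues one job
# per escalation, BATCH_SIZE jobs per bulk call.
def schedule_escalation_checks(escalations, now, worker: FakeCheckWorker)
  ready_ids = escalations.select { |e| e[:process_at] <= now }.map { |e| e[:id] }
  ready_ids.each_slice(BATCH_SIZE) do |batch|
    worker.bulk_perform_async(batch.map { |id| [id] })
  end
end
```

Batching keeps each enqueue call bounded regardless of how many escalations become due in a single cron run.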
Does this MR meet the acceptance criteria?
Conformity
- I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
- I have added/updated documentation, or it's not needed. (Is documentation required?)
- I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
- I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
- I have self-reviewed this MR per code review guidelines.
- This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
- I have followed the style guides.
- This change is backwards compatible across updates, or this does not apply.
Availability and Testing
- I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
- I have tested this MR in all supported browsers, or it's not needed.
- I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
- Label as security and @mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team
Related to #323139 (closed)