Backfill escalation policies for on-call schedules [RUN ALL RSPEC] [RUN AS-IF-FOSS] (!62233)

Sarah Yasonik requested to merge sy-backfill-escalation-policies into master May 20, 2021

What does this MR do?

This MR backfills a single EscalationPolicy for each project that has one or more OncallSchedules.

Context

This is part of the Escalation Policies MVC [technical plan], which extends on-call schedules so that alerts can escalate between schedules (e.g., if the primary on-call ignores the alert, notify the secondary on-call). An escalation policy will have many escalation rules, and each rule describes the conditions under which we should notify a user of an alert. For example, a rule might say "if the alert has not been acknowledged after 5 minutes, notify the Primary On-call Schedule."
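To make the policy/rule relationship concrete, here is a minimal in-memory sketch in Ruby. The struct names, the `status` and `elapsed_time_seconds` fields, and the assumed status ordering are hypothetical illustrations, not GitLab's actual models. The sketch also ignores tracking which rules have already fired, so a rule stays "due" once its threshold has passed.

```ruby
# Hypothetical sketch: names and fields are illustrative, not GitLab's schema.

# Assumed order in which an alert moves through statuses.
STATUS_ORDER = %i[triggered acknowledged resolved].freeze

# A rule: notify `schedule` once the alert has been open for
# `elapsed_time_seconds` without reaching `status`.
EscalationRule = Struct.new(:schedule, :status, :elapsed_time_seconds, keyword_init: true) do
  def due?(alert_status, seconds_open)
    not_yet_reached = STATUS_ORDER.index(alert_status) < STATUS_ORDER.index(status)
    not_yet_reached && seconds_open >= elapsed_time_seconds
  end
end

# A policy is an ordered list of rules.
EscalationPolicy = Struct.new(:name, :rules, keyword_init: true) do
  # Schedules whose rules are due at this point in the alert's life.
  def schedules_to_notify(alert_status, seconds_open)
    rules.select { |rule| rule.due?(alert_status, seconds_open) }.map(&:schedule)
  end
end

policy = EscalationPolicy.new(
  name: 'Example policy',
  rules: [
    # "If the alert has not been acknowledged after 5 minutes, notify the Primary On-call Schedule."
    EscalationRule.new(schedule: 'Primary On-call Schedule', status: :acknowledged, elapsed_time_seconds: 300),
    # Escalate to the secondary schedule if the primary ignores the alert.
    EscalationRule.new(schedule: 'Secondary On-call Schedule', status: :acknowledged, elapsed_time_seconds: 900)
  ]
)

p policy.schedules_to_notify(:triggered, 60)     # prints []
p policy.schedules_to_notify(:triggered, 300)    # prints ["Primary On-call Schedule"]
p policy.schedules_to_notify(:acknowledged, 900) # prints []
```

Acknowledging the alert stops escalation because every rule in this sketch waits for the `:acknowledged` status.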

The purpose of this MR is to ensure that users with existing on-call schedules will be notified of alerts in the same way after escalation policies are rolled out, without needing to configure anything manually.

Existing on-call schedules feature demo: https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/oncall_schedules

Existing alert notification logic:

Users can create one on-call schedule per project through the UI, but can create multiple schedules via the API
When an alert is received for a project, we notify the on-call user in every schedule
If the alert is already acknowledged, we do not send an additional notification
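The existing notification rules listed above can be sketched as a single function. This is a hypothetical in-memory model (the `Schedule` and `Alert` structs and `users_to_notify` are illustrative names, not GitLab code); skipping schedules with nobody currently on call is an added assumption.

```ruby
# Hypothetical sketch of the existing (pre-escalation-policy) behaviour.
Schedule = Struct.new(:name, :oncall_user)
Alert = Struct.new(:project, :status)

# Users to notify when an alert is received: the current on-call user of
# every schedule in the alert's project, or nobody if already acknowledged.
def users_to_notify(alert, schedules_by_project)
  return [] if alert.status == :acknowledged

  schedules_by_project
    .fetch(alert.project, [])
    .map(&:oncall_user)
    .compact # assumption: a schedule may have nobody on call right now
end

schedules = {
  'tanuki-inc' => [Schedule.new('Primary', 'alice'), Schedule.new('Secondary', 'bob')]
}

p users_to_notify(Alert.new('tanuki-inc', :triggered), schedules)    # prints ["alice", "bob"]
p users_to_notify(Alert.new('tanuki-inc', :acknowledged), schedules) # prints []
```

This is the behaviour the backfilled policies need to reproduce once escalation policies take over.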

Two main changes:

Adds a post-deploy migration to create escalation policies for projects that already have on-call schedules
Adds an after_create callback to backfill policies for new on-call schedules as they're created
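Both changes can be sketched in plain Ruby without ActiveRecord. This is a hypothetical model, not the MR's implementation: the policy name, the rule shape (notify immediately unless acknowledged, chosen to mirror the existing notification logic), and the choice to append a rule to an existing policy in the callback are all assumptions.

```ruby
# Hypothetical in-memory sketch; the policy name and rule shape are assumptions.
POLICY_NAME = 'Default escalation policy' # hypothetical name

Rule = Struct.new(:schedule_id, :status, :elapsed_time_seconds)
Policy = Struct.new(:project_id, :name, :rules)

# Assumed rule shape: notify the schedule immediately (0 seconds) unless the
# alert is acknowledged, mirroring the existing notification logic.
def immediate_rule(schedule_id)
  Rule.new(schedule_id, :acknowledged, 0)
end

# Migration: one policy per project with schedules, one rule per schedule.
# Projects that already have a policy are skipped, so re-running is a no-op.
def backfill_policies(schedules, policies)
  covered = policies.map(&:project_id)
  schedules.group_by { |s| s[:project_id] }.each do |project_id, project_schedules|
    next if covered.include?(project_id)

    policies << Policy.new(project_id, POLICY_NAME, project_schedules.map { |s| immediate_rule(s[:id]) })
  end
  policies
end

# after_create-style hook: add a rule for the new schedule to the project's
# policy, creating the policy first if the project has none yet.
def on_schedule_created(schedule, policies)
  policy = policies.find { |existing| existing.project_id == schedule[:project_id] }
  if policy
    policy.rules << immediate_rule(schedule[:id])
  else
    policies << Policy.new(schedule[:project_id], POLICY_NAME, [immediate_rule(schedule[:id])])
  end
  policies
end

schedules = [{ id: 1, project_id: 10 }, { id: 2, project_id: 10 }, { id: 3, project_id: 20 }]
policies = backfill_policies(schedules, [])
p policies.map { |policy| [policy.project_id, policy.rules.map(&:schedule_id)] } # prints [[10, [1, 2]], [20, [3]]]
on_schedule_created({ id: 4, project_id: 30 }, policies)
p policies.size # prints 3
```

Keeping the backfill idempotent means the migration and the callback can coexist without creating duplicate policies for the same project.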

Feature flags

escalation_policies_mvc -> existing flag that controls the escalation policies feature as a whole. Once it is enabled, users will be required to configure escalation policies manually in order to use GitLab's on-call schedule management. Users who already have on-call schedules configured should experience no disruption to their alert notifications.
escalation_policies_backfill -> added in this MR and enabled by default; controls only the backfill logic. It is a fail-safe that lets us turn off everything related to escalation policies. This flag will be removed after the escalation_policies_mvc flag is enabled and removed.
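The interaction between the two flags can be sketched as follows. A plain hash stands in for the real feature-flag lookup; the flag names come from this MR, while the helper methods and the assumption that escalation_policies_mvc is currently off by default are illustrative.

```ruby
# Hypothetical sketch: a plain hash stands in for the real feature-flag lookup.
# Flag names are from this MR; the default for escalation_policies_mvc (off)
# is an assumption based on the description above.
DEFAULT_FLAGS = {
  escalation_policies_mvc: false,
  escalation_policies_backfill: true
}.freeze

def flag_enabled?(name, overrides = {})
  DEFAULT_FLAGS.merge(overrides).fetch(name)
end

# Assumed: backfill logic only creates policies while this flag is on;
# turning it off is the fail-safe.
def backfill_enabled?(overrides = {})
  flag_enabled?(:escalation_policies_backfill, overrides)
end

# Alerts only route through escalation policies once the MVC flag is on;
# until then the existing "notify every schedule" logic stays in place.
def notification_strategy(overrides = {})
  flag_enabled?(:escalation_policies_mvc, overrides) ? :escalation_policies : :notify_all_schedules
end

p backfill_enabled?                                      # prints true
p backfill_enabled?(escalation_policies_backfill: false) # prints false
p notification_strategy                                  # prints :notify_all_schedules
p notification_strategy(escalation_policies_mvc: true)   # prints :escalation_policies
```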

Migration tidbits:

GitLab.com has ~100 on-call schedules, so we expect the data migration to be small and quick. Usage on self-managed instances has been negligible so far, so we expect the same there.

Query Plan: https://explain.depesz.com/s/41uS

Up output:

$ bin/rails db:migrate
== 20210519220019 BackfillEscalationPoliciesForOncallSchedules: migrating =====
== 20210519220019 BackfillEscalationPoliciesForOncallSchedules: migrated (0.0101s)

Down output:

$ bin/rails db:migrate:down VERSION=20210519220019
== 20210519220019 BackfillEscalationPoliciesForOncallSchedules: reverting =====
== 20210519220019 BackfillEscalationPoliciesForOncallSchedules: reverted (0.0000s)

Does this MR meet the acceptance criteria?

Conformity

I have included a changelog entry, or it's not needed. (Does this MR need a changelog?)
I have added/updated documentation, or it's not needed. (Is documentation required?)
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
I have self-reviewed this MR per code review guidelines.
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
[🤞] I have followed the style guides.

Availability and Testing

I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
I have tested this MR in all supported browsers, or it's not needed.
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited May 27, 2021 by Sarah Yasonik
