Accept monitoring event as a new object kind, update specs (!2283) · Merge requests · GitLab.org / Quality Department / triage-ops

Jennifer Li requested to merge jennli-react-to-uptime-event into master Jun 12, 2023

What does this MR do and why?

This is to address one of the corrective actions from #1352 (closed), specifically:

- Add holistic health checks/monitoring in GCP to ensure that we know 100% when triage-ops is working or not
  - Log user agent or have some way of uniquely identifying the uptime check requests from regular requests

This MR accepts monitoring as a new object kind, in addition to issue, incident, MR, and pipeline.
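As a rough illustration of what accepting a new object kind could look like, the sketch below validates the `object_kind` field of an incoming payload against a list of supported kinds. The constant name, the method name, and the exact string values (including `'monitoring'` as the value sent by the uptime check) are assumptions made for this example; they are not taken from the triage-ops code.

```ruby
# Hypothetical sketch: names and values are illustrative, not the triage-ops API.
SUPPORTED_OBJECT_KINDS = %w[issue incident merge_request pipeline monitoring].freeze

# Returns the object kind of a payload, or raises if it is missing or unsupported.
def object_kind_for(payload)
  kind = payload.fetch('object_kind') { raise ArgumentError, 'payload has no object_kind' }
  raise ArgumentError, "unsupported object kind: #{kind}" unless SUPPORTED_OBJECT_KINDS.include?(kind)

  kind
end
```

Keeping the supported kinds in one frozen list means adding a kind such as monitoring is a single-line change, and unknown kinds fail loudly instead of being silently ignored.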

It also adds a processor that handles uptime check events as monitoring objects and writes a log entry indicating that triage-ops is fully functional.
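The idea of a processor that reacts to uptime check events by logging can be sketched as follows. This is a minimal, self-contained example under stated assumptions: the class name `UptimeCheckProcessor`, the event shape (a hash with an `object_kind` key), and the log message are all hypothetical and do not reflect the actual processor added in this MR.

```ruby
require 'logger'
require 'stringio'

# Hypothetical processor; class name, event shape, and message are assumptions.
class UptimeCheckProcessor
  def initialize(logger: Logger.new($stdout))
    @logger = logger
  end

  # Only react to events whose object kind is "monitoring".
  def applicable?(event)
    event['object_kind'] == 'monitoring'
  end

  # Logs a marker line and returns true when the event was handled.
  def process(event)
    return false unless applicable?(event)

    @logger.info('triage-ops is fully functional')
    true
  end
end

# Usage: capture the log in memory instead of writing to stdout.
log = StringIO.new
UptimeCheckProcessor.new(logger: Logger.new(log)).process({ 'object_kind' => 'monitoring' })
```

Because the log line is only written after the event has passed through the processing path, an external check that looks for this line would confirm the service is handling requests, not merely accepting connections.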

This must be merged before https://gitlab.com/gitlab-org/quality/engineering-productivity-infrastructure/-/merge_requests/391

related: #1352 (closed).

Expected impact & dry-runs

These are strongly recommended to assist reviewers and reduce the time to merge your change.

See https://gitlab.com/gitlab-org/quality/triage-ops/-/tree/master/doc/scheduled#testing-policies-with-a-dry-run on how to perform dry-runs for new policies.

See https://gitlab.com/gitlab-org/quality/triage-ops/-/blob/master/doc/reactive/best_practices.md#use-the-sandbox-to-test-new-processors on how to make sure a new processor can be tested.

Action items

If adding environment variables for reactive processors, update config/triage-web.yaml and .gitlab/ci/triage-web.yml
(If applicable) Add documentation to the handbook pages for Triage Operations =>
(If applicable) Identify the affected groups and how to communicate to them:
- /cc @person_or_group =>
- Relevant Slack channels =>
- Engineering week-in-review
