Delete invalid EpicIssue records
What does this MR do and why?
Related to #339514
This data migration will delete records from the table epic_issues
that are considered invalid.
Our criteria to consider them invalid is when the issue belongs to a group that is not the same as the epic group, or its hierarchy.
This was attempted before in !73007 (merged) but it had to be reverted for being the root cause of an incident gitlab-com/gl-infra/production#6072 (closed), therefore, extra care is needed to make sure the situation is not repeated.
The main difference in the code of this MR vs !73007 (merged) is that the background migration DeleteInvalidEpicIssues
now includes descendant groups as valid.
In the diagram attached, this means that the valid link number 3 was previously considered invalid, and thus deleted.
Additional specs should be covering this case now.
Number of records that will be deleted
We don't expect this number to be too high because these records originate from cases where an issue was moved to a different group hierarchy and maintained the link to the epic. A fix to prevent this was introduced %14.4.
During the incident, we restored ~210k records from the difference in snapshots, plus 79 restored from notes.
Total number of epic_issue records: ~1M
To be deleted: 4100 (see !79557 (comment 839773871))
To be deleted (updated): ~5100 (see !79557 (comment 964225171))
Database
Given that this migration deletes data, it needs to comply with the following steps (see guidelines):
-
If this migration is not reversible, add how could the deleted records be recovered. In case of a problem that requires the data to be restored, we can select the
epic_issues
with the logged ids from the latest snapshot. To restore it, if the records are exported as CSV, we can use a ruby script similar to gitlab-com/gl-infra/production#6072 (comment 776521987) -
Merge Request includes the data-deletion label -
Concise descriptions of possible user experience impact of an error - Issues would unexpectedly go missing from the list of issues associated with an Epic. No notes stating that issues have been unassigned from the epic.
- From the issue view, the Epic previously assigned will not be present in the right sidebar and will display
None
instead. No note stating this change.
-
Relevant data from the query plans that indicate the query works as expected
Migration output
UP
db:migrate:up VERSION=20220128103042
== 20220128103042 ScheduleDeleteInvalidEpicIssuesRevised: migrating ===========
== 20220128103042 ScheduleDeleteInvalidEpicIssuesRevised: migrated (0.0704s) ==
DOWN
db:migrate:down VERSION=20220128103042
== 20220128103042 ScheduleDeleteInvalidEpicIssuesRevised: reverting ===========
== 20220128103042 ScheduleDeleteInvalidEpicIssuesRevised: reverted (0.0181s) ==
Execution times
There are ~200_000 epics so with a batch size of 1_000
we would schedule 200 jobs.
The queries needed for the migration would be:
- Getting batch of epics
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/9823/commands/34749
SELECT "epics"."id" FROM "epics" WHERE "epics"."id" BETWEEN 1 AND 1001 ORDER BY "epics"."id" ASC LIMIT 1000
- Getting sub-batch of epics
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/9983/commands/35318
SELECT "epics"."id" FROM "epics" WHERE "epics"."id" BETWEEN 1 AND 1001 AND "epics"."id" >= 1 ORDER BY "epics"."id" ASC LIMIT 1 OFFSET 50
- Getting epic associations
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/9983/commands/35316
SELECT epics.group_id as group_id, epics.id as id, epic_issues.id as epic_issue_id, projects.namespace_id as issue_namespace_id
FROM "epics"
INNER JOIN "epic_issues" ON "epic_issues"."epic_id" = "epics"."id"
INNER JOIN "issues" ON "issues"."id" = "epic_issues"."issue_id"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
WHERE "epics"."id" BETWEEN 1 AND 1001 AND "epics"."id" >= 1 AND "epics"."id" < 50
- Getting group & descendants for every epic
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/9983/commands/35289
SELECT namespaces.* FROM namespaces WHERE namespaces.type = 'Group' AND (traversal_ids @> ('{9970}'))
- Deleting epic_issue records
DELETE FROM epic_issues WHERE epic_issues.id IN ()
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.