Destroy invalid group membership records
What does this MR do and why?
We have some invalid membership records that may have been the result of an old console/SQL delete. The namespace IDs don't point to existing records. This MR removes them.
If needed we can do the same for Project members in another MR.
See !79204 (comment 836039745) for context.
DB info
All records with group type and invalid source_id
(443 as of 5 July)...
explain SELECT COUNT(*) FROM members LEFT JOIN namespaces ON members.source_id = namespaces.id WHERE members.type = 'GroupMember' AND namespaces.id IS NULL;
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10911/commands/39198
As of 4 February there are 443 invalid group membership records - see !78715 (comment 831433787).
SELECT COUNT(*) FROM members
LEFT JOIN namespaces ON members.source_id = namespaces.id
WHERE members.type = 'GroupMember'
AND namespaces.id IS NULL;
count
-------
443
(1 row)
Background Migration Details:
443 items to delete
batch size = 10000
443 / x = ? batches
Estimated times per batch:
- xms for select statement with 1000 items (see linked explain plan)
- yms for delete statement with 1000 items (see linked explain plan)
Total: ~x+y sec per batch
2 mins delay per batch (safe for the given total time per batch)
y batches * 2 min per batch = z mins to run all the scheduled jobs
Queries
main: == 20220809002011 ScheduleDestroyInvalidMembershipRecords: migrating ==========
main: == 20220809002011 ScheduleDestroyInvalidMembershipRecords: migrated (0.0610s) =
.main: == 20220809002011 ScheduleDestroyInvalidMembershipRecords: migrating ==========
main: == 20220809002011 ScheduleDestroyInvalidMembershipRecords: migrated (0.0381s) =
Explains (from my local)
# batching strategy
SELECT "members"."id" FROM "members" WHERE ("id" >= 13) AND "members"."member_namespace_id" IS NULL AND "members"."source_type" = 'Group' ORDER BY "members"."id" ASC LIMIT 1
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11398/commands/40749.
destroy_invalid_group_members_spec.rb (console output from rspec test)
↳ lib/gitlab/database.rb:364:in `commit'
Load (0.5ms) SELECT "members"."id" FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 1 AND 1000 AND "namespaces"."id" IS NULL ORDER BY "members"."id" ASC LIMIT 1
↳ app/models/concerns/each_batch.rb:62:in `each_batch'
Load (0.4ms) SELECT "members"."id" FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 1 AND 1000 AND "namespaces"."id" IS NULL AND "members"."id" >= 1 ORDER BY "members"."id" ASC LIMIT 1 OFFSET 100
↳ app/models/concerns/each_batch.rb:81:in `block in each_batch'
Load (0.4ms) SELECT "members".* FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 1 AND 1000 AND "namespaces"."id" IS NULL AND "members"."id" >= 1
↳ lib/gitlab/background_migration/destroy_invalid_group_members.rb:14:in `map'
SELECT "members".* FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 10000 AND 20000 AND "namespaces"."id" IS NULL AND "members"."id" >= 10000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11398/commands/40750
DELETE FROM "members" WHERE "members"."id" IN (SELECT "members"."id" FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 10000 AND 20000 AND "namespaces"."id" IS NULL AND "members"."id" >= 10000)
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11398/commands/40751
Data removal set verification
- Logging of affected IDs
- Production clone dry-run
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #365028 (closed) and #349575 (closed)