Migrate empty releases to ghost user
What does this MR do and why?
This MR performs backfill migration on the releases
table for the records which does not have an author_id
.
- We verified that there are no new empty release author_id records after merging !104853 (merged)
- For the older records it was decided to perform a backfill migration.
Related to #386089 (closed)
We want to start with the backfill data migration first as it will enable to us to test the model validation with FF validate_release_with_author
. Plan of steps is listed in implementation proposal.
Note:
- For the records which are backfilled, there is no logical revert. If required we can get the modified record ids from DB snapshot after
2022-11-30 23:59:59
as we have already verified that there are no new releases records with empty author_id.
Estimation on gitlab.com
- Row Count: 958400 (19.40% of entire rows)
- Batch size: 1000 (sub-batch size: 100)
- Job count: 959 jobs
- Job interval: 2 min
- Total Duration: 1918 (~ 1 day 8 hours)
Database
See !106813 (comment 1210767361) for DB testing.
Up
> bundle exec rake db:migrate:up:main VERSION=20221215151822
main: == 20221215151822 ScheduleBackfillReleasesAuthorId: migrating =================
main: == 20221215151822 ScheduleBackfillReleasesAuthorId: migrated (0.3046s) ========
Down
> bundle exec rake db:migrate:down:main VERSION=20221215151822
main: == 20221215151822 ScheduleBackfillReleasesAuthorId: reverting =================
main: == 20221215151822 ScheduleBackfillReleasesAuthorId: reverted (0.2285s) ========
Table stats
-
SELECT COUNT(*) FROM "releases";
=>Release.count
-
SELECT COUNT(*) FROM "releases" WHERE "releases"."author_id" IS NULL;
=>Release.where(:author_id => nil).count
See: #367522 (comment 1213090878)
Queries
Queries are from the local log.
SELECT releases.id FROM releases WHERE releases.id BETWEEN 1 AND 100 AND releases.author_id IS NULL ORDER BY releases.id ASC LIMIT 1
SELECT releases.id FROM releases WHERE releases.id BETWEEN 1 AND 100 AND releases.author_id IS NULL AND releases.id >= 81 ORDER BY releases.id ASC LIMIT 1 OFFSET 10
UPDATE releases SET author_id = 32 WHERE releases.id BETWEEN 1 AND 100 AND releases.author_id IS NULL AND releases.id >= 81
Explain from a sampled batch
All executed in cold cache
-
SELECT releases.id FROM releases WHERE releases.id BETWEEN 4000000 AND 4001000 AND releases.author_id IS NULL ORDER BY releases.id ASC LIMIT 1
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/14041/commands/48984
-
SELECT releases.id FROM releases WHERE releases.id BETWEEN 4000000 AND 4001000 AND releases.author_id IS NULL AND releases.id >= 4000069 ORDER BY releases.id ASC LIMIT 1 OFFSET 100
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/14041/commands/48988
-
UPDATE releases SET author_id = 1243277 WHERE releases.id BETWEEN 4000000 AND 4001000 AND releases.author_id IS NULL AND releases.id >= 4000069
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/14041/commands/48990
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.