Remove duplicated services in data migration
Summary
The gitlab.com database holds some invalid data in the services table. A project is only allowed to have one service of each type but we found that there are duplicates. See &3366 (closed)
Improvements
Write a data migration to remove each duplicated service record.
We currently don't define an ORDER
in the queries so the used record is not deterministic, but we can just assume the record returned by the default order is the correct one, and delete all others for the same project and service type. See the discussion in #282472 (comment 450051986).
Risks
- This is a long-running migration, so we need to efficiently batch the records to avoid running into timeouts.
- We haven't found the root cause for the duplicated records yet, so we might need to re-run this migration later.
Testing plan
We can verify the rollout of the migration by:
-
1️⃣ Querying for the number of projects with duplicate records inservices
:SELECT count(*) FROM (SELECT DISTINCT "services"."project_id" FROM "services" WHERE "services"."project_id" IS NOT NULL GROUP BY "services"."project_id", "services"."type" HAVING (count(*) > 1)) sub;
-
2️⃣ Querying for the number of duplicate records inservices
:WITH service_ids AS (SELECT max(id) AS max_id FROM services where project_id IS NOT NULL GROUP BY type, project_id) SELECT count(*) FROM services WHERE project_id IS NOT NULL AND id not IN (SELECT max_id FROM service_ids);
-
Note: This query is expensive and can take up to a minute on staging, so it needs
SET statement_timeout=0;
-
Note: This query is expensive and can take up to a minute on staging, so it needs
-
3️⃣ Querying for the number of project-associated records inservices
:SELECT count(*) FROM "services" WHERE "services"."project_id" IS NOT NULL;
The expected numbers for the different environments are:
GPRD / gitlab.com
Query | Before | After |
---|---|---|
508 | 0 | |
715 | 0 | |
756070¹ | 755355 (-715) |
GSTG / staging.gitlab.com
Query | Before | After |
---|---|---|
959831 | 0 | |
960056 | 0 | |
7886312¹ | 6926256 (-960056) |
¹ Queried at 2021-01-07 12:22:16+01:00, this number can increase with usage.