Deduplicate epic iids and add uniqueness constraint
What does this MR do?
- Detects and fixes Epic IID duplicates per group within DB.
- Adds a unique index constraint to
epics
table ongroup_id, iid
pair of fields - As index migration needs to be ran after the data has been de-duplicated, both migrations are added to post_migrate
Current number duplicates on gitlab.com: ~14
gitlabhq_production=> SELECT iid, group_id, COUNT(*) FROM epics GROUP BY iid, group_id HAVING COUNT(*) > 1;
iid | group_id | count
------+----------+-------
10 | 7936550 | 2
6 | 5806375 | 2
2 | 10009200 | 2
1 | 9811553 | 2
1 | 9806281 | 2
1 | 4925694 | 2
4827 | 9970 | 2
1 | 10009200 | 3
1 | 7569022 | 3
2 | 6874569 | 2
1 | 9861922 | 2
8 | 5806375 | 2
2 | 6536162 | 2
1 | 6874569 | 2
(14 rows)
Execution plan
HashAggregate (cost=17031.82..17103.31 rows=1906 width=16) (actual time=317.357..329.170 rows=14 loops=1)
Group Key: epics.iid, epics.group_id
Filter: (count(*) > 1)
Rows Removed by Filter: 57181
Buffers: shared hit=184 read=3788 dirtied=8
-> Seq Scan on public.epics (cost=0.00..16459.91 rows=57191 width=8) (actual time=0.151..280.305 rows=57211 loops=1)
Buffers: shared hit=184 read=3788 dirtied=8
Time: 330.285 ms
- planning: 0.272 ms
- execution: 330.013 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 184 (~1.40 MiB) from the buffer pool
- reads: 3788 (~29.60 MiB) from the OS file cache, including disk I/O
- dirtied: 8 (~64.00 KiB)
- writes: 0
SQLs generated when running the migration
(3.1ms) SELECT iid, group_id, COUNT(*) FROM epics GROUP BY iid, group_id HAVING COUNT(*) > 1;
Epic Load (1.2ms) SELECT "epics".* FROM "epics" WHERE "epics"."iid" = $1 AND "epics"."group_id" = $2 ORDER BY "epics"."id" ASC LIMIT $3 [["iid", 2], ["group_id", 22], ["LIMIT", 1]]
Group Load (0.6ms) SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = $1 AND "namespaces"."id" = $2 LIMIT $3 [["type", "Group"], ["id", 22], ["LIMIT", 1]]
(0.1ms) BEGIN
InternalId Load (0.4ms) SELECT "internal_ids".* FROM "internal_ids" WHERE "internal_ids"."namespace_id" = $1 AND "internal_ids"."usage" = $2 LIMIT $3 [["namespace_id", 22], ["usage", 4], ["LIMIT", 1]]
InternalId Load (0.5ms) SELECT "internal_ids".* FROM "internal_ids" WHERE "internal_ids"."id" = $1 LIMIT $2 FOR UPDATE [["id", 12], ["LIMIT", 1]]
InternalId Update (0.5ms) UPDATE "internal_ids" SET "last_value" = $1 WHERE "internal_ids"."id" = $2 [["last_value", 9], ["id", 12]]
(0.9ms) COMMIT
related to #277354 (closed)
Migrations Output
Up
gitlab(issue-277354)$ be rake db:migrate
== 20201106134950 DeduplicateEpicIids: migrating ==============================
== 20201106134950 DeduplicateEpicIids: migrated (0.2515s) =====================
== 20201106151002 AddUniqIndexToEpicIidGroupId: migrating =====================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:epics, [:group_id, :iid], {:unique=>true, :name=>"index_epics_on_group_id_and_iid", :algorithm=>:concurrently})
-> 0.0077s
-- execute("SET statement_timeout TO 0")
-> 0.0002s
-- add_index(:epics, [:group_id, :iid], {:unique=>true, :name=>"index_epics_on_group_id_and_iid", :algorithm=>:concurrently})
-> 0.0081s
-- execute("RESET ALL")
-> 0.0002s
== 20201106151002 AddUniqIndexToEpicIidGroupId: migrated (0.0168s) ============
Down
gitlab(issue-277354)$ be rake db:migrate:down VERSION=20201106151002
== 20201106151002 AddUniqIndexToEpicIidGroupId: reverting =====================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:epics, [:group_id, :iid], {:name=>"index_epics_on_group_id_and_iid", :algorithm=>:concurrently})
-> 0.0120s
-- execute("SET statement_timeout TO 0")
-> 0.0002s
-- remove_index(:epics, {:name=>"index_epics_on_group_id_and_iid", :algorithm=>:concurrently, :column=>[:group_id, :iid]})
-> 0.0163s
-- execute("RESET ALL")
-> 0.0002s
== 20201106151002 AddUniqIndexToEpicIidGroupId: reverted (0.0293s) ============
gitlab(issue-277354)$ be rake db:migrate:down VERSION=20201106134950
== 20201106134950 DeduplicateEpicIids: reverting ==============================
== 20201106134950 DeduplicateEpicIids: reverted (0.0000s) =====================
Screenshots (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Edited by Alexandru Croitor