Resolve labels with duplicate title and project IDs
requested to merge 30390-race-condition-that-makes-it-possible-to-create-duplicated-labels into master
What does this MR do?
Step 2 of #30390 (comment 254445189)
This MR has multiple migrations for resolving duplicate labels that were introduced.
Strategy has to consider self-hosted customers, as well as anyone on gitlab.com.
Migration outputs
$ rake db:migrate:up VERSION=20200305020458
== 20200305020458 AddLabelRestoreTable: migrating =============================
-- execute("CREATE TABLE backup_labels (LIKE labels INCLUDING ALL);")
-> 0.0222s
-- execute("ALTER TABLE backup_labels ALTER COLUMN ID DROP DEFAULT;")
-> 0.0004s
-- execute("ALTER TABLE backup_labels ADD COLUMN restore_action INTEGER;")
-> 0.0006s
-- execute("ALTER TABLE backup_labels ADD COLUMN new_title VARCHAR;")
-> 0.0003s
== 20200305020458 AddLabelRestoreTable: migrated (0.0238s) ====================
$ rake db:migrate:down VERSION=20200305020458
== 20200305020458 AddLabelRestoreTable: reverting =============================
-- drop_table(:backup_labels)
-> 0.0171s
== 20200305020458 AddLabelRestoreTable: reverted (0.0171s) ====================
$ rake db:migrate:up VERSION=20200305020459
== 20200305020459 AddLabelRestoreForeignKeys: migrating =======================
-- transaction_open?()
-> 0.0000s
-- foreign_keys(:backup_labels)
-> 0.0015s
-- execute("ALTER TABLE backup_labels\nADD CONSTRAINT fk_7de4989a69\nFOREIGN KEY (project_id)\nREFERENCES projects (id)\nON DELETE CASCADE\nNOT VALID;\n")
-> 0.0053s
-- execute("ALTER TABLE backup_labels VALIDATE CONSTRAINT fk_7de4989a69;")
-> 0.0063s
-- transaction_open?()
-> 0.0000s
-- foreign_keys(:backup_labels)
-> 0.0021s
-- execute("ALTER TABLE backup_labels\nADD CONSTRAINT fk_rails_c1ac5161d8\nFOREIGN KEY (group_id)\nREFERENCES namespaces (id)\nON DELETE CASCADE\nNOT VALID;\n")
-> 0.0021s
-- execute("ALTER TABLE backup_labels VALIDATE CONSTRAINT fk_rails_c1ac5161d8;")
-> 0.0091s
== 20200305020459 AddLabelRestoreForeignKeys: migrated (0.0413s) ==============
$ rake db:migrate:down VERSION=20200305020459
== 20200305020459 AddLabelRestoreForeignKeys: reverting =======================
-- remove_foreign_key(:backup_labels, {:name=>"fk_7de4989a69"})
-> 0.0041s
-- remove_foreign_key(:backup_labels, {:name=>"fk_rails_c1ac5161d8"})
-> 0.0039s
== 20200305020459 AddLabelRestoreForeignKeys: reverted (0.0221s) ==============
$ rake db:migrate:up VERSION=20200305082754
== 20200305082754 RemoveDuplicateLabelsFromProject: migrating =================
== 20200305082754 RemoveDuplicateLabelsFromProject: migrated (0.0951s) ========
$ rake db:migrate:down VERSION=20200305082754
== 20200305082754 RemoveDuplicateLabelsFromProject: reverting =================
== 20200305082754 RemoveDuplicateLabelsFromProject: reverted (0.0056s) ========
$ be rails db:migrate:up VERSION=20200305082858
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: migrating =========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>true, :name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
-> 0.0057s
-- add_index(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>true, :name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
-> 0.0090s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently})
-> 0.0043s
-- remove_index(:labels, {:name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently, :column=>[:project_id, :title]})
-> 0.0099s
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: migrated (0.0301s)
$ be rails db:migrate:down VERSION=20200305082858
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: reverting =========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>false, :name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently})
-> 0.0048s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
-> 0.0026s
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: reverted (0.0077s)
As of 7/7/2020 using queries in !21384 (comment 354329489):
Full duplicates: 305 full duplicates
CTE Scan on data (cost=6876482.76..7112371.68 rows=3494651 width=1169) (actual time=593254.903..618840.515 rows=305 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 9223143
Buffers: shared hit=30963370 read=632557 dirtied=19
I/O Timings: read=485567.114
CTE data
-> WindowAgg (cost=6535754.32..6876482.76 rows=10483952 width=93) (actual time=592912.920..609623.229 rows=9223448 loops=1)
Buffers: shared hit=30963370 read=632557 dirtied=19
I/O Timings: read=485567.114
-> Sort (cost=6535754.32..6561964.20 rows=10483952 width=85) (actual time=592912.848..597381.059 rows=9223448 loops=1)
Sort Key: labels.project_id, labels.title, labels.template, labels.description, labels.type, labels.color, labels.id
Sort Method: external merge Disk: 730536kB
Buffers: shared hit=30963370 read=632557 dirtied=19
I/O Timings: read=485567.114
-> Merge Anti Join (cost=2.57..4345719.49 rows=10483952 width=85) (actual time=171.499..580262.662 rows=9223448 loops=1)
Merge Cond: (labels.id = label_links.label_id)
Buffers: shared hit=30963362 read=632557 dirtied=19
I/O Timings: read=485567.114
-> Merge Anti Join (cost=2.01..2455553.43 rows=10660104 width=85) (actual time=0.420..344855.708 rows=9949673 loops=1)
Merge Cond: (labels.id = board_labels.label_id)
Buffers: shared hit=21898713 read=419284 dirtied=19
I/O Timings: read=284139.828
-> Merge Anti Join (cost=1.72..2428352.58 rows=10670229 width=85) (actual time=0.395..341396.929 rows=9950982 loops=1)
Merge Cond: (labels.id = resource_label_events.label_id)
Buffers: shared hit=21896176 read=419224 dirtied=19
I/O Timings: read=284124.006
-> Merge Anti Join (cost=1.16..864817.36 rows=10862671 width=85) (actual time=0.350..77571.231 rows=10697595 loops=1)
Merge Cond: (labels.id = label_priorities.label_id)
Buffers: shared hit=4190824 read=134114 dirtied=1
I/O Timings: read=54769.635
-> Merge Anti Join (cost=0.87..833679.69 rows=10957388 width=85) (actual time=0.317..73649.101 rows=10763349 loops=1)
Merge Cond: (labels.id = lists.label_id)
Buffers: shared hit=4184162 read=133686 dirtied=1
I/O Timings: read=54673.139
-> Index Scan using labels_pkey on public.labels (cost=0.43..763477.04 rows=11479303 width=85) (actual time=0.062..65234.210 rows=11566279 loops=1)
Filter: (labels.project_id IS NOT NULL)
Rows Removed by Filter: 264215
Buffers: shared hit=4128678 read=128411
I/O Timings: read=52000.066
-> Index Only Scan using index_lists_on_label_id on public.lists (cost=0.43..89979.39 rows=4591197 width=4) (actual time=0.050..3173.257 rows=1139508 loops=1)
Heap Fetches: 19501
Buffers: shared hit=55484 read=5275 dirtied=1
I/O Timings: read=2673.073
-> Index Only Scan using index_label_priorities_on_label_id on public.label_priorities (cost=0.29..2432.13 rows=111437 width=4) (actual time=0.031..187.849 rows=111436 loops=1)
Heap Fetches: 11500
Buffers: shared hit=6662 read=428
I/O Timings: read=96.496
-> Index Only Scan using index_resource_label_events_on_label_id_and_action on public.resource_label_events (cost=0.56..1050365.46 rows=49314016 width=4) (actual time=0.043..254400.561 rows=48269580 loops=1)
Heap Fetches: 812370
Buffers: shared hit=17705352 read=285110 dirtied=18
I/O Timings: read=229354.371
-> Index Only Scan using index_board_labels_on_label_id on public.board_labels (cost=0.29..367.74 rows=13837 width=4) (actual time=0.023..39.657 rows=13837 loops=1)
Heap Fetches: 4220
Buffers: shared hit=2537 read=60
I/O Timings: read=15.822
-> Index Only Scan using index_label_links_on_label_id on public.label_links (cost=0.56..1236166.56 rows=54520900 width=4) (actual time=0.030..225333.163 rows=54459126 loops=1)
Heap Fetches: 5092432
Buffers: shared hit=9064649 read=213273
I/O Timings: read=201427.286
partial duplicates: 518
CTE Scan on data (cost=3238456.18..3496740.50 rows=3826434 width=1169) (actual time=122774.202..151841.224 rows=518 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 11565761
Buffers: shared hit=4402391 read=132935 dirtied=56
I/O Timings: read=96180.352
CTE data
-> WindowAgg (cost=2980171.87..3238456.18 rows=11479303 width=93) (actual time=122630.119..141305.898 rows=11566279 loops=1)
Buffers: shared hit=4402391 read=132935 dirtied=56
I/O Timings: read=96180.352
-> Sort (cost=2980171.87..3008870.12 rows=11479303 width=85) (actual time=122630.064..127836.463 rows=11566279 loops=1)
Sort Key: labels.project_id, labels.title, labels.id
Sort Method: external merge Disk: 914000kB
Buffers: shared hit=4402391 read=132935 dirtied=56
I/O Timings: read=96180.352
-> Index Scan using index_labels_on_group_id_and_project_id_and_title on public.labels (cost=0.56..574706.28 rows=11479303 width=85) (actual time=44.087..110733.988 rows=11566279 loops=1)
Index Cond: (labels.project_id IS NOT NULL)
Buffers: shared hit=4402385 read=132935 dirtied=56
I/O Timings: read=96180.352
Related to #30390 (closed)
Edited by charlie ablett