Skip to content

Resolve labels with duplicate title and project IDs

What does this MR do?

Step 2 of #30390 (comment 254445189)

This MR has multiple migrations for resolving duplicate labels that were introduced.

Strategy has to consider self-hosted customers, as well as anyone on gitlab.com.

Migration outputs

$ rake db:migrate:up VERSION=20200305020458
== 20200305020458 AddLabelRestoreTable: migrating =============================
-- execute("CREATE TABLE backup_labels (LIKE labels INCLUDING ALL);")
   -> 0.0222s
-- execute("ALTER TABLE backup_labels ALTER COLUMN ID DROP DEFAULT;")
   -> 0.0004s
-- execute("ALTER TABLE backup_labels ADD COLUMN restore_action INTEGER;")
   -> 0.0006s
-- execute("ALTER TABLE backup_labels ADD COLUMN new_title VARCHAR;")
   -> 0.0003s
== 20200305020458 AddLabelRestoreTable: migrated (0.0238s) ====================
$ rake db:migrate:down VERSION=20200305020458
== 20200305020458 AddLabelRestoreTable: reverting =============================
-- drop_table(:backup_labels)
   -> 0.0171s
== 20200305020458 AddLabelRestoreTable: reverted (0.0171s) ====================
$ rake db:migrate:up VERSION=20200305020459
== 20200305020459 AddLabelRestoreForeignKeys: migrating =======================
-- transaction_open?()
   -> 0.0000s
-- foreign_keys(:backup_labels)
   -> 0.0015s
-- execute("ALTER TABLE backup_labels\nADD CONSTRAINT fk_7de4989a69\nFOREIGN KEY (project_id)\nREFERENCES projects (id)\nON DELETE CASCADE\nNOT VALID;\n")
   -> 0.0053s
-- execute("ALTER TABLE backup_labels VALIDATE CONSTRAINT fk_7de4989a69;")
   -> 0.0063s
-- transaction_open?()
   -> 0.0000s
-- foreign_keys(:backup_labels)
   -> 0.0021s
-- execute("ALTER TABLE backup_labels\nADD CONSTRAINT fk_rails_c1ac5161d8\nFOREIGN KEY (group_id)\nREFERENCES namespaces (id)\nON DELETE CASCADE\nNOT VALID;\n")
   -> 0.0021s
-- execute("ALTER TABLE backup_labels VALIDATE CONSTRAINT fk_rails_c1ac5161d8;")
   -> 0.0091s
== 20200305020459 AddLabelRestoreForeignKeys: migrated (0.0413s) ==============
$ rake db:migrate:down VERSION=20200305020459
== 20200305020459 AddLabelRestoreForeignKeys: reverting =======================
-- remove_foreign_key(:backup_labels, {:name=>"fk_7de4989a69"})
   -> 0.0041s
-- remove_foreign_key(:backup_labels, {:name=>"fk_rails_c1ac5161d8"})
   -> 0.0039s
== 20200305020459 AddLabelRestoreForeignKeys: reverted (0.0221s) ==============
$ rake db:migrate:up VERSION=20200305082754
== 20200305082754 RemoveDuplicateLabelsFromProject: migrating =================
== 20200305082754 RemoveDuplicateLabelsFromProject: migrated (0.0951s) ========
$ rake db:migrate:down VERSION=20200305082754
== 20200305082754 RemoveDuplicateLabelsFromProject: reverting =================
== 20200305082754 RemoveDuplicateLabelsFromProject: reverted (0.0056s) ========
$ be rails db:migrate:up VERSION=20200305082858
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: migrating =========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>true, :name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
   -> 0.0057s
-- add_index(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>true, :name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
   -> 0.0090s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently})
   -> 0.0043s
-- remove_index(:labels, {:name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently, :column=>[:project_id, :title]})
   -> 0.0099s
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: migrated (0.0301s) 
$ be rails db:migrate:down VERSION=20200305082858
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: reverting =========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>false, :name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently})
   -> 0.0048s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
   -> 0.0026s
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: reverted (0.0077s) 

As of 7/7/2020 using queries in !21384 (comment 354329489):

Full duplicates: 305 full duplicates
CTE Scan on data  (cost=6876482.76..7112371.68 rows=3494651 width=1169) (actual time=593254.903..618840.515 rows=305 loops=1)
   Filter: (data.row_number > 1)
   Rows Removed by Filter: 9223143
   Buffers: shared hit=30963370 read=632557 dirtied=19
   I/O Timings: read=485567.114
   CTE data
     ->  WindowAgg  (cost=6535754.32..6876482.76 rows=10483952 width=93) (actual time=592912.920..609623.229 rows=9223448 loops=1)
           Buffers: shared hit=30963370 read=632557 dirtied=19
           I/O Timings: read=485567.114
           ->  Sort  (cost=6535754.32..6561964.20 rows=10483952 width=85) (actual time=592912.848..597381.059 rows=9223448 loops=1)
                 Sort Key: labels.project_id, labels.title, labels.template, labels.description, labels.type, labels.color, labels.id
                 Sort Method: external merge  Disk: 730536kB
                 Buffers: shared hit=30963370 read=632557 dirtied=19
                 I/O Timings: read=485567.114
                 ->  Merge Anti Join  (cost=2.57..4345719.49 rows=10483952 width=85) (actual time=171.499..580262.662 rows=9223448 loops=1)
                       Merge Cond: (labels.id = label_links.label_id)
                       Buffers: shared hit=30963362 read=632557 dirtied=19
                       I/O Timings: read=485567.114
                       ->  Merge Anti Join  (cost=2.01..2455553.43 rows=10660104 width=85) (actual time=0.420..344855.708 rows=9949673 loops=1)
                             Merge Cond: (labels.id = board_labels.label_id)
                             Buffers: shared hit=21898713 read=419284 dirtied=19
                             I/O Timings: read=284139.828
                             ->  Merge Anti Join  (cost=1.72..2428352.58 rows=10670229 width=85) (actual time=0.395..341396.929 rows=9950982 loops=1)
                                   Merge Cond: (labels.id = resource_label_events.label_id)
                                   Buffers: shared hit=21896176 read=419224 dirtied=19
                                   I/O Timings: read=284124.006
                                   ->  Merge Anti Join  (cost=1.16..864817.36 rows=10862671 width=85) (actual time=0.350..77571.231 rows=10697595 loops=1)
                                         Merge Cond: (labels.id = label_priorities.label_id)
                                         Buffers: shared hit=4190824 read=134114 dirtied=1
                                         I/O Timings: read=54769.635
                                         ->  Merge Anti Join  (cost=0.87..833679.69 rows=10957388 width=85) (actual time=0.317..73649.101 rows=10763349 loops=1)
                                               Merge Cond: (labels.id = lists.label_id)
                                               Buffers: shared hit=4184162 read=133686 dirtied=1
                                               I/O Timings: read=54673.139
                                               ->  Index Scan using labels_pkey on public.labels  (cost=0.43..763477.04 rows=11479303 width=85) (actual time=0.062..65234.210 rows=11566279 loops=1)
                                                     Filter: (labels.project_id IS NOT NULL)
                                                     Rows Removed by Filter: 264215
                                                     Buffers: shared hit=4128678 read=128411
                                                     I/O Timings: read=52000.066
                                               ->  Index Only Scan using index_lists_on_label_id on public.lists  (cost=0.43..89979.39 rows=4591197 width=4) (actual time=0.050..3173.257 rows=1139508 loops=1)
                                                     Heap Fetches: 19501
                                                     Buffers: shared hit=55484 read=5275 dirtied=1
                                                     I/O Timings: read=2673.073
                                         ->  Index Only Scan using index_label_priorities_on_label_id on public.label_priorities  (cost=0.29..2432.13 rows=111437 width=4) (actual time=0.031..187.849 rows=111436 loops=1)
                                               Heap Fetches: 11500
                                               Buffers: shared hit=6662 read=428
                                               I/O Timings: read=96.496
                                   ->  Index Only Scan using index_resource_label_events_on_label_id_and_action on public.resource_label_events  (cost=0.56..1050365.46 rows=49314016 width=4) (actual time=0.043..254400.561 rows=48269580 loops=1)
                                         Heap Fetches: 812370
                                         Buffers: shared hit=17705352 read=285110 dirtied=18
                                         I/O Timings: read=229354.371
                             ->  Index Only Scan using index_board_labels_on_label_id on public.board_labels  (cost=0.29..367.74 rows=13837 width=4) (actual time=0.023..39.657 rows=13837 loops=1)
                                   Heap Fetches: 4220
                                   Buffers: shared hit=2537 read=60
                                   I/O Timings: read=15.822
                       ->  Index Only Scan using index_label_links_on_label_id on public.label_links  (cost=0.56..1236166.56 rows=54520900 width=4) (actual time=0.030..225333.163 rows=54459126 loops=1)
                             Heap Fetches: 5092432
                             Buffers: shared hit=9064649 read=213273
                             I/O Timings: read=201427.286
partial duplicates: 518
 CTE Scan on data  (cost=3238456.18..3496740.50 rows=3826434 width=1169) (actual time=122774.202..151841.224 rows=518 loops=1)
   Filter: (data.row_number > 1)
   Rows Removed by Filter: 11565761
   Buffers: shared hit=4402391 read=132935 dirtied=56
   I/O Timings: read=96180.352
   CTE data
     ->  WindowAgg  (cost=2980171.87..3238456.18 rows=11479303 width=93) (actual time=122630.119..141305.898 rows=11566279 loops=1)
           Buffers: shared hit=4402391 read=132935 dirtied=56
           I/O Timings: read=96180.352
           ->  Sort  (cost=2980171.87..3008870.12 rows=11479303 width=85) (actual time=122630.064..127836.463 rows=11566279 loops=1)
                 Sort Key: labels.project_id, labels.title, labels.id
                 Sort Method: external merge  Disk: 914000kB
                 Buffers: shared hit=4402391 read=132935 dirtied=56
                 I/O Timings: read=96180.352
                 ->  Index Scan using index_labels_on_group_id_and_project_id_and_title on public.labels  (cost=0.56..574706.28 rows=11479303 width=85) (actual time=44.087..110733.988 rows=11566279 loops=1)
                       Index Cond: (labels.project_id IS NOT NULL)
                       Buffers: shared hit=4402385 read=132935 dirtied=56
                       I/O Timings: read=96180.352

Related to #30390 (closed)

Edited by charlie ablett

Merge request reports

Loading