Save iteration sequence number (!74352) · Merge requests · GitLab.org / GitLab

euko requested to merge 343621-separately-store-the-numbering-of-iterations into master Nov 12, 2021

What does this MR do and why?

Part 1/2 of #343621 (closed) (Part 2: !75527 (closed))

DB change: Add a new column sequence to sprints table
DB change: Add a deferrable unique constraint to sprints table on iterations_cadence_id and sequence (the enforcement of the constraint is deferred until the end of a transaction to allow for a bulk update. See !74352 (comment 749969086))
DB update: Backfills sequence column
App model change: on creation, update or deletion of an iteration, the sequence numbers are checked/updated for all the iterations in the cadence containing the created/updated/deleted iteration.

Iteration cadence is being developed behind a FF :iteration_cadences

Background

An iteration cadence is a container for a sequence of iterations (i.e., an iteration cadence has many iterations and a group may have many iteration cadences.)

An iteration has a duration (period) and the iterations within a cadence are strictly sequential without overlapping periods.

While either the start or due date of an iteration can be used to derive the iteration's sequence number in a cadence, we've decided to explicitly store the sequence number in the sprints table for easier and performant referencing (useful for searching and generating default titles).

Illustration:

Cadence	Iteration title	Start Date	End Date	Sequence Number
Cadence A	Iteration 1	Nov 1	Nov 5	1
"	Foobar	Nov 6	Nov 10	2
"	Iteration 3	Nov 11	Nov 15	3
Cadence B	Iteration 1	Nov 3	Nov 10	1

Migration

Up

== 20211126042235 AddSequenceColumnToSprintsTable: migrating ==================
-- add_column(:sprints, :sequence, :integer)
   -> 0.0015s
-- execute("ALTER TABLE sprints ADD CONSTRAINT sequence_is_unique_per_iterations_cadence_id UNIQUE (iterations_cadence_id, sequence) DEFERRABLE INITIALLY DEFERRED")
   -> 0.0023s
== 20211126042235 AddSequenceColumnToSprintsTable: migrated (0.0040s) =========


== 20211130165043 BackfillSequenceColumnForSprintsTable: migrating ============
-- execute("        UPDATE sprints\n        SET sequence=t.row_number\n        FROM (\n          SELECT id, row_number() OVER (PARTITION BY iterations_cadence_id ORDER BY start_date)\n          FROM sprints as s1\n          WHERE s1.iterations_cadence_id IS NOT NULL\n        ) as t\n        WHERE t.id=sprints.id AND (sprints.sequence IS NULL OR sprints.sequence <> t.row_number)\n")
   -> 0.0022s
== 20211130165043 BackfillSequenceColumnForSprintsTable: migrated (0.0023s) ===

Query plan for the backfill migration

QUERY PLAN                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on sprints  (cost=4177.53..4747.18 rows=16142 width=438) (actual time=3155.937..3155.940 rows=0 loops=1)
   Buffers: shared hit=653237 read=1856 dirtied=4159 written=2046
   I/O Timings: read=733.711
   ->  Hash Join  (cost=4177.53..4747.18 rows=16142 width=438) (actual time=56.372..118.406 rows=16223 loops=1)
         Hash Cond: (t.id = sprints.id)
         Join Filter: ((sprints.sequence IS NULL) OR (sprints.sequence <> t.row_number))
         Buffers: shared hit=29547 read=193 dirtied=28
         I/O Timings: read=11.324
         ->  Subquery Scan on t  (cost=2397.56..2884.25 rows=16223 width=56) (actual time=19.436..56.130 rows=16223 loops=1)
               Buffers: shared hit=14172 read=64
               I/O Timings: read=1.236
               ->  WindowAgg  (cost=2397.56..2722.02 rows=16223 width=24) (actual time=19.424..46.442 rows=16223 loops=1)
                     Buffers: shared hit=14172 read=64
                     I/O Timings: read=1.236
                     ->  Sort  (cost=2397.56..2438.12 rows=16223 width=16) (actual time=19.300..23.036 rows=16223 loops=1)
                           Sort Key: s1.iterations_cadence_id, s1.start_date
                           Sort Method: quicksort  Memory: 1145kB
                           Buffers: shared hit=14172 read=64
                           I/O Timings: read=1.236
                           ->  Index Scan using sequence_is_unique_per_iterations_cadence_id on sprints s1  (cost=0.29..1263.11 rows=16223 width=16) (actual time=0.057..12.398 rows=16223 loops=1)
                                 Index Cond: (iterations_cadence_id IS NOT NULL)
                                 Buffers: shared hit=14169 read=64
                                 I/O Timings: read=1.236
         ->  Hash  (cost=1577.18..1577.18 rows=16223 width=394) (actual time=36.809..36.809 rows=16223 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 4530kB
               Buffers: shared hit=15372 read=129 dirtied=28
               I/O Timings: read=10.087
               ->  Index Scan using sprints_pkey on sprints  (cost=0.29..1577.18 rows=16223 width=394) (actual time=0.008..25.605 rows=16223 loops=1)
                     Buffers: shared hit=15372 read=129 dirtied=28
                     I/O Timings: read=10.087
 Planning Time: 42.524 ms
 Execution Time: 3156.431 ms
(32 rows)

Down

== 20211130165043 BackfillSequenceColumnForSprintsTable: reverting ============
== 20211130165043 BackfillSequenceColumnForSprintsTable: reverted (0.0000s) ===


== 20211126042235 AddSequenceColumnToSprintsTable: reverting ==================
-- remove_column(:sprints, :sequence, :integer)
   -> 0.0029s
== 20211126042235 AddSequenceColumnToSprintsTable: reverted (0.0065s) =========

Query plans

Updating sequence numbers for iterations within a cadence

sample query:

The sample query updates a cadence containing 128 iterations. (the cadence belongs to a GitLab team member. SELECT COUNT(id), iterations_cadence_id FROM sprints GROUP BY iterations_cadence_id ORDER BY 1 DESC LIMIT 1;)

UPDATE sprints SET sequence=t.row_number
FROM ( 
SELECT id, row_number() OVER (ORDER BY start_date) FROM sprints 
WHERE iterations_cadence_id = 913
) as t
WHERE t.id=sprints.id AND (sprints.sequence IS DISTINCT FROM t.row_number);

query plan:

QUERY PLAN                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on sprints  (cost=172.19..519.06 rows=127 width=438) (actual time=48.964..48.968 rows=0 loops=1)
   Buffers: shared hit=4738 read=248 dirtied=298 written=28
   I/O Timings: read=11.886
   ->  Nested Loop  (cost=172.19..519.06 rows=127 width=438) (actual time=0.384..2.543 rows=128 loops=1)
         Buffers: shared hit=445 read=29 dirtied=4
         I/O Timings: read=0.964
         ->  Subquery Scan on t  (cost=171.90..175.42 rows=128 width=56) (actual time=0.332..0.780 rows=128 loops=1)
               Buffers: shared hit=88 read=2 dirtied=4
               I/O Timings: read=0.061
               ->  WindowAgg  (cost=171.90..174.14 rows=128 width=20) (actual time=0.318..0.641 rows=128 loops=1)
                     Buffers: shared hit=88 read=2 dirtied=4
                     I/O Timings: read=0.061
                     ->  Sort  (cost=171.90..172.22 rows=128 width=12) (actual time=0.310..0.358 rows=128 loops=1)
                           Sort Key: sprints_1.start_date
                           Sort Method: quicksort  Memory: 31kB
                           Buffers: shared hit=88 read=2 dirtied=4
                           I/O Timings: read=0.061
                           ->  Index Scan using index_sprints_iterations_cadence_id on sprints sprints_1  (cost=0.29..167.42 rows=128 width=12) (actual time=0.074..0.266 rows=128 loops=1)
                                 Index Cond: (iterations_cadence_id = 913)
                                 Buffers: shared hit=85 read=2 dirtied=4
                                 I/O Timings: read=0.061
         ->  Index Scan using sprints_pkey on sprints  (cost=0.29..2.67 rows=1 width=394) (actual time=0.012..0.012 rows=1 loops=128)
               Index Cond: (id = t.id)
               Filter: (sequence IS DISTINCT FROM t.row_number)
               Buffers: shared hit=357 read=27
               I/O Timings: read=0.902
 Planning Time: 15.654 ms
 Execution Time: 49.360 ms
(28 rows)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Dec 10, 2021 by euko

Save iteration sequence number