Migrate mentions for snippets and snippet notes to DB table
What does this MR do?
This MR contains migrations for snippets and snippet notes mentions.
These migrations are to be run after !19088 (merged) is merged and confirmed to be working fine on production
This MR is based on initial migration that contains common code under CreateResourceUserMention
to be used on all mentions migrations of other models, including Snippet
.
I've pulled some estimate counts of the records we might be looking at from slack database-lab
channel.
Database benchmarks
Number of rows affected
Snippets with mentions in description ~5023
EXPLAIN SELECT snippets.id FROM snippets WHERE (description like ‘%@%’ OR title like ‘%@%’)
Seq Scan on public.snippets (cost=0.00..175127.80 rows=2822 width=4) (actual time=3.076..17398.451 rows=5023 loops=1)
Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text))
Rows Removed by Filter: 232222
Buffers: shared hit=36911 read=57884 dirtied=134
I/O Timings: read=16210.023
Snippet notes with mentions ~ 2524
explain SELECT “notes”.“id” FROM “notes”
INNER JOIN snippets ON snippets.id = notes.noteable_id
LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id
WHERE (note LIKE ‘%@%’ AND snippet_user_mentions.snippet_id IS NULL
AND notes.noteable_type = ‘Snippet’)
Nested Loop (cost=0.84..4105.30 rows=1 width=4) (actual time=0.033..14.098 rows=2524 loops=1)
Buffers: shared hit=10577
-> Merge Join (cost=0.42..4103.45 rows=1 width=8) (actual time=0.025..5.729 rows=2524 loops=1)
Filter: (snippet_user_mentions.snippet_id IS NULL)
Rows Removed by Filter: 71
Buffers: shared hit=2558
-> Index Scan using s
[...SKIP...]
Migrate snippet mentions
Query plans for sql queries to migrate mentions for snippet description and title based on database-lab
data.
-
Query 1: Get the start ID for the snippets to be migrated.
explain SELECT "snippets"."id" FROM "snippets" LEFT JOIN snippet_user_mentions on snippets.id = snippet_user_mentions.snippet_id WHERE ((description like '%@%' OR title like '%@%') AND snippet_user_mentions.snippet_id is null) ORDER BY "snippets"."id" ASC LIMIT 1
Click to see the plan. Time: 31.771 ms. https://explain.depesz.com/s/JxgL
Limit (cost=0.56..23.82 rows=1 width=4) (actual time=29.629..29.631 rows=1 loops=1) Buffers: shared hit=12 read=26 I/O Timings: read=29.185 -> Merge Join (cost=0.56..94348.79 rows=4057 width=4) (actual time=29.628..29.628 rows=1 loops=1) Buffers: shared hit=12 read=26 I/O Timings: read=29.185 -> Index Scan using snippets_pkey on public.snippets (cost=0.42..94332.20 rows=4059 width=4) (actual time=29.514..29.515 rows=1 loops=1) Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text)) Rows Removed by Filter: 66 Buffers: shared hit=11 read=25 I/O Timings: read=29.156 -> Index Only Scan using snippet_user_mentions_on_snippet_id_and_note_id_index on public.snippet_user_mentions (cost=0.14..6.09 rows=130 width=4) (actual time=0.108..0.108 rows=1 loops=1) Heap Fetches: 0 Buffers: shared hit=1 read=1 I/O Timings: read=0.029 Time: 31.771 ms - planning: 2.096 ms - execution: 29.675 ms - I/O read: 29.185 ms - I/O write: 0.000 ms Shared buffers: hits: 12 (~96.00 KiB) from the buffer pool reads: 26 (~208.00 KiB) from the OS file cache, including disk I/O dirtied: 0 writes: 0
-
Query 2: Get max id for the 10K batch rows for snippets to be migrated.
explain SELECT "snippets"."id" FROM "snippets" LEFT JOIN snippet_user_mentions on snippets.id = snippet_user_mentions.snippet_id WHERE ((description like '%@%' OR title like '%@%') AND snippet_user_mentions.snippet_id is null) AND snippets.id > 1 ORDER BY "snippets"."id" ASC LIMIT 1 OFFSET 10000
Click to see the plan. Time: 987.942 ms. https://explain.depesz.com/s/Nks6
Limit (cost=94986.50..95009.92 rows=1 width=4) (actual time=986.219..986.219 rows=0 loops=1) Buffers: shared hit=224384 -> Merge Join (cost=0.56..94986.50 rows=4057 width=4) (actual time=0.119..985.637 rows=5140 loops=1) Buffers: shared hit=224384 -> Index Scan using snippets_pkey on public.snippets (cost=0.42..94969.92 rows=4059 width=4) (actual time=0.106..983.155 rows=5160 loops=1) Index Cond: (snippets.id > 1) Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text)) Rows Removed by Filter: 240993 Buffers: shared hit=224381 -> Index Only Scan using snippet_user_mentions_on_snippet_id_and_note_id_index on public.snippet_user_mentions (cost=0.14..6.09 rows=130 width=4) (actual time=0.009..0.081 rows=134 loops=1) Heap Fetches: 7 Buffers: shared hit=3 Time: 987.942 ms planning: 1.691 ms execution: 986.251 ms I/O read: 0.000 ms I/O write: 0.000 ms Shared buffers: hits: 224384 (~1.70 GiB) from the buffer pool reads: 0 from the OS file cache, including disk I/O dirtied: 0 writes: 0
-
Query 3: Get min and max ID for range of snippet mentions to be migrated.
explain SELECT MIN(snippets.id), MAX(snippets.id) FROM “snippets” LEFT JOIN snippet_user_mentions on snippets.id = snippet_user_mentions.snippet_id WHERE ((description like ‘%@%’ OR title like ‘%@%’) AND snippet_user_mentions.snippet_id is null) AND “snippets”.“id” >= 1 AND snippets.id < 2546154
Click to see the plan. Time: 1.351 s. https://explain.depesz.com/s/bBJS
Aggregate (cost=95644.50..95644.51 rows=1 width=8) (actual time=1348.630..1348.631 rows=1 loops=1) Buffers: shared hit=218497 read=5888 I/O Timings: read=143.230 -> Merge Join (cost=0.56..95624.22 rows=4057 width=4) (actual time=0.126..1345.679 rows=5140 loops=1) Buffers: shared hit=218497 read=5888 I/O Timings: read=143.230 -> Index Scan using snippets_pkey on public.snippets (cost=0.42..95607.63 rows=4059 width=4) (actual time=0.113..1341.985 rows=5160 loops=1) Index Cond: ((snippets.id >= 1) AND (snippets.id < 2546154)) Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text)) Rows Removed by Filter: 240994 Buffers: shared hit=218494 read=5888 I/O Timings: read=143.230 -> Index Only Scan using snippet_user_mentions_on_snippet_id_and_note_id_index on public.snippet_user_mentions (cost=0.14..6.09 rows=130 width=4) (actual time=0.010..0.128 rows=134 loops=1) Heap Fetches: 7 Buffers: shared hit=3 Time: 1.351 s planning: 2.310 ms execution: 1.349 s I/O read: 143.230 ms I/O write: 0.000 ms Shared buffers: hits: 218497 (~1.70 GiB) from the buffer pool reads: 5888 (~46.00 MiB) from the OS file cache, including disk I/O dirtied: 0 writes: 0
-
Query 4: Get actual snippets data for given range of ids
explain SELECT snippets.* FROM “snippets” LEFT JOIN snippet_user_mentions on snippets.id = snippet_user_mentions.snippet_id WHERE ((description like ‘%@%’ OR title like ‘%@%’) AND snippet_user_mentions.snippet_id is null) AND “snippets”.“id” >= 1 AND snippets.id < 2546154
Click to see the plan. Time: 1.051 s. https://explain.depesz.com/s/ZfDw
Merge Join (cost=0.56..95624.22 rows=4057 width=2099) (actual time=0.112..1045.890 rows=5140 loops=1) Buffers: shared hit=224385 -> Index Scan using snippets_pkey on public.snippets (cost=0.42..95607.63 rows=4059 width=2099) (actual time=0.098..1042.756 rows=5160 loops=1) Index Cond: ((snippets.id >= 1) AND (snippets.id < 2546154)) Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text)) Rows Removed by Filter: 240994 Buffers: shared hit=224382 -> Index Only Scan using snippet_user_mentions_on_snippet_id_and_note_id_index on public.snippet_user_mentions (cost=0.14..6.09 rows=130 width=4) (actual time=0.011..0.088 rows=134 loops=1) Heap Fetches: 7 Buffers: shared hit=3 Time: 1.051 s planning: 4.252 ms execution: 1.047 s I/O read: 0.000 ms I/O write: 0.000 ms Shared buffers: hits: 224385 (~1.70 GiB) from the buffer pool reads: 0 from the OS file cache, including disk I/O dirtied: 0 writes: 0
Migrate snippet notes mentions
Query plans for sql queries to migrate mentions for snippet notes based on database-lab
data.
-
Creating a temporary index helps quite a bit:
exec CREATE INDEX CONCURRENTLY snippet_mentions_temp_index ON notes (id) WHERE note ~~ ‘%@%’::text AND notes.noteable_type = ‘Snippet’ The query has been executed. Duration: 63.628 min
\di+ snippet_mentions_temp_index
List of relations Schema | Name | Type | Owner | Table | Size | Description --------+----------------------------+-------+--------+-------+-------+------------- public | snippet_mentions_temp_index | index | gitlab | notes | 40 kB | (1 row)
-
We will need to drop the
snippet_mentions_temp_index
index in a different release after background migrations are finished.
Query plans for sql queries to migrate mentions for snippet notes based on database-lab
data.
-
Query 1: Get the start ID for the snippet notes to be migrated.
explain SELECT “notes”.“id” FROM “notes” INNER JOIN snippets ON snippets.id = notes.noteable_id LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id WHERE (note LIKE ‘%@%’ AND snippet_user_mentions.snippet_id IS NULL AND notes.noteable_type = ‘Snippet’) ORDER BY notes.id ASC LIMIT 1
Click to see the plan. Time: 3.595 ms. https://explain.depesz.com/s/TE1m
Limit (cost=0.84..4105.30 rows=1 width=4) (actual time=0.090..0.092 rows=1 loops=1) Buffers: shared hit=9 -> Nested Loop (cost=0.84..4105.30 rows=1 width=4) (actual time=0.089..0.089 rows=1 loops=1) Buffers: shared hit=9 -> Merge Join (cost=0.42..4103.45 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1) Filter: (snippet_user_mentions.snippet_id IS NULL) Rows Removed by Filter: 0 Buffers: shared hit=5 -> Index Scan using snippet_mentions_temp_index on public.notes (cost=0.28..4078.20 rows=5250 width=8) (actual time=0.017..0.017 rows=1 loops=1) Buffers: shared hit=3 -> Index Scan using index_snippet_user_mentions_on_note_id on public.snippet_user_mentions (cost=0.14..24.15 rows=130 width=8) (actual time=0.009..0.010 rows=1 loops=1) Buffers: shared hit=2 -> Index Only Scan using snippets_pkey on public.snippets (cost=0.42..1.84 rows=1 width=4) (actual time=0.054..0.054 rows=1 loops=1) Index Cond: (snippets.id = notes.noteable_id) Heap Fetches: 0 Buffers: shared hit=4 Time: 3.595 ms planning: 3.467 ms execution: 0.128 ms I/O read: 0.000 ms I/O write: 0.000 ms Shared buffers: hits: 9 (~72.00 KiB) from the buffer pool reads: 0 from the OS file cache, including disk I/O dirtied: 0 writes: 0
-
Query 2: Get max id for the 10K batch rows for snippet notes to be migrated.
explain SELECT “notes”.“id” FROM “notes” INNER JOIN snippets ON snippets.id = notes.noteable_id LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id WHERE (note LIKE ‘%@%’ AND snippet_user_mentions.snippet_id IS NULL AND notes.noteable_type = ‘Snippet’) AND notes.id >= 1 ORDER BY notes.id ASC LIMIT 1 OFFSET 10000
Click to see the plan. Time: 18.144 ms. https://explain.depesz.com/s/gKpFS
Limit (cost=4111.79..8222.74 rows=1 width=4) (actual time=14.617..14.617 rows=0 loops=1) Buffers: shared hit=10577 -> Nested Loop (cost=0.84..4111.79 rows=1 width=4) (actual time=0.040..14.432 rows=2524 loops=1) Buffers: shared hit=10577 -> Merge Join (cost=0.42..4109.94 rows=1 width=8) (actual time=0.031..6.180 rows=2524 loops=1) Filter: (snippet_user_mentions.snippet_id IS NULL) Rows Removed by Filter: 71 Buffers: shared hit=2558 -> Index Scan using snippet_mentions_temp_index on public.notes (cost=0.28..4084.68 rows=5250 width=8) (actual time=0.019..5.549 rows=2595 loops=1) Index Cond: (notes.id >= 1) Buffers: shared hit=2555 -> Index Scan using index_snippet_user_mentions_on_note_id on public.snippet_user_mentions (cost=0.14..24.15 rows=130 width=8) (actual time=0.008..0.030 rows=71 loops=1) Buffers: shared hit=3 -> Index Only Scan using snippets_pkey on public.snippets (cost=0.42..1.84 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=2524) Index Cond: (snippets.id = notes.noteable_id) Heap Fetches: 8 Buffers: shared hit=8019 Time: 18.144 ms planning: 3.489 ms execution: 14.655 ms I/O read: 0.000 ms I/O write: 0.000 ms Shared buffers: hits: 10577 (~82.60 MiB) from the buffer pool reads: 0 from the OS file cache, including disk I/O dirtied: 0 writes: 0
-
Query 3: Get min and max for snippet notes mentions to be migrated.
explain SELECT min("notes"."id"), max(notes.id) FROM "notes" INNER JOIN snippets ON snippets.id = notes.noteable_id LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id WHERE (note LIKE '%@%' AND snippet_user_mentions.snippet_id IS NULL AND notes.noteable_type = 'Snippet') AND notes.id >= 1
Click to see the plan. Time: 18.140 ms. https://explain.depesz.com/s/uqkW
Aggregate (cost=4111.80..4111.81 rows=1 width=8) (actual time=14.571..14.572 rows=1 loops=1) Buffers: shared hit=10577 -> Nested Loop (cost=0.84..4111.79 rows=1 width=4) (actual time=0.039..14.084 rows=2524 loops=1) Buffers: shared hit=10577 -> Merge Join (cost=0.42..4109.94 rows=1 width=8) (actual time=0.030..6.076 rows=2524 loops=1) Filter: (snippet_user_mentions.snippet_id IS NULL) Rows Removed by Filter: 71 Buffers: shared hit=2558 -> Index Scan using snippet_mentions_temp_index on public.notes (cost=0.28..4084.68 rows=5250 width=8) (actual time=0.019..5.417 rows=2595 loops=1) Index Cond: (notes.id >= 1) Buffers: shared hit=2555 -> Index Scan using index_snippet_user_mentions_on_note_id on public.snippet_user_mentions (cost=0.14..24.15 rows=130 width=8) (actual time=0.008..0.032 rows=71 loops=1) Buffers: shared hit=3 -> Index Only Scan using snippets_pkey on public.snippets (cost=0.42..1.84 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=2524) Index Cond: (snippets.id = notes.noteable_id) Heap Fetches: 8 Buffers: shared hit=8019 Time: 18.140 ms planning: 3.517 ms execution: 14.623 ms I/O read: 0.000 ms I/O write: 0.000 ms Shared buffers: hits: 10577 (~82.60 MiB) from the buffer pool reads: 0 from the OS file cache, including disk I/O dirtied: 0 writes: 0
-
Query 4: Get actual snippet notes data for given range of ids
explain SELECT notes.* FROM "notes" INNER JOIN snippets ON snippets.id = notes.noteable_id LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id WHERE (note LIKE '%@%' AND snippet_user_mentions.snippet_id IS NULL AND notes.noteable_type = 'Snippet') AND notes.id >= 1
Click to see the plan. Time: 22.204 ms. https://explain.depesz.com/s/sd7w
Nested Loop (cost=0.84..4111.79 rows=1 width=2463) (actual time=0.040..14.679 rows=2524 loops=1) Buffers: shared hit=10577 -> Merge Join (cost=0.42..4109.94 rows=1 width=2463) (actual time=0.030..6.708 rows=2524 loops=1) Filter: (snippet_user_mentions.snippet_id IS NULL) Rows Removed by Filter: 71 Buffers: shared hit=2558 -> Index Scan using snippet_mentions_temp_index on public.notes (cost=0.28..4084.68 rows=5250 width=2463) (actual time=0.019..5.896 rows=2595 loops=1) Index Cond: (notes.id >= 1) Buffers: shared hit=2555 -> Index Scan using index_snippet_user_mentions_on_note_id on public.snippet_user_mentions (cost=0.14..24.15 rows=130 width=8) (actual time=0.008..0.041 rows=71 loops=1) Buffers: shared hit=3 -> Index Only Scan using snippets_pkey on public.snippets (cost=0.42..1.84 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=2524) Index Cond: (snippets.id = notes.noteable_id) Heap Fetches: 8 Buffers: shared hit=8019 Time: 22.204 ms planning: 7.258 ms execution: 14.946 ms I/O read: 0.000 ms I/O write: 0.000 ms Shared buffers: hits: 10577 (~82.60 MiB) from the buffer pool reads: 0 from the OS file cache, including disk I/O dirtied: 0 writes: 0
Migration Runtime estimates:
Snippet title and description mentions: ~2-3 mins
- There are approximately 5023 records in snippets that need migration, which means a single background job, so it should take ~2-3mins given the delay of the scheduled job.
~60 mins
Snippet notes mentions migration:- There are approximately 2500 records, so it should take 1 background jobs with 10K batches which would take ~2-3mins to schedule and run. Also we need to create the index that can take ~60 mins ?
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
re #198323