Skip to content

Migrate mentions for snippets and snippet notes to DB table

What does this MR do?

This MR contains migrations for snippets and snippet notes mentions.

These migrations are to be run after !19088 (merged) is merged and confirmed to be working fine on production

This MR is based on initial migration that contains common code under CreateResourceUserMention to be used on all mentions migrations of other models, including Snippet.

I've pulled some estimate counts of the records we might be looking at from slack database-lab channel.

Database benchmarks

Number of rows affected

Snippets with mentions in description ~5023

EXPLAIN SELECT  snippets.id FROM snippets WHERE (description like %@% OR title like %@%)

Seq Scan on public.snippets  (cost=0.00..175127.80 rows=2822 width=4) (actual time=3.076..17398.451 rows=5023 loops=1)
   Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text))
   Rows Removed by Filter: 232222
   Buffers: shared hit=36911 read=57884 dirtied=134
   I/O Timings: read=16210.023

Snippet notes with mentions ~ 2524

explain SELECT notes.id FROM notes
 INNER JOIN snippets ON snippets.id = notes.noteable_id
 LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id
 WHERE (note LIKE %@% AND snippet_user_mentions.snippet_id IS NULL
 AND notes.noteable_type = Snippet)

Nested Loop  (cost=0.84..4105.30 rows=1 width=4) (actual time=0.033..14.098 rows=2524 loops=1)
Buffers: shared hit=10577
->  Merge Join  (cost=0.42..4103.45 rows=1 width=8) (actual time=0.025..5.729 rows=2524 loops=1)
      Filter: (snippet_user_mentions.snippet_id IS NULL)
      Rows Removed by Filter: 71
      Buffers: shared hit=2558
      ->  Index Scan using s
[...SKIP...]

Migrate snippet mentions

Query plans for sql queries to migrate mentions for snippet description and title based on database-lab data.

  • Query 1: Get the start ID for the snippets to be migrated.

     explain SELECT "snippets"."id" FROM "snippets"
     LEFT JOIN snippet_user_mentions on snippets.id = snippet_user_mentions.snippet_id
     WHERE ((description like '%@%' OR title like '%@%') AND snippet_user_mentions.snippet_id is null)
     ORDER BY "snippets"."id" ASC LIMIT 1

    Click to see the plan. Time: 31.771 ms. https://explain.depesz.com/s/JxgL
    
    Limit  (cost=0.56..23.82 rows=1 width=4) (actual time=29.629..29.631 rows=1 loops=1)
     Buffers: shared hit=12 read=26
     I/O Timings: read=29.185
     ->  Merge Join  (cost=0.56..94348.79 rows=4057 width=4) (actual time=29.628..29.628 rows=1 loops=1)
           Buffers: shared hit=12 read=26
           I/O Timings: read=29.185
           ->  Index Scan using snippets_pkey on public.snippets  (cost=0.42..94332.20 rows=4059 width=4) (actual time=29.514..29.515 rows=1 loops=1)
                 Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text))
                 Rows Removed by Filter: 66
                 Buffers: shared hit=11 read=25
                 I/O Timings: read=29.156
           ->  Index Only Scan using snippet_user_mentions_on_snippet_id_and_note_id_index on public.snippet_user_mentions  (cost=0.14..6.09 rows=130 width=4) (actual time=0.108..0.108 rows=1 loops=1)
                 Heap Fetches: 0
                 Buffers: shared hit=1 read=1
                 I/O Timings: read=0.029
    Time: 31.771 ms
      - planning: 2.096 ms
      - execution: 29.675 ms
        - I/O read: 29.185 ms
        - I/O write: 0.000 ms
    Shared buffers:
    
    hits: 12 (~96.00 KiB) from the buffer pool
    reads: 26 (~208.00 KiB) from the OS file cache, including disk I/O
    dirtied: 0
    writes: 0
    

  • Query 2: Get max id for the 10K batch rows for snippets to be migrated.

    explain SELECT "snippets"."id" FROM "snippets"
     LEFT JOIN snippet_user_mentions on snippets.id = snippet_user_mentions.snippet_id
     WHERE ((description like '%@%' OR title like '%@%') AND snippet_user_mentions.snippet_id is null) AND snippets.id > 1
     ORDER BY "snippets"."id" ASC LIMIT 1 OFFSET 10000

    Click to see the plan. Time: 987.942 ms. https://explain.depesz.com/s/Nks6
    
    Limit  (cost=94986.50..95009.92 rows=1 width=4) (actual time=986.219..986.219 rows=0 loops=1)
       Buffers: shared hit=224384
       ->  Merge Join  (cost=0.56..94986.50 rows=4057 width=4) (actual time=0.119..985.637 rows=5140 loops=1)
             Buffers: shared hit=224384
             ->  Index Scan using snippets_pkey on public.snippets  (cost=0.42..94969.92 rows=4059 width=4) (actual time=0.106..983.155 rows=5160 loops=1)
                   Index Cond: (snippets.id > 1)
                   Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text))
                   Rows Removed by Filter: 240993
                   Buffers: shared hit=224381
             ->  Index Only Scan using snippet_user_mentions_on_snippet_id_and_note_id_index on public.snippet_user_mentions  (cost=0.14..6.09 rows=130 width=4) (actual time=0.009..0.081 rows=134 loops=1)
                   Heap Fetches: 7
                   Buffers: shared hit=3
    Time: 987.942 ms
    
    planning: 1.691 ms
    execution: 986.251 ms
    
    I/O read: 0.000 ms
    I/O write: 0.000 ms
    
    
    
    Shared buffers:
    
    hits: 224384 (~1.70 GiB) from the buffer pool
    reads: 0 from the OS file cache, including disk I/O
    dirtied: 0
    writes: 0
    

  • Query 3: Get min and max ID for range of snippet mentions to be migrated.

    explain SELECT MIN(snippets.id), MAX(snippets.id) FROM snippets
    LEFT JOIN snippet_user_mentions on snippets.id = snippet_user_mentions.snippet_id
    WHERE ((description like %@% OR title like %@%) AND snippet_user_mentions.snippet_id is null)
    AND snippets.id >= 1 AND snippets.id < 2546154

    Click to see the plan. Time: 1.351 s. https://explain.depesz.com/s/bBJS
    
    Aggregate  (cost=95644.50..95644.51 rows=1 width=8) (actual time=1348.630..1348.631 rows=1 loops=1)
       Buffers: shared hit=218497 read=5888
       I/O Timings: read=143.230
       ->  Merge Join  (cost=0.56..95624.22 rows=4057 width=4) (actual time=0.126..1345.679 rows=5140 loops=1)
             Buffers: shared hit=218497 read=5888
             I/O Timings: read=143.230
             ->  Index Scan using snippets_pkey on public.snippets  (cost=0.42..95607.63 rows=4059 width=4) (actual time=0.113..1341.985 rows=5160 loops=1)
                   Index Cond: ((snippets.id >= 1) AND (snippets.id < 2546154))
                   Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text))
                   Rows Removed by Filter: 240994
                   Buffers: shared hit=218494 read=5888
                   I/O Timings: read=143.230
             ->  Index Only Scan using snippet_user_mentions_on_snippet_id_and_note_id_index on public.snippet_user_mentions  (cost=0.14..6.09 rows=130 width=4) (actual time=0.010..0.128 rows=134 loops=1)
                   Heap Fetches: 7
                   Buffers: shared hit=3
    Time: 1.351 s
    
    planning: 2.310 ms
    execution: 1.349 s
    
    I/O read: 143.230 ms
    I/O write: 0.000 ms
    
    
    
    Shared buffers:
    
    hits: 218497 (~1.70 GiB) from the buffer pool
    reads: 5888 (~46.00 MiB) from the OS file cache, including disk I/O
    dirtied: 0
    writes: 0
    

  • Query 4: Get actual snippets data for given range of ids

    explain SELECT snippets.* FROM snippets
    LEFT JOIN snippet_user_mentions on snippets.id = snippet_user_mentions.snippet_id
    WHERE ((description like %@% OR title like %@%) AND snippet_user_mentions.snippet_id is null)
    AND snippets.id >= 1 AND snippets.id < 2546154

    Click to see the plan. Time: 1.051 s. https://explain.depesz.com/s/ZfDw
    
    Merge Join  (cost=0.56..95624.22 rows=4057 width=2099) (actual time=0.112..1045.890 rows=5140 loops=1)
       Buffers: shared hit=224385
       ->  Index Scan using snippets_pkey on public.snippets  (cost=0.42..95607.63 rows=4059 width=2099) (actual time=0.098..1042.756 rows=5160 loops=1)
             Index Cond: ((snippets.id >= 1) AND (snippets.id < 2546154))
             Filter: ((snippets.description ~~ '%@%'::text) OR ((snippets.title)::text ~~ '%@%'::text))
             Rows Removed by Filter: 240994
             Buffers: shared hit=224382
       ->  Index Only Scan using snippet_user_mentions_on_snippet_id_and_note_id_index on public.snippet_user_mentions  (cost=0.14..6.09 rows=130 width=4) (actual time=0.011..0.088 rows=134 loops=1)
             Heap Fetches: 7
             Buffers: shared hit=3
    Time: 1.051 s
    
    planning: 4.252 ms
    execution: 1.047 s
    
    I/O read: 0.000 ms
    I/O write: 0.000 ms
    
    
    
    Shared buffers:
    
    hits: 224385 (~1.70 GiB) from the buffer pool
    reads: 0 from the OS file cache, including disk I/O
    dirtied: 0
    writes: 0
    

Migrate snippet notes mentions

Query plans for sql queries to migrate mentions for snippet notes based on database-lab data.

  • Creating a temporary index helps quite a bit:

    exec CREATE INDEX CONCURRENTLY snippet_mentions_temp_index ON notes (id) 
    WHERE note ~~ %@%::text AND notes.noteable_type = Snippet
    
    The query has been executed. Duration: 63.628 min

    \di+ snippet_mentions_temp_index

    List of relations
    Schema |            Name            | Type  | Owner  | Table | Size  | Description 
    --------+----------------------------+-------+--------+-------+-------+-------------
    public | snippet_mentions_temp_index | index | gitlab | notes | 40 kB | 
    (1 row)
  • We will need to drop the snippet_mentions_temp_index index in a different release after background migrations are finished.

Query plans for sql queries to migrate mentions for snippet notes based on database-lab data.

  • Query 1: Get the start ID for the snippet notes to be migrated.

    explain SELECT notes.id FROM notes
      INNER JOIN snippets ON snippets.id = notes.noteable_id
      LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id
      WHERE (note LIKE %@% AND snippet_user_mentions.snippet_id IS NULL
      AND notes.noteable_type = Snippet) ORDER BY notes.id ASC LIMIT 1

    Click to see the plan. Time: 3.595 ms. https://explain.depesz.com/s/TE1m
    
    Limit  (cost=0.84..4105.30 rows=1 width=4) (actual time=0.090..0.092 rows=1 loops=1)
       Buffers: shared hit=9
       ->  Nested Loop  (cost=0.84..4105.30 rows=1 width=4) (actual time=0.089..0.089 rows=1 loops=1)
             Buffers: shared hit=9
             ->  Merge Join  (cost=0.42..4103.45 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)
                   Filter: (snippet_user_mentions.snippet_id IS NULL)
                   Rows Removed by Filter: 0
                   Buffers: shared hit=5
                   ->  Index Scan using snippet_mentions_temp_index on public.notes  (cost=0.28..4078.20 rows=5250 width=8) (actual time=0.017..0.017 rows=1 loops=1)
                         Buffers: shared hit=3
                   ->  Index Scan using index_snippet_user_mentions_on_note_id on public.snippet_user_mentions  (cost=0.14..24.15 rows=130 width=8) (actual time=0.009..0.010 rows=1 loops=1)
                         Buffers: shared hit=2
             ->  Index Only Scan using snippets_pkey on public.snippets  (cost=0.42..1.84 rows=1 width=4) (actual time=0.054..0.054 rows=1 loops=1)
                   Index Cond: (snippets.id = notes.noteable_id)
                   Heap Fetches: 0
                   Buffers: shared hit=4
    Time: 3.595 ms
    
    planning: 3.467 ms
    execution: 0.128 ms
    
    I/O read: 0.000 ms
    I/O write: 0.000 ms
    
    
    
    Shared buffers:
    
    hits: 9 (~72.00 KiB) from the buffer pool
    reads: 0 from the OS file cache, including disk I/O
    dirtied: 0
    writes: 0
    

  • Query 2: Get max id for the 10K batch rows for snippet notes to be migrated.

    explain SELECT notes.id FROM notes
      INNER JOIN snippets ON snippets.id = notes.noteable_id
      LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id
      WHERE (note LIKE %@% AND snippet_user_mentions.snippet_id IS NULL
      AND notes.noteable_type = Snippet) AND notes.id >= 1 ORDER BY notes.id ASC LIMIT 1 OFFSET 10000

    Click to see the plan. Time: 18.144 ms. https://explain.depesz.com/s/gKpFS
    
    Limit  (cost=4111.79..8222.74 rows=1 width=4) (actual time=14.617..14.617 rows=0 loops=1)
       Buffers: shared hit=10577
       ->  Nested Loop  (cost=0.84..4111.79 rows=1 width=4) (actual time=0.040..14.432 rows=2524 loops=1)
             Buffers: shared hit=10577
             ->  Merge Join  (cost=0.42..4109.94 rows=1 width=8) (actual time=0.031..6.180 rows=2524 loops=1)
                   Filter: (snippet_user_mentions.snippet_id IS NULL)
                   Rows Removed by Filter: 71
                   Buffers: shared hit=2558
                   ->  Index Scan using snippet_mentions_temp_index on public.notes  (cost=0.28..4084.68 rows=5250 width=8) (actual time=0.019..5.549 rows=2595 loops=1)
                         Index Cond: (notes.id >= 1)
                         Buffers: shared hit=2555
                   ->  Index Scan using index_snippet_user_mentions_on_note_id on public.snippet_user_mentions  (cost=0.14..24.15 rows=130 width=8) (actual time=0.008..0.030 rows=71 loops=1)
                         Buffers: shared hit=3
             ->  Index Only Scan using snippets_pkey on public.snippets  (cost=0.42..1.84 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=2524)
                   Index Cond: (snippets.id = notes.noteable_id)
                   Heap Fetches: 8
                   Buffers: shared hit=8019
    Time: 18.144 ms
    
    planning: 3.489 ms
    execution: 14.655 ms
    
    I/O read: 0.000 ms
    I/O write: 0.000 ms
    
    
    
    Shared buffers:
    
    hits: 10577 (~82.60 MiB) from the buffer pool
    reads: 0 from the OS file cache, including disk I/O
    dirtied: 0
    writes: 0
    

  • Query 3: Get min and max for snippet notes mentions to be migrated.

    explain SELECT min("notes"."id"), max(notes.id) FROM "notes"
    INNER JOIN snippets ON snippets.id = notes.noteable_id 
    LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id 
    WHERE (note LIKE '%@%' AND snippet_user_mentions.snippet_id IS NULL AND notes.noteable_type = 'Snippet') 
    AND notes.id >= 1

    Click to see the plan. Time: 18.140 ms. https://explain.depesz.com/s/uqkW
    
    Aggregate  (cost=4111.80..4111.81 rows=1 width=8) (actual time=14.571..14.572 rows=1 loops=1)
       Buffers: shared hit=10577
       ->  Nested Loop  (cost=0.84..4111.79 rows=1 width=4) (actual time=0.039..14.084 rows=2524 loops=1)
             Buffers: shared hit=10577
             ->  Merge Join  (cost=0.42..4109.94 rows=1 width=8) (actual time=0.030..6.076 rows=2524 loops=1)
                   Filter: (snippet_user_mentions.snippet_id IS NULL)
                   Rows Removed by Filter: 71
                   Buffers: shared hit=2558
                   ->  Index Scan using snippet_mentions_temp_index on public.notes  (cost=0.28..4084.68 rows=5250 width=8) (actual time=0.019..5.417 rows=2595 loops=1)
                         Index Cond: (notes.id >= 1)
                         Buffers: shared hit=2555
                   ->  Index Scan using index_snippet_user_mentions_on_note_id on public.snippet_user_mentions  (cost=0.14..24.15 rows=130 width=8) (actual time=0.008..0.032 rows=71 loops=1)
                         Buffers: shared hit=3
             ->  Index Only Scan using snippets_pkey on public.snippets  (cost=0.42..1.84 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=2524)
                   Index Cond: (snippets.id = notes.noteable_id)
                   Heap Fetches: 8
                   Buffers: shared hit=8019
    Time: 18.140 ms
    
    planning: 3.517 ms
    execution: 14.623 ms
    
    I/O read: 0.000 ms
    I/O write: 0.000 ms
    
    
    
    Shared buffers:
    
    hits: 10577 (~82.60 MiB) from the buffer pool
    reads: 0 from the OS file cache, including disk I/O
    dirtied: 0
    writes: 0
    

  • Query 4: Get actual snippet notes data for given range of ids

    explain SELECT notes.* FROM "notes"
    INNER JOIN snippets ON snippets.id = notes.noteable_id
    LEFT JOIN snippet_user_mentions ON notes.id = snippet_user_mentions.note_id
    WHERE (note LIKE '%@%' AND snippet_user_mentions.snippet_id IS NULL AND notes.noteable_type = 'Snippet')
    AND notes.id >= 1

    Click to see the plan. Time: 22.204 ms. https://explain.depesz.com/s/sd7w
    
    Nested Loop  (cost=0.84..4111.79 rows=1 width=2463) (actual time=0.040..14.679 rows=2524 loops=1)
       Buffers: shared hit=10577
       ->  Merge Join  (cost=0.42..4109.94 rows=1 width=2463) (actual time=0.030..6.708 rows=2524 loops=1)
             Filter: (snippet_user_mentions.snippet_id IS NULL)
             Rows Removed by Filter: 71
             Buffers: shared hit=2558
             ->  Index Scan using snippet_mentions_temp_index on public.notes  (cost=0.28..4084.68 rows=5250 width=2463) (actual time=0.019..5.896 rows=2595 loops=1)
                   Index Cond: (notes.id >= 1)
                   Buffers: shared hit=2555
             ->  Index Scan using index_snippet_user_mentions_on_note_id on public.snippet_user_mentions  (cost=0.14..24.15 rows=130 width=8) (actual time=0.008..0.041 rows=71 loops=1)
                   Buffers: shared hit=3
       ->  Index Only Scan using snippets_pkey on public.snippets  (cost=0.42..1.84 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=2524)
             Index Cond: (snippets.id = notes.noteable_id)
             Heap Fetches: 8
             Buffers: shared hit=8019
    Time: 22.204 ms
    
    planning: 7.258 ms
    execution: 14.946 ms
    
    I/O read: 0.000 ms
    I/O write: 0.000 ms
    
    
    
    Shared buffers:
    
    hits: 10577 (~82.60 MiB) from the buffer pool
    reads: 0 from the OS file cache, including disk I/O
    dirtied: 0
    writes: 0
    

Migration Runtime estimates:

Snippet title and description mentions: ~2-3 mins
  • There are approximately 5023 records in snippets that need migration, which means a single background job, so it should take ~2-3mins given the delay of the scheduled job.
Snippet notes mentions migration: ~60 mins
  • There are approximately 2500 records, so it should take 1 background jobs with 10K batches which would take ~2-3mins to schedule and run. Also we need to create the index that can take ~60 mins ?

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

re #198323

Edited by Coung Ngo

Merge request reports

Loading