Backfill identifier_names to vulnerability_reads

Bala Kumar requested to merge 460080-new-backfill-migration into master

What does this MR do and why?

Adds a new backfill migration for vulnerability_reads.identifier_names, since the old batched background migration (BBM) was abandoned.

The backfilled column is intended to support the filtering-by-identifier feature in #432419 (closed).

Changelog: other
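
For reviewers, the post-deployment migration that queues the BBM follows the standard pattern. The sketch below is illustrative only (milestone, class body, and down arguments are assumptions, not copied from the diff); the batch and sub-batch sizes match the values discussed in the Database section.

```ruby
# db/post_migrate/20241007115637_queue_backfill_identifier_names_of_vulnerability_reads.rb
# Sketch only -- milestone and exact arguments are illustrative.
class QueueBackfillIdentifierNamesOfVulnerabilityReads < Gitlab::Database::Migration[2.2]
  milestone '17.5'

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  MIGRATION = 'BackfillIdentifierNamesOfVulnerabilityReads'
  BATCH_SIZE = 12_000
  SUB_BATCH_SIZE = 40

  def up
    queue_batched_background_migration(
      MIGRATION,
      :vulnerability_reads,
      :id,
      batch_size: BATCH_SIZE,
      sub_batch_size: SUB_BATCH_SIZE
    )
  end

  def down
    delete_batched_background_migration(MIGRATION, :vulnerability_reads, :id, [])
  end
end
```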

Database

The database testing pipeline estimate for this migration is available in !163088 (comment 2054374348).

Estimated Time to complete: 2 weeks, 3 days, 9 hours, 25 minutes, and 20 seconds

On production this is expected to finish sooner, since we have a migration optimization strategy in place.

We have set the batch size to 12,000 and the sub-batch size to 40, which gives:

Mean query time: 150.2 ms

Max query time: 12748.0 ms.

The max query time occurs only rarely, when the DB is busy or congested, and is not worth further optimization effort for this one-time migration.

Output of the post-deployment migration

bundle exec rake db:migrate:up:main VERSION=20241007115637

main: == [advisory_lock_connection] object_id: 128940, pg_backend_pid: 46825
main: == 20241007115637 QueueBackfillIdentifierNamesOfVulnerabilityReads: migrating =
main: == 20241007115637 QueueBackfillIdentifierNamesOfVulnerabilityReads: migrated (0.0772s)

main: == [advisory_lock_connection] object_id: 128940, pg_backend_pid: 46825

bundle exec rake db:migrate:down:main VERSION=20241007115637

main: == [advisory_lock_connection] object_id: 128920, pg_backend_pid: 47302
main: == 20241007115637 QueueBackfillIdentifierNamesOfVulnerabilityReads: reverting =
main: == 20241007115637 QueueBackfillIdentifierNamesOfVulnerabilityReads: reverted (0.0314s)

main: == [advisory_lock_connection] object_id: 128920, pg_backend_pid: 47302

Update query used in each sub-batch

UPDATE
  vulnerability_reads vr
SET
  identifier_names = selected_ids.names
FROM
  (
    WITH
      batched_relation AS MATERIALIZED (
        SELECT
          vulnerability_reads.*
        FROM
          vulnerability_reads
        WHERE
          vulnerability_reads.id BETWEEN 104513160
          AND 104513180
          AND vulnerability_reads.id >= 104513160
          AND vulnerability_reads.id < 104513201
        LIMIT
          40
      )
    SELECT
      vulnerability_reads.id AS id,
      array_agg(
        vi.name
        ORDER BY
          vi.name
      ) AS names
    FROM
      batched_relation vulnerability_reads
      JOIN vulnerability_occurrences vo ON vulnerability_reads.vulnerability_id = vo.vulnerability_id
      JOIN vulnerability_occurrence_identifiers voi ON vo.id = voi.occurrence_id
      JOIN vulnerability_identifiers vi ON voi.identifier_id = vi.id
    GROUP BY
      vulnerability_reads.id
  ) selected_ids
WHERE
  vr.id = selected_ids.id

 ModifyTable on public.vulnerability_reads vr  (cost=105.69..159.44 rows=0 width=0) (actual time=699.540..699.565 rows=0 loops=1)
   Buffers: shared hit=881 read=445 dirtied=82
   I/O Timings: read=688.751 write=0.000
   ->  Nested Loop  (cost=105.69..159.44 rows=15 width=102) (actual time=95.848..96.198 rows=7 loops=1)
         Buffers: shared hit=142 read=37 dirtied=3
         I/O Timings: read=94.390 write=0.000
         ->  Subquery Scan on selected_ids  (cost=105.12..105.67 rows=15 width=104) (actual time=95.826..96.061 rows=7 loops=1)
               Buffers: shared hit=107 read=37 dirtied=3
               I/O Timings: read=94.390 write=0.000
               ->  Aggregate  (cost=105.12..105.52 rows=15 width=40) (actual time=95.812..96.004 rows=7 loops=1)
                     Group Key: vulnerability_reads.id
                     Buffers: shared hit=107 read=37 dirtied=3
                     I/O Timings: read=94.390 write=0.000
                     CTE batched_relation
                       ->  Limit  (cost=0.57..24.42 rows=15 width=289) (actual time=14.061..16.777 rows=7 loops=1)
                             Buffers: shared hit=15 read=6
                             I/O Timings: read=16.610 write=0.000
                             ->  Index Scan using vulnerability_reads_pkey on public.vulnerability_reads vulnerability_reads_1  (cost=0.57..24.42 rows=15 width=289) (actual time=14.057..16.764 rows=7 loops=1)
                                   Index Cond: ((vulnerability_reads_1.id >= 104513160) AND (vulnerability_reads_1.id <= 104513180) AND (vulnerability_reads_1.id >= 104513160) AND (vulnerability_reads_1.id < 104513201))
                                   Buffers: shared hit=15 read=6
                                   I/O Timings: read=16.610 write=0.000
                     ->  Sort  (cost=80.70..80.77 rows=28 width=29) (actual time=95.684..95.697 rows=9 loops=1)
                           Sort Key: vulnerability_reads.id
                           Sort Method: quicksort  Memory: 25kB
                           Buffers: shared hit=97 read=37 dirtied=3
                           I/O Timings: read=94.390 write=0.000
                           ->  Nested Loop  (cost=1.58..80.03 rows=28 width=29) (actual time=52.567..95.623 rows=9 loops=1)
                                 Buffers: shared hit=94 read=37 dirtied=3
                                 I/O Timings: read=94.390 write=0.000
                                 ->  Nested Loop  (cost=1.14..67.17 rows=28 width=16) (actual time=47.017..75.644 rows=9 loops=1)
                                       Buffers: shared hit=66 read=29 dirtied=3
                                       I/O Timings: read=74.653 write=0.000
                                       ->  Nested Loop  (cost=0.57..54.23 rows=15 width=16) (actual time=38.708..54.797 rows=7 loops=1)
                                             Buffers: shared hit=39 read=17
                                             I/O Timings: read=54.111 write=0.000
                                             ->  CTE Scan on batched_relation vulnerability_reads  (cost=0.00..0.30 rows=15 width=16) (actual time=14.068..16.822 rows=7 loops=1)
                                                   Buffers: shared hit=15 read=6
                                                   I/O Timings: read=16.610 write=0.000
                                             ->  Index Scan using index_vulnerability_occurrences_on_vulnerability_id on public.vulnerability_occurrences vo  (cost=0.57..3.58 rows=1 width=16) (actual time=5.417..5.419 rows=1 loops=7)
                                                   Index Cond: (vo.vulnerability_id = vulnerability_reads.vulnerability_id)
                                                   Buffers: shared hit=24 read=11
                                                   I/O Timings: read=37.501 write=0.000
                                       ->  Index Only Scan using index_vulnerability_occurrence_identifiers_on_unique_keys on public.vulnerability_occurrence_identifiers voi  (cost=0.57..0.76 rows=10 width=16) (actual time=2.399..2.970 rows=1 loops=7)
                                             Index Cond: (voi.occurrence_id = vo.id)
                                             Heap Fetches: 3
                                             Buffers: shared hit=27 read=12 dirtied=3
                                             I/O Timings: read=20.543 write=0.000
                                 ->  Index Scan using vulnerability_identifiers_pkey on public.vulnerability_identifiers vi  (cost=0.44..0.46 rows=1 width=29) (actual time=2.212..2.212 rows=1 loops=9)
                                       Index Cond: (vi.id = voi.identifier_id)
                                       Buffers: shared hit=28 read=8
                                       I/O Timings: read=19.737 write=0.000
         ->  Index Scan using vulnerability_reads_pkey on public.vulnerability_reads vr  (cost=0.57..3.58 rows=1 width=14) (actual time=0.013..0.013 rows=1 loops=7)
               Index Cond: (vr.id = selected_ids.id)
               Buffers: shared hit=35
               I/O Timings: read=0.000 write=0.000
Time: 711.209 ms
  - planning: 11.311 ms
  - execution: 699.898 ms
    - I/O read: 688.751 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 881 (~6.90 MiB) from the buffer pool
  - reads: 445 (~3.50 MiB) from the OS file cache, including disk I/O
  - dirtied: 82 (~656.00 KiB)
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32350/commands/99928
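
The UPDATE above is issued once per sub-batch by the BBM job. A rough sketch of its shape is shown below, assuming the standard BatchedMigrationJob structure; the operation name and the exact SQL construction in the real job may differ (for example, the framework materializes the sub-batch as a CTE rather than a plain subquery).

```ruby
# Sketch of the job class -- assumes the standard BatchedMigrationJob API;
# the real implementation may build the query differently.
module Gitlab
  module BackgroundMigration
    class BackfillIdentifierNamesOfVulnerabilityReads < BatchedMigrationJob
      operation_name :update_identifier_names
      feature_category :vulnerability_management

      def perform
        each_sub_batch do |sub_batch|
          # Aggregate the identifier names for the current sub-batch of
          # vulnerability_reads and write them into identifier_names,
          # mirroring the UPDATE shown above.
          connection.execute(<<~SQL)
            UPDATE vulnerability_reads vr
            SET identifier_names = selected_ids.names
            FROM (
              SELECT vulnerability_reads.id AS id,
                     array_agg(vi.name ORDER BY vi.name) AS names
              FROM (#{sub_batch.select(:id, :vulnerability_id).to_sql}) vulnerability_reads
              JOIN vulnerability_occurrences vo ON vulnerability_reads.vulnerability_id = vo.vulnerability_id
              JOIN vulnerability_occurrence_identifiers voi ON vo.id = voi.occurrence_id
              JOIN vulnerability_identifiers vi ON voi.identifier_id = vi.id
              GROUP BY vulnerability_reads.id
            ) selected_ids
            WHERE vr.id = selected_ids.id
          SQL
        end
      end
    end
  end
end
```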

Estimated Time to complete: 2 weeks, 3 days, 9 hours, 25 minutes, and 20 seconds
  • Interval: 160s
  • Total tuple count: 112712050
  • Max batch size: 0
  • Estimated seconds to complete: 1502720s
  • Estimated number of batches: 9392
  • Average batch time: 148.12s
  • Batch size: 12000
  • N. of batches sampled: 12
  • N. of failed batches: 0

Time estimation is conservative and based on sampling production data in a test environment. It represents the maximum time that the migration could take. The actual time may differ from this estimate.
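
As a sanity check, the headline figure follows directly from the sampled values above (back-of-the-envelope arithmetic, not output from the tooling):

```ruby
batches  = 9_392   # as reported; roughly 112_712_050 total tuples / 12_000 batch size
interval = 160     # seconds between batches, which dominates the 148.12 s average batch time

seconds = batches * interval  # => 1_502_720
# 1_502_720 s = 17 days 9:25:20, i.e. 2 weeks, 3 days, 9 hours, 25 minutes, and 20 seconds
```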

Related to #460080 (closed)
