Backfill Vulnerability merge request links
## What does this MR do and why?
Related to Database 2/2: Add hasMergeRequest filter to Vul... (#421736 - closed).
## Database review
This MR introduces a post-deployment migration to schedule the background jobs.
### Output of the post-deployment migration

```shell
$ rake db:migrate:up
main: == 20230907155247 QueueBackfillHasMergeRequestOfVulnerabilityReads: migrating =
main: == 20230907155247 QueueBackfillHasMergeRequestOfVulnerabilityReads: migrated (0.1722s)

$ rake db:migrate:down
main: == 20230907155247 QueueBackfillHasMergeRequestOfVulnerabilityReads: reverting =
main: == 20230907155247 QueueBackfillHasMergeRequestOfVulnerabilityReads: reverted (0.0252s)
```
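The post-deployment migration boils down to queueing a batched background migration over `vulnerability_reads`. A minimal sketch, assuming GitLab's `queue_batched_background_migration` helper; the batch-size constants and schema restriction here are illustrative assumptions, not copied from the actual MR:

```ruby
# db/post_migrate/20230907155247_queue_backfill_has_merge_request_of_vulnerability_reads.rb
# Hedged sketch: constants below are assumptions, not the MR's actual values.
class QueueBackfillHasMergeRequestOfVulnerabilityReads < Gitlab::Database::Migration[2.1]
  MIGRATION = "BackfillHasMergeRequestOfVulnerabilityReads"
  DELAY_INTERVAL = 2.minutes
  BATCH_SIZE = 10_000     # assumed; consistent with the ~7_971-job estimate
  SUB_BATCH_SIZE = 200

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    queue_batched_background_migration(
      MIGRATION,
      :vulnerability_reads,
      :vulnerability_id,
      job_interval: DELAY_INTERVAL,
      batch_size: BATCH_SIZE,
      sub_batch_size: SUB_BATCH_SIZE
    )
  end

  def down
    delete_batched_background_migration(MIGRATION, :vulnerability_reads, :vulnerability_id, [])
  end
end
```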
### Update query used in background migration

```sql
UPDATE
  vulnerability_reads
SET
  has_merge_request = true
FROM
  (SELECT "vulnerability_reads".*
   FROM "vulnerability_reads"
   INNER JOIN vulnerability_merge_request_links
     ON vulnerability_reads.vulnerability_id = vulnerability_merge_request_links.vulnerability_id
   WHERE "vulnerability_reads"."vulnerability_id" BETWEEN 1 AND 11513
     AND "vulnerability_reads"."vulnerability_id" < 812) AS sub_query
WHERE
  vulnerability_reads.vulnerability_id = sub_query.vulnerability_id
```
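Inside each background job, an UPDATE like the one above would typically be issued once per sub-batch. A hedged sketch of what the job class might look like, assuming GitLab's `BatchedMigrationJob` base class (`each_sub_batch` yields the 200-row relations; the join scope and operation name are illustrative, and the real query uses the `UPDATE ... FROM` form shown above rather than what `update_all` generates):

```ruby
# Hedged sketch; not the actual job code from this MR.
module Gitlab
  module BackgroundMigration
    class BackfillHasMergeRequestOfVulnerabilityReads < BatchedMigrationJob
      operation_name :update_has_merge_request
      feature_category :vulnerability_management

      def perform
        each_sub_batch do |sub_batch|
          # Flag reads that have at least one merge request link.
          sub_batch
            .joins("INNER JOIN vulnerability_merge_request_links " \
                   "ON vulnerability_reads.vulnerability_id = " \
                   "vulnerability_merge_request_links.vulnerability_id")
            .update_all(has_merge_request: true)
        end
      end
    end
  end
end
```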
```
 ModifyTable on public.vulnerability_reads  (cost=63.05..596.56 rows=1 width=183) (actual time=74.864..74.868 rows=0 loops=1)
   Buffers: shared hit=69 read=71
   I/O Timings: read=74.132 write=0.000
   ->  Nested Loop  (cost=63.05..596.56 rows=1 width=183) (actual time=74.861..74.864 rows=0 loops=1)
         Buffers: shared hit=69 read=71
         I/O Timings: read=74.132 write=0.000
         ->  Merge Join  (cost=62.49..592.98 rows=1 width=28) (actual time=74.860..74.861 rows=0 loops=1)
               Merge Cond: (vulnerability_reads_1.vulnerability_id = vulnerability_merge_request_links.vulnerability_id)
               Buffers: shared hit=69 read=71
               I/O Timings: read=74.132 write=0.000
               ->  Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads vulnerability_reads_1  (cost=0.57..579.32 rows=686 width=14) (actual time=1.818..74.805 rows=199 loops=1)
                     Index Cond: ((vulnerability_reads_1.vulnerability_id >= 1) AND (vulnerability_reads_1.vulnerability_id <= 11513) AND (vulnerability_reads_1.vulnerability_id < 812))
                     Buffers: shared hit=67 read=71
                     I/O Timings: read=74.132 write=0.000
               ->  Index Scan using unique_vuln_merge_request_link_vuln_id_and_mr_id on public.vulnerability_merge_request_links  (cost=0.15..14.65 rows=264 width=14) (actual time=0.012..0.012 rows=1 loops=1)
                     Buffers: shared hit=2
                     I/O Timings: read=0.000 write=0.000
         ->  Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads  (cost=0.57..3.58 rows=1 width=170) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (vulnerability_reads.vulnerability_id = vulnerability_reads_1.vulnerability_id)
               I/O Timings: read=0.000 write=0.000
```
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/22008/commands/71210
### Time estimations
As of writing, there are 79_709_487 records in the `vulnerability_reads` table, which means we will create 7_971 jobs with a gap of 2 minutes between each one: 7971 * 2 / 60 / 24 = ~11 days.

Since the update query runs in around 80ms and the sub-batch size is 200, each background job will run 50 update queries: 50 * 80ms = 4 seconds. Even with the additional overhead of each iteration, every job is very likely to finish within its 1-to-2-minute window.
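The arithmetic above can be checked with a few lines of Ruby; the 10_000 per-job batch size is an assumption, inferred from 79_709_487 rows yielding 7_971 jobs:

```ruby
# Back-of-the-envelope check of the estimates above.
total_rows     = 79_709_487
batch_size     = 10_000   # assumed per-job batch size
sub_batch_size = 200      # rows per UPDATE
query_ms       = 80       # observed UPDATE runtime
interval_min   = 2        # delay between jobs

jobs = (total_rows.to_f / batch_size).ceil
days = jobs * interval_min / 60.0 / 24
updates_per_job = batch_size / sub_batch_size
busy_seconds_per_job = updates_per_job * query_ms / 1000.0

puts jobs                  # => 7971
puts days.round(1)         # => 11.1
puts updates_per_job       # => 50
puts busy_seconds_per_job  # => 4.0
```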
Also, from sampling, the DB testing job reports: Estimated Time to complete: 1 week, 3 days, 20 hours, and 34 minutes. Details in !130952 (comment 1550881529).
## MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

- [ ] I have evaluated the MR acceptance checklist for this MR.