Data backfill migration for vulnerability_reads.owasp_top_10
What does this MR do and why?
This MR introduces a post-deployment migration to schedule the background jobs for backfilling older vulnerability_reads.owasp_top_10
records. For the new records ingestion is happening through !138594 (merged)
Note: Do not merge this one till the other !140067 (merged) is merged
Database
Output of the post-deployment migration
bundle exec rake db:migrate:up:main VERSION=20231214111617
main: == [advisory_lock_connection] object_id: 182100, pg_backend_pid: 14758
main: == 20231214111617 QueueBackfillOwaspTopTenOfVulnerabilityReads: migrating =====
main: == 20231214111617 QueueBackfillOwaspTopTenOfVulnerabilityReads: migrated (0.0567s)
main: == [advisory_lock_connection] object_id: 182100, pg_backend_pid: 14758
bundle exec rake db:migrate:down:main VERSION=20231214111617
main: == [advisory_lock_connection] object_id: 182620, pg_backend_pid: 14300
main: == 20231214111617 QueueBackfillOwaspTopTenOfVulnerabilityReads: reverting =====
main: == 20231214111617 QueueBackfillOwaspTopTenOfVulnerabilityReads: reverted (0.0450s)
main: == [advisory_lock_connection] object_id: 182620, pg_backend_pid: 14300
Select query used in background migration
SELECT vulnerability_reads.vulnerability_id FROM vulnerability_reads WHERE vulnerability_reads.vulnerability_id BETWEEN 30000 AND 40000 ORDER BY vulnerability_reads.vulnerability_id ASC LIMIT 10000
Limit (cost=0.57..759.44 rows=8674 width=8) (actual time=4.142..678.554 rows=10000 loops=1)
Buffers: shared hit=63 read=884 dirtied=309
I/O Timings: read=638.531 write=0.000
-> Index Only Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads (cost=0.57..759.44 rows=8674 width=8) (actual time=4.140..677.327 rows=10000 loops=1)
Index Cond: ((vulnerability_reads.vulnerability_id >= 30000) AND (vulnerability_reads.vulnerability_id <= 40000))
Heap Fetches: 2791
Buffers: shared hit=63 read=884 dirtied=309
I/O Timings: read=638.531 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25017/commands/79513
Update query used in each sub-batch
UPDATE
vulnerability_reads vr
SET
owasp_top_10 = CASE selected_ids.external_id WHEN 'A1:2017-Injection' THEN 1 WHEN 'A1:2017' THEN 1 WHEN 'A2:2017-Broken Authentication' THEN 2 WHEN 'A2:2017' THEN 2 WHEN 'A3:2017-Sensitive Data Exposure' THEN 3 WHEN 'A3:2017' THEN 3 WHEN 'A4:2017-XML External Entities (XXE)' THEN 4 WHEN 'A4:2017' THEN 4 WHEN 'A5:2017-Broken Access Control' THEN 5 WHEN 'A5:2017' THEN 5 WHEN 'A6:2017-Security Misconfiguration' THEN 6 WHEN 'A6:2017' THEN 6 WHEN 'A7:2017-Cross-Site Scripting (XSS)' THEN 7 WHEN 'A7:2017' THEN 7 WHEN 'A8:2017-Insecure Deserialization' THEN 8 WHEN 'A8:2017' THEN 8 WHEN 'A9:2017-Using Components with Known Vulnerabilities' THEN 9 WHEN 'A9:2017' THEN 9 WHEN 'A10:2017-Insufficient Logging & Monitoring' THEN 10 WHEN 'A10:2017' THEN 10 WHEN 'A1:2021-Broken Access Control' THEN 11 WHEN 'A1:2021' THEN 11 WHEN 'A2:2021-Cryptographic Failures' THEN 12 WHEN 'A2:2021' THEN 12 WHEN 'A3:2021-Injection' THEN 13 WHEN 'A3:2021' THEN 13 WHEN 'A4:2021-Insecure Design' THEN 14 WHEN 'A4:2021' THEN 14 WHEN 'A5:2021-Security Misconfiguration' THEN 15 WHEN 'A5:2021' THEN 15 WHEN 'A6:2021-Vulnerable and Outdated Components' THEN 16 WHEN 'A6:2021' THEN 16 WHEN 'A7:2021-Identification and Authentication Failures' THEN 17 WHEN 'A7:2021' THEN 17 WHEN 'A8:2021-Software and Data Integrity Failures' THEN 18 WHEN 'A8:2021' THEN 18 WHEN 'A9:2021-Security Logging and Monitoring Failures' THEN 19 WHEN 'A9:2021' THEN 19 WHEN 'A10:2021-Server-Side Request Forgery' THEN 20 WHEN 'A10:2021' THEN 20 ELSE vr.owasp_top_10 END
FROM
(
SELECT
vr.id,
vi.external_id
FROM
vulnerability_reads vr
JOIN vulnerability_occurrences vo ON vr.vulnerability_id = vo.vulnerability_id
JOIN vulnerability_occurrence_identifiers voi ON vo.id = voi.occurrence_id
JOIN vulnerability_identifiers vi ON voi.identifier_id = vi.id
WHERE
(
LOWER(vi.external_type) = 'owasp'
)
AND "vi"."external_id" IN (
'A1:2017-Injection', 'A1:2017', 'A2:2017-Broken Authentication',
'A2:2017', 'A3:2017-Sensitive Data Exposure',
'A3:2017', 'A4:2017-XML External Entities (XXE)',
'A4:2017', 'A5:2017-Broken Access Control',
'A5:2017', 'A6:2017-Security Misconfiguration',
'A6:2017', 'A7:2017-Cross-Site Scripting (XSS)',
'A7:2017', 'A8:2017-Insecure Deserialization',
'A8:2017', 'A9:2017-Using Components with Known Vulnerabilities',
'A9:2017', 'A10:2017-Insufficient Logging & Monitoring',
'A10:2017', 'A1:2021-Broken Access Control',
'A1:2021', 'A2:2021-Cryptographic Failures',
'A2:2021', 'A3:2021-Injection',
'A3:2021', 'A4:2021-Insecure Design',
'A4:2021', 'A5:2021-Security Misconfiguration',
'A5:2021', 'A6:2021-Vulnerable and Outdated Components',
'A6:2021', 'A7:2021-Identification and Authentication Failures',
'A7:2021', 'A8:2021-Software and Data Integrity Failures',
'A8:2021', 'A9:2021-Security Logging and Monitoring Failures',
'A9:2021', 'A10:2021-Server-Side Request Forgery',
'A10:2021'
)
AND vr.id IN (
SELECT
vulnerability_reads.id
FROM
vulnerability_reads
WHERE
vulnerability_reads.vulnerability_id BETWEEN 100050
AND 100100
AND vulnerability_reads.vulnerability_id >= 100050
AND vulnerability_reads.vulnerability_id < 100100
)
) selected_ids
WHERE
vr.id = selected_ids.id
ModifyTable on public.vulnerability_reads vr (cost=3.38..319.81 rows=0 width=0) (actual time=210.679..210.684 rows=0 loops=1)
Buffers: shared hit=774 read=71 dirtied=3
I/O Timings: read=207.446 write=0.000
-> Nested Loop (cost=3.38..319.81 rows=1 width=38) (actual time=210.676..210.680 rows=0 loops=1)
Buffers: shared hit=774 read=71 dirtied=3
I/O Timings: read=207.446 write=0.000
-> Nested Loop (cost=2.81..319.09 rows=1 width=67) (actual time=210.675..210.678 rows=0 loops=1)
Buffers: shared hit=774 read=71 dirtied=3
I/O Timings: read=207.446 write=0.000
-> Nested Loop (cost=2.27..277.25 rows=74 width=48) (actual time=44.453..176.757 rows=42 loops=1)
Buffers: shared hit=619 read=58 dirtied=3
I/O Timings: read=174.232 write=0.000
-> Nested Loop (cost=1.70..244.24 rows=43 width=42) (actual time=34.308..124.971 rows=42 loops=1)
Buffers: shared hit=425 read=42 dirtied=3
I/O Timings: read=123.277 write=0.000
-> Nested Loop (cost=1.14..206.11 rows=43 width=36) (actual time=19.920..83.524 rows=42 loops=1)
Buffers: shared hit=227 read=30 dirtied=3
I/O Timings: read=82.404 write=0.000
-> Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads (cost=0.57..51.95 rows=43 width=14) (actual time=16.023..79.078 rows=42 loops=1)
Index Cond: ((vulnerability_reads.vulnerability_id >= 100050) AND (vulnerability_reads.vulnerability_id <= 100100) AND (vulnerability_reads.vulnerability_id >= 100050) AND (vulnerability_reads.vulnerability_id < 100100))
Buffers: shared hit=18 read=29 dirtied=3
I/O Timings: read=78.570 write=0.000
-> Index Scan using vulnerability_reads_pkey on public.vulnerability_reads vr_1 (cost=0.57..3.58 rows=1 width=22) (actual time=0.102..0.102 rows=1 loops=42)
Index Cond: (vr_1.id = vulnerability_reads.id)
Buffers: shared hit=209 read=1
I/O Timings: read=3.834 write=0.000
-> Index Scan using index_vulnerability_occurrences_on_vulnerability_id on public.vulnerability_occurrences vo (cost=0.57..0.88 rows=1 width=22) (actual time=0.984..0.985 rows=1 loops=42)
Index Cond: (vo.vulnerability_id = vr_1.vulnerability_id)
Buffers: shared hit=198 read=12
I/O Timings: read=40.873 write=0.000
-> Index Scan using index_vulnerability_occurrence_identifiers_on_unique_keys on public.vulnerability_occurrence_identifiers voi (cost=0.57..0.71 rows=6 width=22) (actual time=1.230..1.231 rows=1 loops=42)
Index Cond: (voi.occurrence_id = vo.id)
Buffers: shared hit=194 read=16
I/O Timings: read=50.955 write=0.000
-> Index Scan using vulnerability_identifiers_pkey on public.vulnerability_identifiers vi (cost=0.54..0.57 rows=1 width=35) (actual time=0.805..0.805 rows=0 loops=42)
Index Cond: (vi.id = voi.identifier_id)
Filter: (((vi.external_id)::text = ANY ('{A1:2017-Injection,A1:2017,"A2:2017-Broken Authentication",A2:2017,"A3:2017-Sensitive Data Exposure",A3:2017,"A4:2017-XML External Entities (XXE)",A4:2017,"A5:2017-Broken Access Control",A5:2017,"A6:2017-Security Misconfiguration",A6:2017,"A7:2017-Cross-Site Scripting (XSS)",A7:2017,"A8:2017-Insecure Deserialization",A8:2017,"A9:2017-Using Components with Known Vulnerabilities",A9:2017,"A10:2017-Insufficient Logging & Monitoring",A10:2017,"A1:2021-Broken Access Control",A1:2021,"A2:2021-Cryptographic Failures",A2:2021,A3:2021-Injection,A3:2021,"A4:2021-Insecure Design",A4:2021,"A5:2021-Security Misconfiguration",A5:2021,"A6:2021-Vulnerable and Outdated Components",A6:2021,"A7:2021-Identification and Authentication Failures",A7:2021,"A8:2021-Software and Data Integrity Failures",A8:2021,"A9:2021-Security Logging and Monitoring Failures",A9:2021,"A10:2021-Server-Side Request Forgery",A10:2021}'::text[])) AND (lower((vi.external_type)::text) = 'owasp'::text))
Rows Removed by Filter: 1
Buffers: shared hit=155 read=13
I/O Timings: read=33.214 write=0.000
-> Index Scan using vulnerability_reads_pkey on public.vulnerability_reads vr (cost=0.57..0.62 rows=1 width=16) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (vr.id = vr_1.id)
I/O Timings: read=0.000 write=0.000
Time: 225.270 ms
- planning: 14.283 ms
- execution: 210.987 ms
- I/O read: 207.446 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 774 (~6.00 MiB) from the buffer pool
- reads: 71 (~568.00 KiB) from the OS file cache, including disk I/O
- dirtied: 3 (~24.00 KiB)
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25017/commands/79523
Estimated Time to complete: 1 week, 4 days, 19 hours, and 16 minutes
- Interval: 120s
- Total tuple count: 84984390
- Max batch size: 0
- Estimated seconds to complete: 1019760s
- Estimated number of batches: 8498
- Average batch time: 53.83s
- Batch size: 10000
- N. of batches sampled: 33
- N. of failed batches: 0
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #419092 (closed)