Populate `finding_uuid` attribute for vulnerability_feedback
What does this MR do?
Related to #277134 (closed).
Database review
The post-deployment migration
There is a post-deployment migration to schedule the background jobs for all vulnerability_feedback
records in a batch of 1k. It is using the EachBatch
module which will iterate through all records by the primary key of the table.
rake db:migrate
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: migrating
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: migrated (0.0502s)
rake db:migrate:down
This is no-op
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: reverting
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: reverted (0.0000s)
Background job
The background job is first loading all the feedback
records into memory by given boundaries and then trying to load the related finding
records and their identifier
s for each feedback
record from the database in batch by using the batch loader.
We use these records to generate the UUID value for the feedback
records. It is possible that there is not a finding
entry for a given feedback
so we will just ignore that feedback
record and then remove them with a separate MR.
Query breakdown
The query to load feedback
records into the memory;
SELECT
"vulnerability_feedback".*
FROM
"vulnerability_feedback"
WHERE "vulnerability_feedback"."finding_uuid" IS NULL AND (id BETWEEN 1 AND 1000)
Index Scan using vulnerability_feedback_pkey on public.vulnerability_feedback (cost=0.29..118.71 rows=667 width=135) (actual time=0.040..0.516 rows=678 loops=1)
Index Cond: ((vulnerability_feedback.id >= 1) AND (vulnerability_feedback.id <= 1000))
Filter: (vulnerability_feedback.finding_uuid IS NULL)
Rows Removed by Filter: 0
Buffers: shared hit=211
The query to load finding
s;
SELECT
"vulnerability_occurrences".*
FROM
"vulnerability_occurrences"
WHERE
"vulnerability_occurrences"."project_id" IN (1)
AND "vulnerability_occurrences"."report_type" IN (1)
AND "vulnerability_occurrences"."project_fingerprint" IN ('something')
Bitmap Heap Scan on public.vulnerability_occurrences (cost=4.68..6.19 rows=1 width=1129) (actual time=6.798..6.800 rows=0 loops=1)
Buffers: shared hit=3 read=4
I/O Timings: read=6.697
-> BitmapAnd (cost=4.68..4.68 rows=1 width=0) (actual time=6.796..6.797 rows=0 loops=1)
Buffers: shared hit=3 read=4
I/O Timings: read=6.697
-> Bitmap Index Scan using index_vulnerability_occurrences_for_issue_links_migration (cost=0.00..2.10 rows=4 width=0) (actual time=6.794..6.795 rows=0 loops=1)
Index Cond: ((vulnerability_occurrences.project_id = 1) AND (vulnerability_occurrences.report_type = 1))
Buffers: shared hit=3 read=4
I/O Timings: read=6.697
-> Bitmap Index Scan using index_vulnerability_occurrences_on_project_fingerprint (cost=0.00..2.33 rows=36 width=0) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (vulnerability_occurrences.project_fingerprint = '\x736f6d657468696e67'::bytea)
The query to load identifier
s of finding
s;
SELECT
"vulnerability_identifiers".*
FROM
"vulnerability_identifiers"
WHERE
"vulnerability_identifiers"."id" IN (1, 2, 3, 4, 5)
Index Scan using vulnerability_identifiers_pkey on public.vulnerability_identifiers (cost=0.43..12.10 rows=5 width=151) (actual time=5.226..5.246 rows=5 loops=1)
Index Cond: (vulnerability_identifiers.id = ANY ('{1,2,3,4,5}'::bigint[]))
Buffers: shared hit=15 read=4
I/O Timings: read=5.136
And finally, we update each feedback
record with the UUID value by using the following query;
UPDATE
"vulnerability_feedback"
SET
"finding_uuid" = 'UUID VALUE'
WHERE
"vulnerability_feedback"."id" = 1
Timing forecast
Currently, we have 69_078 vulnerability_feedback
records in our production database with missing finding_uuid
values. So the post-deployment migration will schedule 70 background jobs and each of which will run consecutively with 2 minutes of delay. The whole process should take around 70 * 2 ~= 140 minutes to finish.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
- [-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team