Skip to content

Populate `finding_uuid` attribute for vulnerability_feedback

What does this MR do?

Related to #277134 (closed).

Database review

The post-deployment migration

There is a post-deployment migration to schedule the background jobs for all vulnerability_feedback records in a batch of 1k. It is using the EachBatch module which will iterate through all records by the primary key of the table.

rake db:migrate
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: migrating
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: migrated (0.0502s)
rake db:migrate:down

This is no-op

== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: reverting
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: reverted (0.0000s)

Background job

The background job is first loading all the feedback records into memory by given boundaries and then trying to load the related finding records and their identifiers for each feedback record from the database in batch by using the batch loader.

We use these records to generate the UUID value for the feedback records. It is possible that there is not a finding entry for a given feedback so we will just ignore that feedback record and then remove them with a separate MR.

Query breakdown

The query to load feedback records into the memory;

SELECT
    "vulnerability_feedback".*
FROM
    "vulnerability_feedback"
WHERE "vulnerability_feedback"."finding_uuid" IS NULL AND (id BETWEEN 1 AND 1000)
Index Scan using vulnerability_feedback_pkey on public.vulnerability_feedback  (cost=0.29..118.71 rows=667 width=135) (actual time=0.040..0.516 rows=678 loops=1)
   Index Cond: ((vulnerability_feedback.id >= 1) AND (vulnerability_feedback.id <= 1000))
   Filter: (vulnerability_feedback.finding_uuid IS NULL)
   Rows Removed by Filter: 0
   Buffers: shared hit=211

The query to load findings;

SELECT
    "vulnerability_occurrences".*
FROM
    "vulnerability_occurrences"
WHERE
    "vulnerability_occurrences"."project_id" IN (1)
    AND "vulnerability_occurrences"."report_type" IN (1)
    AND "vulnerability_occurrences"."project_fingerprint" IN ('something')
Bitmap Heap Scan on public.vulnerability_occurrences  (cost=4.68..6.19 rows=1 width=1129) (actual time=6.798..6.800 rows=0 loops=1)
   Buffers: shared hit=3 read=4
   I/O Timings: read=6.697
   ->  BitmapAnd  (cost=4.68..4.68 rows=1 width=0) (actual time=6.796..6.797 rows=0 loops=1)
         Buffers: shared hit=3 read=4
         I/O Timings: read=6.697
         ->  Bitmap Index Scan using index_vulnerability_occurrences_for_issue_links_migration  (cost=0.00..2.10 rows=4 width=0) (actual time=6.794..6.795 rows=0 loops=1)
               Index Cond: ((vulnerability_occurrences.project_id = 1) AND (vulnerability_occurrences.report_type = 1))
               Buffers: shared hit=3 read=4
               I/O Timings: read=6.697
         ->  Bitmap Index Scan using index_vulnerability_occurrences_on_project_fingerprint  (cost=0.00..2.33 rows=36 width=0) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (vulnerability_occurrences.project_fingerprint = '\x736f6d657468696e67'::bytea)

The query to load identifiers of findings;

SELECT
    "vulnerability_identifiers".*
FROM
    "vulnerability_identifiers"
WHERE
    "vulnerability_identifiers"."id" IN (1, 2, 3, 4, 5)
Index Scan using vulnerability_identifiers_pkey on public.vulnerability_identifiers  (cost=0.43..12.10 rows=5 width=151) (actual time=5.226..5.246 rows=5 loops=1)
   Index Cond: (vulnerability_identifiers.id = ANY ('{1,2,3,4,5}'::bigint[]))
   Buffers: shared hit=15 read=4
   I/O Timings: read=5.136

And finally, we update each feedback record with the UUID value by using the following query;

UPDATE
    "vulnerability_feedback"
SET
    "finding_uuid" = 'UUID VALUE'
WHERE
    "vulnerability_feedback"."id" = 1

Timing forecast

Currently, we have 69_078 vulnerability_feedback records in our production database with missing finding_uuid values. So the post-deployment migration will schedule 70 background jobs and each of which will run consecutively with 2 minutes of delay. The whole process should take around 70 * 2 ~= 140 minutes to finish.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Mehmet Emin INAC

Merge request reports

Loading