Populate `finding_uuid` attribute for vulnerability_feedback (!49807) · Merge requests · GitLab.org / GitLab

Mehmet Emin INAC requested to merge 277134_populate_finding_uuid_values_for_vulnerability_feedback into master Dec 11, 2020

What does this MR do?

Related to #277134 (closed).

Database review

The post-deployment migration

There is a post-deployment migration to schedule the background jobs for all vulnerability_feedback records in a batch of 1k. It is using the EachBatch module which will iterate through all records by the primary key of the table.

rake db:migrate

== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: migrating
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: migrated (0.0502s)

rake db:migrate:down

This is no-op

== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: reverting
== 20201211090634 SchedulePopulateFindingUuidForVulnerabilityFeedback: reverted (0.0000s)

Background job

The background job is first loading all the feedback records into memory by given boundaries and then trying to load the related finding records and their identifiers for each feedback record from the database in batch by using the batch loader.

We use these records to generate the UUID value for the feedback records. It is possible that there is not a finding entry for a given feedback so we will just ignore that feedback record and then remove them with a separate MR.

Query breakdown

The query to load feedback records into the memory;

SELECT
    "vulnerability_feedback".*
FROM
    "vulnerability_feedback"
WHERE "vulnerability_feedback"."finding_uuid" IS NULL AND (id BETWEEN 1 AND 1000)

Index Scan using vulnerability_feedback_pkey on public.vulnerability_feedback  (cost=0.29..118.71 rows=667 width=135) (actual time=0.040..0.516 rows=678 loops=1)
   Index Cond: ((vulnerability_feedback.id >= 1) AND (vulnerability_feedback.id <= 1000))
   Filter: (vulnerability_feedback.finding_uuid IS NULL)
   Rows Removed by Filter: 0
   Buffers: shared hit=211

The query to load findings;

SELECT
    "vulnerability_occurrences".*
FROM
    "vulnerability_occurrences"
WHERE
    "vulnerability_occurrences"."project_id" IN (1)
    AND "vulnerability_occurrences"."report_type" IN (1)
    AND "vulnerability_occurrences"."project_fingerprint" IN ('something')

Bitmap Heap Scan on public.vulnerability_occurrences  (cost=4.68..6.19 rows=1 width=1129) (actual time=6.798..6.800 rows=0 loops=1)
   Buffers: shared hit=3 read=4
   I/O Timings: read=6.697
   ->  BitmapAnd  (cost=4.68..4.68 rows=1 width=0) (actual time=6.796..6.797 rows=0 loops=1)
         Buffers: shared hit=3 read=4
         I/O Timings: read=6.697
         ->  Bitmap Index Scan using index_vulnerability_occurrences_for_issue_links_migration  (cost=0.00..2.10 rows=4 width=0) (actual time=6.794..6.795 rows=0 loops=1)
               Index Cond: ((vulnerability_occurrences.project_id = 1) AND (vulnerability_occurrences.report_type = 1))
               Buffers: shared hit=3 read=4
               I/O Timings: read=6.697
         ->  Bitmap Index Scan using index_vulnerability_occurrences_on_project_fingerprint  (cost=0.00..2.33 rows=36 width=0) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (vulnerability_occurrences.project_fingerprint = '\x736f6d657468696e67'::bytea)

The query to load identifiers of findings;

SELECT
    "vulnerability_identifiers".*
FROM
    "vulnerability_identifiers"
WHERE
    "vulnerability_identifiers"."id" IN (1, 2, 3, 4, 5)

Index Scan using vulnerability_identifiers_pkey on public.vulnerability_identifiers  (cost=0.43..12.10 rows=5 width=151) (actual time=5.226..5.246 rows=5 loops=1)
   Index Cond: (vulnerability_identifiers.id = ANY ('{1,2,3,4,5}'::bigint[]))
   Buffers: shared hit=15 read=4
   I/O Timings: read=5.136

And finally, we update each feedback record with the UUID value by using the following query;

UPDATE
    "vulnerability_feedback"
SET
    "finding_uuid" = 'UUID VALUE'
WHERE
    "vulnerability_feedback"."id" = 1

Timing forecast

Currently, we have 69_078 vulnerability_feedback records in our production database with missing finding_uuid values. So the post-deployment migration will schedule 70 background jobs and each of which will run consecutively with 2 minutes of delay. The whole process should take around 70 * 2 ~= 140 minutes to finish.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

[-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
[-] Tested in all supported browsers
[-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

[-] Label as security and @ mention @gitlab-com/gl-security/appsec
[-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
[-] Security reports checked/validated by a reviewer from the AppSec team

Edited Jan 05, 2021 by Mehmet Emin INAC

Populate `finding_uuid` attribute for vulnerability_feedback