Skip to content

Backfill vulnerability_count background migration

Yuval Siev requested to merge issue-474280-vuln-count-background-migration into master

What does this MR do and why?

This MR adds a batched background migration to backfill the vulnerability_count column inside the project_statistics table.
We are running this migration on the project_settings table, to enable better filtering using the has_vulnerabilities column.
This is the third checkbox of this issue.
Must be merged only after this MR is merged.

The process of the migration:
We scope only projects that has vulnerabilities.

SELECT project_id
FROM project_settings
WHERE has_vulnerabilities = true

Then, we count the number of vulnerabilities those projects have:

SELECT count(*)
FROM vulnerabilities
WHERE project_id = <project_id>

And finally, we set this count to the project_statistics.vulnerability_count column

UPDATE project_statistics
SET vulnerability_count = <vulnerability_count>
WHERE project_id = project_id

Database review

Here are the queries used by the batched background migration;

Finding the lower bound of the sub-batch
SELECT
    "project_settings"."project_id"
FROM
    "project_settings"
WHERE
    "project_settings"."project_id" BETWEEN 1 AND 213128123
    AND (has_vulnerabilities IS TRUE)
ORDER BY
    "project_settings"."project_id" ASC
LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96949

Finding the upper bound of the sub-batch
SELECT
    "project_settings"."project_id"
FROM
    "project_settings"
WHERE
    "project_settings"."project_id" BETWEEN 1 AND 213128123
    AND (has_vulnerabilities IS TRUE)
    AND "project_settings"."project_id" >= 21
ORDER BY
    "project_settings"."project_id" ASC
LIMIT 1 OFFSET 100

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96950

Loading the project statistics record
SELECT
    "project_statistics".*
FROM
    "project_statistics"
WHERE
    "project_statistics"."project_id" = 278964
LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96951

Finding the latest vulnerability record for the project
SELECT
    vulnerability_reads.*
FROM
    vulnerability_reads
WHERE
    vulnerability_reads.project_id = 278964
ORDER BY
    vulnerability_reads.vulnerability_id DESC
LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96952

Finding the lower bound for the vulnerability batching
SELECT
    "vulnerability_reads"."vulnerability_id"
FROM
    "vulnerability_reads"
WHERE (project_id = 278964
    AND vulnerability_id <= 9231721312)
ORDER BY
    "vulnerability_reads"."vulnerability_id" ASC
LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96954

Finding the upper bound of a batch iteration for the vulnerability batching
SELECT
    "vulnerability_reads"."vulnerability_id"
FROM
    "vulnerability_reads"
WHERE (project_id = 278964
    AND vulnerability_id <= 9231721312)
AND "vulnerability_reads"."vulnerability_id" >= 11449
ORDER BY
    "vulnerability_reads"."vulnerability_id" ASC
LIMIT 1 OFFSET 1000

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96955

Counting the number of vulnerabilities in batch
SELECT
    COUNT(*)
FROM
    "vulnerability_reads"
WHERE (project_id = 278964
    AND vulnerability_id <= 9231721312)
AND "vulnerability_reads"."vulnerability_id" >= 11449
AND "vulnerability_reads"."vulnerability_id" < 12449

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96956

Updating the project statistics
UPDATE
    project_statistics
SET
    vulnerability_count = vulnerability_count + 1000
WHERE
    id = 278964

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96957

There are currently 275_850 project_settings records in the database with has_vulnerabilities IS TRUE. With the batch size of 1_000 records, we will run 276 jobs. Each job takes 2 minutes to complete so the total runtime will be around 9 hours.

Edited by Mehmet Emin INAC

Merge request reports

Loading