Drop Vulnerabilites that would be invalid as well
What does this MR do and why?
Modify Gitlab::BackgroundMigration::RemoveDuplicateVulnerabilitiesFindings
to drop Vulnerability
objects that would be invalid since the corresponding Vulnerability::Finding
object was dropped.
How to set up and validate locally
bundle exec spring rspec spec/lib/gitlab/background_migration/remove_duplicate_vulnerabilities_findings_spec.rb
Database review
This MR is NOT going to schedule this background migration, the scheduling is going to happen in a different MR.
I also updated the specs to reflect actual values from the database so that we're 100% sure this is going to have the intended effect.
How many records are affected?
We have ~26463 pairs of duplicates so ~26463 records will be dropped. Any of those Vulnerability::Finding
objects can be assigned to a Vulnerability
so in the worst case scenario we will be:
- dropping 26463
Vulnerability::Finding
records - dropping 26463
Vulnerability
records
Based on:
SELECT DISTINCT report_type, location_fingerprint, primary_identifier_id, project_id, array_agg(id) as ids, array_agg(uuid) as uuids
FROM vulnerability_occurrences
GROUP BY report_type, location_fingerprint, primary_identifier_id, project_id
HAVING (COUNT(*) > 1) AND (array_length(array_agg(vulnerability_id) FILTER (WHERE vulnerability_id IS NOT NULL), 1) = 1);
there are 199 duplicate pairs where only one Vulnerabilities::Finding
has a Vulnerability
associated so the count of records dropped would be 26463 and 26264
Selecting a batch of 1000 Vulnerability::Findings
EXPLAIN ANALYZE WITH "batch" AS MATERIALIZED (
SELECT "vulnerability_occurrences"."report_type", "vulnerability_occurrences"."location_fingerprint", "vulnerability_occurrences"."primary_identifier_id", "vulnerability_occurrences"."project_id"
FROM "vulnerability_occurrences"
WHERE "vulnerability_occurrences"."id" BETWEEN 1 AND 1260
)
SELECT DISTINCT "batch"."report_type", "batch"."location_fingerprint", "batch"."primary_identifier_id", "batch"."project_id", array_agg(id) as ids
FROM "batch"
AS batch
INNER JOIN vulnerability_occurrences ON vulnerability_occurrences.report_type = batch.report_type AND vulnerability_occurrences.location_fingerprint = batch.location_fingerprint AND vulnerability_occurrences.primary_identifier_id = batch.primary_identifier_id AND vulnerability_occurrences.project_id = batch.project_id
GROUP BY "batch"."report_type", "batch"."location_fingerprint", "batch"."primary_identifier_id", "batch"."project_id"
HAVING (COUNT(*) > 1);
cold cache: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6785/commands/23979
warm cache: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6785/commands/23971
Selecting associated Vulnerabilities
My understanding is that we would usually have two duplicate Vulnerabilities::Finding
and both of those Vulnerabilities::Finding
can have an associated Vulnerability:
SELECT "vulnerability_occurrences"."vulnerability_id" FROM "vulnerability_occurrences" WHERE "vulnerability_occurrences"."id" IN (5606961, 8765432);
This query is relatively simple so I think we can assume it's not going to timeout.
Dropping a batch of Vulnerability::Findings objects
DELETE FROM "vulnerability_occurrences" WHERE "vulnerability_occurrences"."id" IN (5606961, 8765432);
cold cache: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6785/commands/23975
warm cache: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6785/commands/23976
Dropping a batch of associated Vulnerability objects
DELETE FROM "vulnerabilities" WHERE "vulnerabilities"."id" IN (51, 52);
cold cache: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6785/commands/23977
warm cache: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6785/commands/23978
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #341917