Cleanup old vulnerabilities data stored in DB
Problem to solve
The way we store vulnerabilities in the DB implies some data can become obsolete over time, and we must clean things up to avoid filling the DB indefinitely.
So we need to define a retention period (like the artifacts expiration date) and trigger a cleanup job to delete old records.
This has to be carefully designed, keeping in mind that we want historical metrics for vulnerabilities.
Further details
The usual lifecycle itself requires a cleanup policy, but there are also some specific cases that can lead to stale data we want to get rid of:
- if a report type has been removed from the config, such reports are no longer produced, so existing records for that category never get cleaned up.
- if the default branch changes, vulnerabilities stored for the previous one will stay forever.
- once we decide to support other branches, old refs will also stay forever, even if the branch is removed or no longer active.
Proposal
Run a periodic background job to clean up old data. The cleanup job should delete:
- `vulnerability_occurrences` records that only belong to pipelines older than the retention period (join with the `vulnerability_occurrence_pipelines` join model). This will also automatically delete the matching `vulnerability_occurrence_pipelines` and `vulnerability_occurrence_identifiers` records thanks to the FKs' `on_delete: :cascade` option.
- `vulnerability_identifiers` records that aren't used anymore (no matching `vulnerability_occurrence_identifiers` join model records).
- `vulnerability_scanners` records that aren't used anymore (no `vulnerability_occurrences` record with a matching `scanner_id`).
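To make the three steps concrete, here is a minimal in-memory Ruby sketch of the cleanup logic. It is illustrative only: the record hashes, the 90-day retention period, and the dates are invented, and the real implementation would run as SQL against the actual tables rather than over Ruby arrays. It does show the ordering that matters, though: occurrence deletion cascades to the join records first, and only then can unused identifiers and scanners be detected.

```ruby
require 'date'

# Hypothetical retention period, similar in spirit to artifacts expiration.
RETENTION_DAYS = 90
today = Date.new(2019, 6, 1)
cutoff = today - RETENTION_DAYS

# Sample data mirroring the tables named in the proposal (columns trimmed).
pipelines = [
  { id: 1, created_at: Date.new(2019, 1, 1) },  # older than retention
  { id: 2, created_at: Date.new(2019, 5, 20) }  # recent
]
occurrences = [{ id: 10, scanner_id: 100 }, { id: 11, scanner_id: 101 }]
occurrence_pipelines = [
  { occurrence_id: 10, pipeline_id: 1 },
  { occurrence_id: 11, pipeline_id: 2 }
]
occurrence_identifiers = [
  { occurrence_id: 10, identifier_id: 200 },
  { occurrence_id: 11, identifier_id: 201 }
]
identifiers = [{ id: 200 }, { id: 201 }]
scanners = [{ id: 100 }, { id: 101 }]

# Step 1: occurrences whose pipelines are ALL older than the cutoff.
old_pipeline_ids = pipelines.select { |p| p[:created_at] < cutoff }.map { |p| p[:id] }
stale_occurrence_ids = occurrences.map { |o| o[:id] }.select do |oid|
  links = occurrence_pipelines.select { |op| op[:occurrence_id] == oid }
  links.any? && links.all? { |op| old_pipeline_ids.include?(op[:pipeline_id]) }
end
occurrences.reject! { |o| stale_occurrence_ids.include?(o[:id]) }
# Simulate the FK on_delete: :cascade on the two join tables.
occurrence_pipelines.reject! { |op| stale_occurrence_ids.include?(op[:occurrence_id]) }
occurrence_identifiers.reject! { |oi| stale_occurrence_ids.include?(oi[:occurrence_id]) }

# Step 2: identifiers with no remaining join records.
used_identifier_ids = occurrence_identifiers.map { |oi| oi[:identifier_id] }
identifiers.select! { |i| used_identifier_ids.include?(i[:id]) }

# Step 3: scanners that no remaining occurrence points to.
used_scanner_ids = occurrences.map { |o| o[:scanner_id] }
scanners.select! { |s| used_scanner_ids.include?(s[:id]) }

puts occurrences.inspect
puts identifiers.inspect
puts scanners.inspect
```

With this data, occurrence 10 only belongs to the old pipeline, so it is deleted along with its join records, which in turn leaves identifier 200 and scanner 100 orphaned and removable; occurrence 11 and its identifier/scanner survive.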
This is a first iteration that covers all cases after a given amount of time. We can later improve it to react immediately to specific edge cases if necessary (branch removed, report disabled, etc.).
What does success look like, and how can we measure that?
(If no way to measure success, link to an issue that will implement a way to measure this)