Deduplicate finding maps by UUID before ingestion
What does this MR do and why?
Deduplicate continuous vulnerability scanning finding maps by UUID before ingestion.
This works around the bug caused by the deduplication done by Gitlab::Ingestion::BulkInsertableTask.
It's possible for two finding maps in the same ingestion batch to contain the same UUID. Since the IngestFindings class removes duplicates by uuid before upserting, the batch receives a finding ID only for the first of the duplicate findings.
To illustrate, take the following example. Say you have the following finding maps with the respective UUIDs.
[uuid: 1] [uuid: 1] [uuid: 2] [uuid: 3]
After the unique filter, you'll have the following finding maps for upsert.
[uuid: 1] [uuid: 2] [uuid: 3]
As a result, Postgres will return the newly inserted IDs like so:
[finding_id: 1] [finding_id: 2] [finding_id: 3]
The IngestFindings task assumes that there will be an ID for every finding map passed to the ActiveRecord call and attempts to map the finding_id attribute to the finding maps, so we end up with the following incorrect result.
[uuid: 1, finding_id: 1] [uuid: 1, finding_id: 2] [uuid: 2, finding_id: 3] [uuid: 3, finding_id: nil]
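The misalignment above can be reproduced outside GitLab with a minimal Ruby sketch. FindingMap is a hypothetical stand-in struct, and fake_upsert only simulates a bulk insert that drops duplicate UUIDs and returns one ID per written row. Neither is the actual IngestFindings code.

```ruby
# Hypothetical stand-in for a finding map; not the real GitLab class.
FindingMap = Struct.new(:uuid, :finding_id)

# Simulates an upsert that removes duplicate UUIDs before inserting
# and returns one new ID per row that was actually written.
def fake_upsert(maps)
  maps.uniq(&:uuid).each_with_index.map { |_map, index| index + 1 }
end

maps = [1, 1, 2, 3].map { |uuid| FindingMap.new(uuid) }
returned_ids = fake_upsert(maps)

# Positional assignment assumes one returned ID per finding map,
# which no longer holds once duplicates were dropped.
maps.each_with_index { |map, index| map.finding_id = returned_ids[index] }

maps.each { |map| puts "uuid: #{map.uuid}, finding_id: #{map.finding_id.inspect}" }
```

Four maps go in but only three IDs come back, so the last map is left with a nil finding_id and the second duplicate picks up an ID that belongs to a different UUID.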
Since the IngestFindingPipelines task expects a non-nil finding_id, the entire batch fails and causes the issue reported in #432870 (closed). Deduplicating the finding maps early on by uuid ensures that we avoid this edge case.
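A minimal sketch of why early deduplication helps, under simplifying assumptions: FindingMapSketch and simulated_upsert are hypothetical stand-ins, not GitLab code. Once the maps are unique by uuid, the number of maps matches the number of returned IDs and positional assignment lines up.

```ruby
# Hypothetical stand-in for a finding map; not the real GitLab class.
FindingMapSketch = Struct.new(:uuid, :finding_id)

# Simulated upsert: drops duplicate UUIDs and returns one ID per written row.
def simulated_upsert(maps)
  maps.uniq(&:uuid).each_with_index.map { |_map, index| index + 1 }
end

all_maps = [1, 1, 2, 3].map { |uuid| FindingMapSketch.new(uuid) }

# Deduplicate early, before the maps reach the ingestion tasks.
unique_maps = all_maps.uniq(&:uuid)
ids = simulated_upsert(unique_maps)

# Every remaining map now has exactly one matching ID.
unique_maps.zip(ids) { |map, id| map.finding_id = id }

unique_maps.each { |map| puts "uuid: #{map.uuid}, finding_id: #{map.finding_id}" }
```

In this sketch no map ends up with a nil finding_id, which is the property the downstream task relies on.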
Fixes #432870 (closed)
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
- Remove the maps.uniq(&:uuid) change and run the spec. It should fail with the Not null violation error pictured in the screenshots. This is the same error from the reported bug.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.