Match SBOM components to known advisories
Why are we doing this work
Components listed in an SBOM need to be matched to known advisories. This includes fetching all the advisories that match the package names and PURL types, and filtering out advisories whose affected range excludes the component version.
The output of the matching should be suitable to be used in two different contexts:
- Store security findings detected in SBOMs when ... (#395704 - closed)
- Scan newly ingested SBOM components of default ... (#371046 - closed)
Matching a version to an affected range is implemented in Add service to match advisory affected ranges t... (#371995 - closed).
Vulnerability Scanning vs License Scanning
This is somewhat similar to LicenseScanning::PackageLicenses.
The contract is similar to the one of LicenseScanning::SbomScanner.
- It takes an array of objects that respond to purl_type, name, and version.
- It returns a similar array with an extra licenses field (array).
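As a rough illustration of the duck-typed input contract described above, the sketch below checks that an object responds to the three required methods. The Component struct and the scannable_component? helper are hypothetical names used only for this example; they are not part of LicenseScanning::SbomScanner or any existing GitLab class.

```ruby
# Hypothetical value object; any object that responds to these three
# methods would satisfy the contract.
Component = Struct.new(:purl_type, :name, :version, keyword_init: true)

REQUIRED_METHODS = %i[purl_type name version].freeze

# Returns true when the object satisfies the duck-typed input contract.
def scannable_component?(object)
  REQUIRED_METHODS.all? { |method_name| object.respond_to?(method_name) }
end

component = Component.new(purl_type: "npm", name: "@scope/example", version: "7.0.0")
scannable_component?(component)          # => true
scannable_component?("not a component")  # => false
```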
Non-functional requirements
- Documentation:
- Feature flag: No
- Performance: check performance of the SQL query that fetches vulnerability advisories for a given set of packages
- Testing: unit tests using rspec
Implementation plan
- Add Gitlab::VulnerabilityScanning::PackageAdvisories class.
  - Input: Array of objects that respond to purl_type, name, and version.
    - Names include the namespace.
    - Names are normalized.
  - Fetch PackageMetadata::AffectedPackage models matching the purl_type and name.
    - Preload the advisory field to prevent N+1 queries.
  - Filter out advisories whose affected range excludes the version.
    - Use class implemented in #371995 (closed).
  - Output: Array of objects with purl_type, name, version, and advisories.
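The plan above can be sketched in plain Ruby with in-memory stand-ins. Everything here is an assumption made for illustration: the Struct definitions replace the real models, match_advisories is a hypothetical method name, and affected? uses RubyGems-style constraints as a substitute for the range matcher from #371995, whose actual interface is not shown in this issue.

```ruby
require "rubygems"

# Hypothetical in-memory stand-ins; in the application these would be
# database-backed models such as PackageMetadata::AffectedPackage.
Component = Struct.new(:purl_type, :name, :version, keyword_init: true)
Advisory = Struct.new(:identifier, keyword_init: true)
AffectedPackage = Struct.new(:purl_type, :package_name, :affected_range, :advisory, keyword_init: true)
Match = Struct.new(:purl_type, :name, :version, :advisories, keyword_init: true)

# Stand-in for the range matcher from #371995. A range is assumed to be a
# comma-separated list of RubyGems-style constraints, e.g. ">= 1.0, < 1.4".
def affected?(affected_range, version)
  requirement = Gem::Requirement.new(*affected_range.split(",").map(&:strip))
  requirement.satisfied_by?(Gem::Version.new(version))
end

def match_advisories(components, affected_packages)
  # Index by (purl_type, name) so each component needs a single hash lookup.
  index = affected_packages.group_by { |pkg| [pkg.purl_type, pkg.package_name] }

  components.map do |component|
    candidates = index.fetch([component.purl_type, component.name], [])
    # Keep only the advisories whose affected range includes the version.
    advisories = candidates
      .select { |pkg| affected?(pkg.affected_range, component.version) }
      .map(&:advisory)

    Match.new(purl_type: component.purl_type, name: component.name,
              version: component.version, advisories: advisories)
  end
end

# Made-up data, for illustration only.
packages = [
  AffectedPackage.new(purl_type: "gem", package_name: "example-lib",
                      affected_range: "< 1.13.2",
                      advisory: Advisory.new(identifier: "EXAMPLE-1")),
  AffectedPackage.new(purl_type: "gem", package_name: "example-lib",
                      affected_range: ">= 1.14.0, < 1.14.3",
                      advisory: Advisory.new(identifier: "EXAMPLE-2"))
]
components = [
  Component.new(purl_type: "gem", name: "example-lib", version: "1.13.0"),
  Component.new(purl_type: "gem", name: "other-lib", version: "2.2.0")
]

results = match_advisories(components, packages)
results.map { |r| [r.name, r.advisories.map(&:identifier)] }
# => [["example-lib", ["EXAMPLE-1"]], ["other-lib", []]]
```

Components without any matching advisory are still returned with an empty advisories array, which keeps the output aligned with the input in the same way the licenses field does for license scanning.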
The above plan was implemented in Draft: Add service to match SBOM components and... (!126954 - closed) • Adam Cohen • 16.7. However, we had to postpone that MR due to efficiency concerns.
The crux of the efficiency concern is that a consumer calling Gitlab::VulnerabilityScanning::PackageAdvisories#fetch will end up fetching all of the advisory data at once, with no way of iterating through this information, which could easily lead to a DB query timeout.
In order to solve this, we'll need to change Gitlab::VulnerabilityScanning::PackageAdvisories#fetch from the MR Draft: Add service to match SBOM components and... (!126954 - closed) • Adam Cohen • 16.7 to use each_batch, similar to how this was implemented in Sbom::PossiblyAffectedOccurrencesFinder#execute_in_batches. This will allow consumers of Gitlab::VulnerabilityScanning::PackageAdvisories#fetch to iterate through the result set in batches, thereby reducing the possibility of a DB timeout.
So to the developer that picks up this issue - please start by re-opening Draft: Add service to match SBOM components and... (!126954 - closed) • Adam Cohen • 16.7.
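The batched iteration described above could look roughly like the following sketch. It uses Enumerable#each_slice on an in-memory array as a stand-in for each_batch, which in the application would issue one bounded query per batch. The method name each_advisory_batch and the sample rows are hypothetical and are not taken from !126954 or from Sbom::PossiblyAffectedOccurrencesFinder.

```ruby
# Hypothetical stand-in for a batched fetch: the caller receives one batch
# at a time instead of the whole result set.
def each_advisory_batch(rows, batch_size)
  rows.each_slice(batch_size) { |batch| yield batch }
end

# Made-up rows, for illustration only.
rows = %w[advisory-1 advisory-2 advisory-3 advisory-4 advisory-5]

batch_sizes = []
each_advisory_batch(rows, 2) do |batch|
  # A consumer would process or persist each batch here.
  batch_sizes << batch.size
end

batch_sizes # => [2, 2, 1]
```

With this shape, the consumer never holds more than one batch in memory, and each underlying query stays small enough to reduce the risk of a timeout.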
Verification steps
Verify that the performance of the query is acceptable when used in production. See Improve performance of package license query to... (#398679 - closed) and the documentation for examples of optimizations that can further scope the query and make efficient use of the IN operator.