Add batched migration to remove namespace from OS components
What does this MR do and why?
Add batched migration to remove namespace from OS components
When we first started ingesting OS components from Trivy SBOMs, we did not remove the namepaces from the component names. This meant that we did not match advisories because a component name in the advisory data was listed without the namespace prefix.
For example, we'd get the cURL component for an Alpine Linux image
as pkg:apk/alpine/curl@<version>
, which we'd ultimately parse as
alpine/curl
. In this example, the advisories for cURL would have a
name like curl
instead of alpine/curl
. Thus, we'd never get a match
on the advisories and components when performing advisory scans.
The PURL specification dictates that alpine/curl
is the correct name,
but in our case we choose to deviate from this because store the
operating system metadata as an SBOM property instead. All this
considered, the most pragmatic solution here, and thus the one taken, is
to migrate all of our components to drop the namespace/os prefixes
instead of re-exporting all of our advisories from the GitLab Package
Metadata Database (a much larger change).
Relates to Remove distro prefix from OS component names (#442847 - closed) • Oscar Tovar • 17.2 • At risk
Stats
-- Records we're expecting to migrate
SELECT
COUNT("sbom_components".id) records_to_migrate_cnt
FROM
sbom_components
WHERE
purl_type > 8
AND purl_type < 14
AND component_type = 0
AND name LIKE '%/%';
records_to_migrate_cnt
------------------------
22376
-- Records we're expecting to raise a not unique error
WITH components_to_migrate AS (
SELECT
id,
"sbom_components".name AS full_name,
SPLIT_PART("sbom_components".name, '/', 1) AS namespace,
SPLIT_PART("sbom_components".name, '/', 2) AS name
FROM
sbom_components
WHERE
purl_type > 8
AND purl_type < 14
AND component_type = 0
AND name LIKE '%/%'
)
SELECT
COUNT(DISTINCT "sbom_components".id) AS approx_expected_record_uniq_errors
FROM
sbom_components
INNER JOIN components_to_migrate ON "sbom_components".name = "components_to_migrate".name
WHERE
"sbom_components".purl_type > 8
AND "sbom_components".purl_type < 14
AND "sbom_components".component_type = 0;
approx_expected_record_uniq_errors
------------------------------------
10920
(1 row)
-- Prefixes that we'll be removing. This serves as proof that we won't corrupt data,
-- and are only removing the os prefixes.
SELECT DISTINCT
SPLIT_PART("sbom_components".name, '/', 1) AS os_prefix
FROM
sbom_components
WHERE
purl_type > 8
AND purl_type < 14
AND component_type = 0
AND name LIKE '%/%'
ORDER BY
os_prefix;
os_prefix
---------------------
alma
alpine
amazon
centos
debian
fedora
opensuse.leap
opensuse.tumbleweed
oracle
photon
redhat
rocky
sles
ubuntu
(14 rows)
-- This serves as proof that os components don't support nested namespaces.
-- We don't have a component that has the `<os>/<nested-namespace>/<name>` format.
-- I checked this to make sure that this wasn't a possibility, and something we'd miss.
SELECT
COUNT(id)
FROM
sbom_components
WHERE
purl_type > 8
AND purl_type < 14
AND component_type = 0
AND name LIKE '%/%/%';
count
-------
0
(1 row)
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
Production Data
Tested in postgres.ai
Index Scan using index_sbom_components_on_component_type_name_and_purl_type on public.sbom_components (cost=0.42..53319.75 rows=38665 width=83) (actual time=2.787..82.102 rows=22376 loops=1)
Index Cond: (sbom_components.purl_type = ANY ('{9,10,11,12,13}'::integer[]))
Filter: (sbom_components.name ~~ '%/%'::text)
Rows Removed by Filter: 12971
Buffers: shared hit=41342
I/O Timings: read=0.000 write=0.000
An approximate total of 22376 rows match this condition.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.