Draft: Add service to match SBOM components and vulns
What does this MR do and why?
This MR adds a service to match SBOM
components and advisories. It exposes a single public method Gitlab::VulnerabilityScanning::PackageAdvisories#fetch which takes as input an array of SBOM component
which are Hashie::Mash
objects containing the following fields:
purl_type
version
name
The fetch
method searches through the pm_affected_packages
table in batches of 700
records for entries that match the given purl_type
and name
, and joins against the pm_advisories
table. It retrieves the following fields from both of these tables:
- pm_advisories.title,
- pm_advisories.description,
- pm_advisories.cvss_v2,
- pm_advisories.cvss_v3,
- pm_advisories.urls,
- pm_advisories.identifiers,
- pm_affected_packages.purl_type,
- pm_affected_packages.package_name,
- pm_affected_packages.solution,
- pm_affected_packages.affected_range
Benchmarks
As discussed in this thread, we need a way to keep track of the component version
values. Two options were discussed:
- using a CTE (implemented in this MR)
- using an in-memory hash
The benchmarks below include results for both the CTE
method and the in-memory hash
for the sake of comparison, however, the numbers show that a CTE
is a more efficient approach, which has been implemented in this MR.
To understand how these benchmarks compare to existing code, please see the MR for updating package metadata license lookup to use compressed data.
You'll notice that the benchmarks for updating package metadata license lookup to use compressed data
are much faster, at 0.550493 s
for fetching 5000 compressed records
versus 13.587334 s
in this MR for fetching 5000 components where 4000 of them have affected packages (a very unlikely situation). However, the code for updating package metadata license lookup to use compressed data
is user-facing, so we've tried to make it as fast as possible, whereas the code in this MR is not user-facing, since it'll be kicked off by a sidekiq job in the background, so slower response times are more acceptable.
-
1000 affected components out of 5000 components
Click to expand
Benchmark results with CTE version lookup with components containing 1000 affected packages =================================================================== user system total real 100 records 0.138171 0.005587 0.143758 ( 0.161953) 200 records 0.183474 0.005838 0.189312 ( 0.222943) 300 records 0.298285 0.008604 0.306889 ( 0.348068) 400 records 0.382594 0.009639 0.392233 ( 0.441390) 500 records 0.486761 0.015195 0.501956 ( 0.559242) 600 records 0.631321 0.019927 0.651248 ( 0.722986) 700 records 0.655312 0.016848 0.672160 ( 0.752760) 800 records 0.835680 0.018616 0.854296 ( 0.950211) 900 records 0.922340 0.025124 0.947464 ( 1.059607) 1000 records 0.829426 0.220630 1.050056 ( 1.171212) 2000 records 1.614828 0.048416 1.663244 ( 1.892707) 3000 records 2.548333 0.075535 2.623868 ( 2.986696) 4000 records 3.420843 0.099478 3.520321 ( 3.990201) 5000 records 4.427780 0.122395 4.550175 ( 5.139824) =================================================================== Benchmark results with in-memory hash version lookup with components containing 1000 affected packages =================================================================== user system total real 100 records 0.204993 0.006792 0.211785 ( 0.229392) 200 records 0.205858 0.008854 0.214712 ( 0.238709) 300 records 0.298505 0.008785 0.307290 ( 0.339396) 400 records 0.362667 0.004269 0.366936 ( 0.405912) 500 records 0.435338 0.009480 0.444818 ( 0.489894) 600 records 0.554225 0.014166 0.568391 ( 0.620221) 700 records 0.615241 0.015856 0.631097 ( 0.690592) 800 records 1.102302 0.024933 1.127235 ( 1.198590) 900 records 1.311307 0.026613 1.337920 ( 1.424326) 1000 records 1.724592 0.042284 1.766876 ( 1.862162) 2000 records 4.452901 0.099234 4.552135 ( 4.733542) 3000 records 10.395869 0.221649 10.617518 ( 10.885542) 4000 records 18.042436 0.660988 18.703424 ( 19.063492) 5000 records 28.101127 1.092482 29.193609 ( 29.654618) ===================================================================
-
2000 affected components out of 5000 components
Click to expand
Benchmark results with CTE version lookup with components containing 2000 affected packages =================================================================== user system total real 100 records 0.229686 0.017773 0.247459 ( 0.271600) 200 records 0.401965 0.014219 0.416184 ( 0.457788) 300 records 0.558023 0.017265 0.575288 ( 0.625430) 400 records 0.661989 0.022778 0.684767 ( 0.748146) 500 records 0.786053 0.188283 0.974336 ( 1.049490) 600 records 0.895979 0.027676 0.923655 ( 1.022674) 700 records 1.156021 0.036124 1.192145 ( 1.309637) 800 records 1.284261 0.041253 1.325514 ( 1.462144) 900 records 1.382215 0.042595 1.424810 ( 1.576426) 1000 records 1.467692 0.044432 1.512124 ( 1.672537) 2000 records 2.827644 0.084353 2.911997 ( 3.230935) 3000 records 4.428484 0.441466 4.869950 ( 5.359017) 4000 records 5.785917 0.167348 5.953265 ( 6.587623) 5000 records 7.490965 0.223624 7.714589 ( 8.532240) =================================================================== Benchmark results with in-memory hash version lookup with components containing 2000 affected packages =================================================================== user system total real 100 records 0.320153 0.010363 0.330516 ( 0.345818) 200 records 0.428199 0.016496 0.444695 ( 0.470798) 300 records 0.497978 0.013736 0.511714 ( 0.543679) 400 records 0.656134 0.018508 0.674642 ( 0.713427) 500 records 0.810513 0.020124 0.830637 ( 0.875123) 600 records 0.961890 0.022576 0.984466 ( 1.035953) 700 records 1.168881 0.026746 1.195627 ( 1.254277) 800 records 1.671087 0.039767 1.710854 ( 1.783577) 900 records 2.324100 0.052157 2.376257 ( 2.459570) 1000 records 2.578770 0.064218 2.642988 ( 2.732034) 2000 records 8.064514 0.176205 8.240719 ( 8.411064) 3000 records 19.925639 0.434941 20.360580 ( 20.622384) 4000 records 33.924963 1.198814 35.123777 ( 35.468715) 5000 records 56.534889 2.194605 58.729494 ( 59.169322) =================================================================== </details>
-
3000 affected components out of 5000 components
Click to expand
Benchmark results with CTE version lookup with components containing 3000 affected packages =================================================================== user system total real 100 records 0.370768 0.013872 0.384640 ( 0.413206) 200 records 0.559093 0.014545 0.573638 ( 0.619264) 300 records 0.640525 0.139878 0.780403 ( 0.836791) 400 records 0.777679 0.025087 0.802766 ( 0.887398) 500 records 0.932292 0.027473 0.959765 ( 1.061200) 600 records 1.033040 0.030242 1.063282 ( 1.180928) 700 records 1.246824 0.038131 1.284955 ( 1.424180) 800 records 1.348300 0.034815 1.383115 ( 1.536389) 900 records 1.694583 0.050465 1.745048 ( 1.920516) 1000 records 2.022909 0.056121 2.079030 ( 2.267054) 2000 records 3.602560 0.367170 3.969730 ( 4.371499) 3000 records 5.812396 0.177475 5.989871 ( 6.601697) 4000 records 7.624252 0.850542 8.474794 ( 9.290588) 5000 records 9.070385 0.298463 9.368848 ( 10.401297) =================================================================== Benchmark results with in-memory hash version lookup with components containing 3000 affected packages =================================================================== user system total real 100 records 0.328865 0.010059 0.338924 ( 0.355052) 200 records 0.512301 0.012661 0.524962 ( 0.551092) 300 records 0.704912 0.016913 0.721825 ( 0.754790) 400 records 1.099221 0.033035 1.132256 ( 1.172481) 500 records 1.164526 0.036386 1.200912 ( 1.257501) 600 records 1.342235 0.035431 1.377666 ( 1.431948) 700 records 1.572490 0.034914 1.607404 ( 1.667010) 800 records 1.945943 0.046310 1.992253 ( 2.062189) 900 records 3.525004 0.102298 3.627302 ( 3.711187) 1000 records 3.966712 0.093647 4.060359 ( 4.150162) 2000 records 12.804322 0.291073 13.095395 ( 13.263047) 3000 records 33.066726 0.716707 33.783433 ( 34.065425) 4000 records 56.166261 2.899048 59.065309 ( 59.409173) 5000 records 89.607063 4.136758 93.743821 ( 94.247169) =================================================================== </details>
-
4000 affected components out of 5000 components
Click to expand
Benchmark results with CTE version lookup with components containing 4000 affected packages =================================================================== user system total real 100 records 0.509272 0.016921 0.526193 ( 0.557925) 200 records 0.721374 0.161824 0.883198 ( 0.938143) 300 records 1.060313 0.033536 1.093849 ( 1.194125) 400 records 1.321734 0.035485 1.357219 ( 1.480358) 500 records 1.389149 0.038384 1.427533 ( 1.568226) 600 records 1.582359 0.047563 1.629922 ( 1.798454) 700 records 1.804642 0.045215 1.849857 ( 2.052383) 800 records 2.293646 0.066506 2.360152 ( 2.581256) 900 records 2.277810 0.330342 2.608152 ( 2.853636) 1000 records 2.595494 0.078682 2.674176 ( 2.930324) 2000 records 4.777275 0.133482 4.910757 ( 5.438388) 3000 records 7.124048 0.638050 7.762098 ( 8.528608) 4000 records 8.963478 0.274313 9.237791 ( 10.268583) 5000 records 11.238390 1.122125 12.360515 ( 13.587334) =================================================================== Benchmark results with in-memory hash version lookup with components containing 4000 affected packages =================================================================== user system total real 100 records 0.447054 0.013611 0.460665 ( 0.477721) 200 records 0.774120 0.021416 0.795536 ( 0.823644) 300 records 1.466924 0.030320 1.497244 ( 1.529864) 400 records 1.811677 0.041309 1.852986 ( 1.904842) 500 records 2.006759 0.047037 2.053796 ( 2.098774) 600 records 2.462239 0.053800 2.516039 ( 2.567483) 700 records 2.931670 0.101191 3.032861 ( 3.090854) 800 records 5.577336 0.141814 5.719150 ( 5.808348) 900 records 6.541524 0.164061 6.705585 ( 6.797809) 1000 records 6.872795 0.165147 7.037942 ( 7.127721) 2000 records 21.035475 0.488726 21.524201 ( 21.690226) 3000 records 50.627194 1.559264 52.186458 ( 52.466575) 4000 records 78.586900 3.386539 81.973439 ( 82.373508) 5000 records 122.841316 4.818588 127.659904 (128.212117) =================================================================== </details>
Data characteristics
Raw SQL
https://paste.depesz.com/s/w9u
WITH package_versions_cte ( purl_type, name, version ) AS (
VALUES
( 1, 'moodle/moodle', '3.5.17' ),
( 6, 'url-parse', '1.5.1' ),
...
( 5, 'org.apache.ant/ant', '1.9.0' ),
)
SELECT DISTINCT
package_versions_cte.version,
published_date,
title,
description,
cvss_v2,
cvss_v3,
urls,
identifiers,
pm_affected_packages.purl_type,
pm_affected_packages.package_name,
pm_affected_packages.solution,
pm_affected_packages.affected_range
FROM
pm_affected_packages
JOIN pm_advisories ON pm_advisories.id = pm_affected_packages.pm_advisory_id
JOIN package_versions_cte ON package_versions_cte.purl_type = pm_affected_packages.purl_type AND
package_versions_cte.name = pm_affected_packages.package_name
WHERE
(
pm_affected_packages.purl_type,
pm_affected_packages.package_name
) IN (
( 1, 'moodle/moodle' ),
( 6, 'url-parse' ),
...
( 5, 'org.apache.ant/ant' ),
);
Query plan
699
random records where every record has an affected package
- https://explain.depesz.com/s/QZNj
- https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/21081/commands/68882
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #371055 (closed)