Skip to content

Draft: Add service to match SBOM components and vulns

What does this MR do and why?

This MR adds a service to match SBOM components and advisories. It exposes a single public method Gitlab::VulnerabilityScanning::PackageAdvisories#fetch which takes as input an array of SBOM component which are Hashie::Mash objects containing the following fields:

  • purl_type
  • version
  • name

The fetch method searches through the pm_affected_packages table in batches of 700 records for entries that match the given purl_type and name, and joins against the pm_advisories table. It retrieves the following fields from both of these tables:

  • pm_advisories.title,
  • pm_advisories.description,
  • pm_advisories.cvss_v2,
  • pm_advisories.cvss_v3,
  • pm_advisories.urls,
  • pm_advisories.identifiers,
  • pm_affected_packages.purl_type,
  • pm_affected_packages.package_name,
  • pm_affected_packages.solution,
  • pm_affected_packages.affected_range

Benchmarks

As discussed in this thread, we need a way to keep track of the component version values. Two options were discussed:

  • using a CTE (implemented in this MR)
  • using an in-memory hash

The benchmarks below include results for both the CTE method and the in-memory hash for the sake of comparison, however, the numbers show that a CTE is a more efficient approach, which has been implemented in this MR.

To understand how these benchmarks compare to existing code, please see the MR for updating package metadata license lookup to use compressed data.

You'll notice that the benchmarks for updating package metadata license lookup to use compressed data are much faster, at 0.550493 s for fetching 5000 compressed records versus 13.587334 s in this MR for fetching 5000 components where 4000 of them have affected packages (a very unlikely situation). However, the code for updating package metadata license lookup to use compressed data is user-facing, so we've tried to make it as fast as possible, whereas the code in this MR is not user-facing, since it'll be kicked off by a sidekiq job in the background, so slower response times are more acceptable.

  • 1000 affected components out of 5000 components

    Click to expand
    Benchmark results with CTE version lookup with components containing 1000 affected packages
    ===================================================================
                               user     system      total        real
      100 records          0.138171   0.005587   0.143758 (  0.161953)
      200 records          0.183474   0.005838   0.189312 (  0.222943)
      300 records          0.298285   0.008604   0.306889 (  0.348068)
      400 records          0.382594   0.009639   0.392233 (  0.441390)
      500 records          0.486761   0.015195   0.501956 (  0.559242)
      600 records          0.631321   0.019927   0.651248 (  0.722986)
      700 records          0.655312   0.016848   0.672160 (  0.752760)
      800 records          0.835680   0.018616   0.854296 (  0.950211)
      900 records          0.922340   0.025124   0.947464 (  1.059607)
     1000 records          0.829426   0.220630   1.050056 (  1.171212)
     2000 records          1.614828   0.048416   1.663244 (  1.892707)
     3000 records          2.548333   0.075535   2.623868 (  2.986696)
     4000 records          3.420843   0.099478   3.520321 (  3.990201)
     5000 records          4.427780   0.122395   4.550175 (  5.139824)
    ===================================================================
    
    
    Benchmark results with in-memory hash version lookup with components containing 1000 affected packages
    ===================================================================
                               user     system      total        real
      100 records          0.204993   0.006792   0.211785 (  0.229392)
      200 records          0.205858   0.008854   0.214712 (  0.238709)
      300 records          0.298505   0.008785   0.307290 (  0.339396)
      400 records          0.362667   0.004269   0.366936 (  0.405912)
      500 records          0.435338   0.009480   0.444818 (  0.489894)
      600 records          0.554225   0.014166   0.568391 (  0.620221)
      700 records          0.615241   0.015856   0.631097 (  0.690592)
      800 records          1.102302   0.024933   1.127235 (  1.198590)
      900 records          1.311307   0.026613   1.337920 (  1.424326)
     1000 records          1.724592   0.042284   1.766876 (  1.862162)
     2000 records          4.452901   0.099234   4.552135 (  4.733542)
     3000 records         10.395869   0.221649  10.617518 ( 10.885542)
     4000 records         18.042436   0.660988  18.703424 ( 19.063492)
     5000 records         28.101127   1.092482  29.193609 ( 29.654618)
    ===================================================================
  • 2000 affected components out of 5000 components

    Click to expand
    Benchmark results with CTE version lookup with components containing 2000 affected packages
    ===================================================================
                               user     system      total        real
      100 records          0.229686   0.017773   0.247459 (  0.271600)
      200 records          0.401965   0.014219   0.416184 (  0.457788)
      300 records          0.558023   0.017265   0.575288 (  0.625430)
      400 records          0.661989   0.022778   0.684767 (  0.748146)
      500 records          0.786053   0.188283   0.974336 (  1.049490)
      600 records          0.895979   0.027676   0.923655 (  1.022674)
      700 records          1.156021   0.036124   1.192145 (  1.309637)
      800 records          1.284261   0.041253   1.325514 (  1.462144)
      900 records          1.382215   0.042595   1.424810 (  1.576426)
     1000 records          1.467692   0.044432   1.512124 (  1.672537)
     2000 records          2.827644   0.084353   2.911997 (  3.230935)
     3000 records          4.428484   0.441466   4.869950 (  5.359017)
     4000 records          5.785917   0.167348   5.953265 (  6.587623)
     5000 records          7.490965   0.223624   7.714589 (  8.532240)
    ===================================================================
    
    
    Benchmark results with in-memory hash version lookup with components containing 2000 affected packages
    ===================================================================
                               user     system      total        real
      100 records          0.320153   0.010363   0.330516 (  0.345818)
      200 records          0.428199   0.016496   0.444695 (  0.470798)
      300 records          0.497978   0.013736   0.511714 (  0.543679)
      400 records          0.656134   0.018508   0.674642 (  0.713427)
      500 records          0.810513   0.020124   0.830637 (  0.875123)
      600 records          0.961890   0.022576   0.984466 (  1.035953)
      700 records          1.168881   0.026746   1.195627 (  1.254277)
      800 records          1.671087   0.039767   1.710854 (  1.783577)
      900 records          2.324100   0.052157   2.376257 (  2.459570)
     1000 records          2.578770   0.064218   2.642988 (  2.732034)
     2000 records          8.064514   0.176205   8.240719 (  8.411064)
     3000 records         19.925639   0.434941  20.360580 ( 20.622384)
     4000 records         33.924963   1.198814  35.123777 ( 35.468715)
     5000 records         56.534889   2.194605  58.729494 ( 59.169322)
    ===================================================================
    
    </details>
    
  • 3000 affected components out of 5000 components

    Click to expand
    Benchmark results with CTE version lookup with components containing 3000 affected packages
    ===================================================================
                               user     system      total        real
      100 records          0.370768   0.013872   0.384640 (  0.413206)
      200 records          0.559093   0.014545   0.573638 (  0.619264)
      300 records          0.640525   0.139878   0.780403 (  0.836791)
      400 records          0.777679   0.025087   0.802766 (  0.887398)
      500 records          0.932292   0.027473   0.959765 (  1.061200)
      600 records          1.033040   0.030242   1.063282 (  1.180928)
      700 records          1.246824   0.038131   1.284955 (  1.424180)
      800 records          1.348300   0.034815   1.383115 (  1.536389)
      900 records          1.694583   0.050465   1.745048 (  1.920516)
     1000 records          2.022909   0.056121   2.079030 (  2.267054)
     2000 records          3.602560   0.367170   3.969730 (  4.371499)
     3000 records          5.812396   0.177475   5.989871 (  6.601697)
     4000 records          7.624252   0.850542   8.474794 (  9.290588)
     5000 records          9.070385   0.298463   9.368848 ( 10.401297)
    ===================================================================
    
    
    Benchmark results with in-memory hash version lookup with components containing 3000 affected packages
    ===================================================================
                               user     system      total        real
      100 records          0.328865   0.010059   0.338924 (  0.355052)
      200 records          0.512301   0.012661   0.524962 (  0.551092)
      300 records          0.704912   0.016913   0.721825 (  0.754790)
      400 records          1.099221   0.033035   1.132256 (  1.172481)
      500 records          1.164526   0.036386   1.200912 (  1.257501)
      600 records          1.342235   0.035431   1.377666 (  1.431948)
      700 records          1.572490   0.034914   1.607404 (  1.667010)
      800 records          1.945943   0.046310   1.992253 (  2.062189)
      900 records          3.525004   0.102298   3.627302 (  3.711187)
     1000 records          3.966712   0.093647   4.060359 (  4.150162)
     2000 records         12.804322   0.291073  13.095395 ( 13.263047)
     3000 records         33.066726   0.716707  33.783433 ( 34.065425)
     4000 records         56.166261   2.899048  59.065309 ( 59.409173)
     5000 records         89.607063   4.136758  93.743821 ( 94.247169)
    ===================================================================
    
    </details>
    
  • 4000 affected components out of 5000 components

    Click to expand
    Benchmark results with CTE version lookup with components containing 4000 affected packages
    ===================================================================
                               user     system      total        real
      100 records          0.509272   0.016921   0.526193 (  0.557925)
      200 records          0.721374   0.161824   0.883198 (  0.938143)
      300 records          1.060313   0.033536   1.093849 (  1.194125)
      400 records          1.321734   0.035485   1.357219 (  1.480358)
      500 records          1.389149   0.038384   1.427533 (  1.568226)
      600 records          1.582359   0.047563   1.629922 (  1.798454)
      700 records          1.804642   0.045215   1.849857 (  2.052383)
      800 records          2.293646   0.066506   2.360152 (  2.581256)
      900 records          2.277810   0.330342   2.608152 (  2.853636)
     1000 records          2.595494   0.078682   2.674176 (  2.930324)
     2000 records          4.777275   0.133482   4.910757 (  5.438388)
     3000 records          7.124048   0.638050   7.762098 (  8.528608)
     4000 records          8.963478   0.274313   9.237791 ( 10.268583)
     5000 records         11.238390   1.122125  12.360515 ( 13.587334)
    ===================================================================
    
    
    Benchmark results with in-memory hash version lookup with components containing 4000 affected packages
    ===================================================================
                               user     system      total        real
      100 records          0.447054   0.013611   0.460665 (  0.477721)
      200 records          0.774120   0.021416   0.795536 (  0.823644)
      300 records          1.466924   0.030320   1.497244 (  1.529864)
      400 records          1.811677   0.041309   1.852986 (  1.904842)
      500 records          2.006759   0.047037   2.053796 (  2.098774)
      600 records          2.462239   0.053800   2.516039 (  2.567483)
      700 records          2.931670   0.101191   3.032861 (  3.090854)
      800 records          5.577336   0.141814   5.719150 (  5.808348)
      900 records          6.541524   0.164061   6.705585 (  6.797809)
     1000 records          6.872795   0.165147   7.037942 (  7.127721)
     2000 records         21.035475   0.488726  21.524201 ( 21.690226)
     3000 records         50.627194   1.559264  52.186458 ( 52.466575)
     4000 records         78.586900   3.386539  81.973439 ( 82.373508)
     5000 records        122.841316   4.818588 127.659904 (128.212117)
    ===================================================================
    
    </details>
    

Data characteristics

Raw SQL

https://paste.depesz.com/s/w9u

WITH package_versions_cte ( purl_type, name, version ) AS (
    VALUES
        ( 1, 'moodle/moodle', '3.5.17' ),
        ( 6, 'url-parse', '1.5.1' ),
        ...
        ( 5, 'org.apache.ant/ant', '1.9.0' ),
)
SELECT DISTINCT
    package_versions_cte.version,
    published_date,
    title,
    description,
    cvss_v2,
    cvss_v3,
    urls,
    identifiers,
    pm_affected_packages.purl_type,
    pm_affected_packages.package_name,
    pm_affected_packages.solution,
    pm_affected_packages.affected_range
FROM
    pm_affected_packages
    JOIN pm_advisories ON pm_advisories.id = pm_affected_packages.pm_advisory_id
    JOIN package_versions_cte ON package_versions_cte.purl_type = pm_affected_packages.purl_type AND
    package_versions_cte.name = pm_affected_packages.package_name
WHERE
    (
        pm_affected_packages.purl_type,
        pm_affected_packages.package_name
    ) IN (
        ( 1, 'moodle/moodle' ),
        ( 6, 'url-parse' ),
        ...
        ( 5, 'org.apache.ant/ant' ),
    );

Query plan

699 random records where every record has an affected package

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #371055 (closed)

Edited by Adam Cohen

Merge request reports

Loading