Improve performance of package license query to re-enable SBOM based license scanning
Summary
The query to fetch package licenses is exhausts resources. From March 15 this query is running on replica nodes, but this is not enough to make this part of the code production ready.
We need to investigate and improve the performance of Gitlab::LicenseCompliance::SbomScanner#report
.
Right now this issue is preventing us from enabling feature on production.
Further information
Please check the checkup report on replicas. You can see that the issue is coming from Ci::SyncReportsToReportApprovalRulesWorker
. Most probably, because users interact the most with MR page and this sync approvals worker is triggered even if the change isn't related to the license compliance feature. The particular method is triggered in the sync service and then it is executed in the report method.
Logs
- SBOM License Scanning Monitoring Dashboard (internal link)
Implementation Plan
-
Replace the single query in Gitlab::LicenseScanning::PackageLicenses#fetch with 2 faster queries to optimize the index scan: : -
The first query will use a common table expression (
CTE
) to get the list ofpm_packages.id
andversion
values that correspond to the givencomponents
:WITH needed_package_versions ( purl_type, name, version ) AS ( VALUES ( 4, 'beego', 'v1.10.0' ), ( 6, 'camelcase', '1.2.1' ), ( 6, 'camelcase', '4.1.0' ), ( 6, 'cliui', '2.1.0' ), ( 4, 'cliui', '2.1.0' ) ) SELECT DISTINCT pm_packages.id, needed_package_versions.version FROM needed_package_versions JOIN pm_packages ON pm_packages.purl_type = needed_package_versions.purl_type AND pm_packages.name = needed_package_versions.name;
-
The second query will feed the above
pm_packages.id
andversion
values into a revised form of the original query, now filtering bypm_packages.id
values instead of package names, so that thepm_package_versions
index can be scanned in one pass:SELECT pm_packages.purl_type, pm_packages.name, pm_package_versions.version, array_agg( pm_licenses.spdx_identifier ) FROM pm_packages JOIN pm_package_versions ON pm_package_versions.pm_package_id = pm_packages.id JOIN pm_package_version_licenses ON pm_package_version_licenses.pm_package_version_id = pm_package_versions.id JOIN pm_licenses ON pm_licenses.id = pm_package_version_licenses.pm_license_id WHERE ( pm_package_versions.pm_package_id, pm_package_versions.version ) IN ( ( 386, 'v1.10.0' ), ( 387, '1.2.1' ), ( 387, '4.1.0' ), ( 388, '2.1.0' ), ( 389, '2.1.0' ) ) GROUP BY pm_packages.purl_type, pm_packages.name, pm_package_versions.version;
-
-
Fix any tests broken by the above changes. -
Verify performance in staging and production environments. -
Re-enable the license_scanning_sbom_scanner
feature flag.
Workarounds
The recommended workaround is to re-include the license scanning template.
# .gitlab-ci.yml or equivalent
include:
- template: Security/License-Comliance.gitlab-ci.yml