Improve performance of PackageLicenses#fetch
What does this MR do and why?
This MR updates Gitlab::LicenseScanning::PackageLicenses#fetch to implement the optimization described here.
Data characteristics
Raw SQL
-
WITH needed_package_versions ( purl_type, name, version ) AS ( VALUES ( 4, 'beego', 'v1.10.0' ), ( 6, 'camelcase', '1.2.1' ), ( 6, 'camelcase', '4.1.0' ), ( 6, 'cliui', '2.1.0' ), ( 4, 'cliui', '2.1.0' ) ) SELECT DISTINCT pm_packages.id, needed_package_versions.version FROM needed_package_versions JOIN pm_packages ON pm_packages.purl_type = needed_package_versions.purl_type AND pm_packages.name = needed_package_versions.name;
-
query 2, using the
pm_packages.id, needed_package_versions.version
values fromquery 1
above:SELECT pm_packages.purl_type, pm_packages.name, pm_package_versions.version, array_agg( pm_licenses.spdx_identifier ) FROM pm_packages JOIN pm_package_versions ON pm_package_versions.pm_package_id = pm_packages.id JOIN pm_package_version_licenses ON pm_package_version_licenses.pm_package_version_id = pm_package_versions.id JOIN pm_licenses ON pm_licenses.id = pm_package_version_licenses.pm_license_id WHERE ( pm_package_versions.pm_package_id, pm_package_versions.version ) IN ( ( 386, 'v1.10.0' ), ( 387, '1.2.1' ), ( 387, '4.1.0' ), ( 388, '2.1.0' ), ( 389, '2.1.0' ) ) GROUP BY pm_packages.purl_type, pm_packages.name, pm_package_versions.version;
Query plan and benchmarks
Please see this internal note for query plan and this comment for further details.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #398679 (closed)
Edited by Adam Cohen