Update package metadata license lookup to use deduplicated data
Why are we doing this work
A new compressed dataset will be added as part of this issue's parent epic. The compressed dataset is meant to replace the existing dataset (though not in this epic). Therefore, the current license lookup provided by Gitlab::LicenseScanning::PackageLicenses needs to be updated to query this dataset.
Note: until the existing dataset is removed, Gitlab::LicenseScanning::PackageLicenses needs to support querying both compressed and uncompressed data.
Relevant links
- Research spike: #407454 (closed)
- Detailed discussion of the licenses data structure: #408901 (closed)
Implementation plan
Update Gitlab::LicenseScanning::PackageLicenses:
- Add a new feature flag called compressed_package_metadata_query. [Feature flag] Rollout of `compressed_package_m... (#409793 - closed)
- Create two new private methods:
  - uncompressed_fetch: contains the code currently in the fetch method.
  - compressed_fetch: responsible for querying data from the new licenses field in the pm_packages table using components, following this pseudocode:

      def compressed_fetch
        components.each do |component|
          packages = select packages from pm_packages table
            where pm_packages.name = component.name
            and pm_packages.purl_type = component.purl_type
          packages.each do |package|
            licenses = []
            if component.version is contained in package.licenses.other_versions
              licenses = package.licenses.other_licenses
            else
              licenses = package.licenses.default_licenses
            end
            add_record_with_known_licenses(package.purl_type, package.name, component.version, licenses)
          end
        end
        add_records_with_unknown_licenses
      end

Update package metadata license lookup to use c... (!119607 - merged)
- Update fetch:

      if the `compressed_package_metadata_query` feature is enabled
        call `compressed_fetch`
      else
        call `uncompressed_fetch`
      end

Update package metadata license lookup to use c... (!119607 - merged)
- Update all tests to check both sides of the compressed_package_metadata_query feature flag: Test both sides of compressed_package_metadata_... (!120207 - merged)
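
The plan above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code from the merge requests: the Component, Package and CompressedLicenses structs, the PackageLicensesSketch class, the flag_enabled argument and the "unknown" placeholder are all hypothetical stand-ins, an in-memory array replaces the pm_packages query, and the uncompressed_fetch body is a placeholder for the existing fetch code.

```ruby
# Hypothetical in-memory stand-ins; real code would query the pm_packages table.
Component = Struct.new(:purl_type, :name, :version)
CompressedLicenses = Struct.new(:default_licenses, :other_versions, :other_licenses)
Package = Struct.new(:purl_type, :name, :licenses)

class PackageLicensesSketch
  # flag_enabled stands in for the compressed_package_metadata_query feature flag check.
  def initialize(components:, packages:, flag_enabled:)
    @components = components
    @packages = packages
    @flag_enabled = flag_enabled
  end

  # Dispatches to one of the two private methods depending on the flag.
  def fetch
    @records = []
    @flag_enabled ? compressed_fetch : uncompressed_fetch
    @records
  end

  private

  def compressed_fetch
    @components.each do |component|
      matches = @packages.select do |pkg|
        pkg.name == component.name && pkg.purl_type == component.purl_type
      end

      matches.each do |package|
        licenses =
          if package.licenses.other_versions.include?(component.version)
            package.licenses.other_licenses
          else
            package.licenses.default_licenses
          end
        add_record_with_known_licenses(package.purl_type, package.name, component.version, licenses)
      end
    end
    add_records_with_unknown_licenses
  end

  # Placeholder: the current body of fetch would move here unchanged.
  def uncompressed_fetch
    add_records_with_unknown_licenses
  end

  def add_record_with_known_licenses(purl_type, name, version, licenses)
    @records << { purl_type: purl_type, name: name, version: version, licenses: licenses }
  end

  # Adds an "unknown" record for every component that has no record yet.
  def add_records_with_unknown_licenses
    @components.each do |c|
      known = @records.any? do |r|
        r[:purl_type] == c.purl_type && r[:name] == c.name && r[:version] == c.version
      end
      next if known

      @records << { purl_type: c.purl_type, name: c.name, version: c.version, licenses: ["unknown"] }
    end
  end
end

components = [
  Component.new("npm", "example-pkg", "1.0.0"),
  Component.new("npm", "example-pkg", "2.0.0"),
  Component.new("npm", "missing-pkg", "0.1.0")
]
packages = [
  Package.new("npm", "example-pkg", CompressedLicenses.new(["MIT"], ["1.0.0"], ["Apache-2.0"]))
]

result = PackageLicensesSketch.new(components: components, packages: packages, flag_enabled: true).fetch
result.each { |record| puts record.inspect }
```

With the made-up data above, version 1.0.0 is listed in other_versions and so picks up other_licenses, version 2.0.0 falls back to default_licenses, and the component with no matching package ends up with the unknown placeholder.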
Verification steps
Test data to be determined.
- redundant case
  - insert the same test data into both datasets
  - assert that uncompressed_fetch and compressed_fetch return the same result, regardless of the setting for compressed_package_metadata_query
- new instance case
  - insert only into the new dataset
  - assert that fetch returns data correctly when compressed_package_metadata_query is enabled
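
Since the test data is still to be determined, the two cases above can only be sketched with made-up fixtures. The following is a rough, self-contained illustration, assuming the uncompressed dataset holds one row per package version and the compressed dataset one row per package. The fixture constants, the three-argument method signatures and the flag_enabled keyword are hypothetical and do not reflect the real class or schema.

```ruby
# Hypothetical fixtures: the same license facts stored in both shapes.
UNCOMPRESSED_ROWS = {
  ["npm", "example-pkg", "1.0.0"] => ["Apache-2.0"],
  ["npm", "example-pkg", "2.0.0"] => ["MIT"]
}.freeze

COMPRESSED_ROWS = {
  ["npm", "example-pkg"] => {
    default_licenses: ["MIT"],
    other_versions: ["1.0.0"],
    other_licenses: ["Apache-2.0"]
  },
  # New instance case: this package exists only in the compressed fixture.
  ["npm", "new-only-pkg"] => {
    default_licenses: ["BSD-3-Clause"],
    other_versions: [],
    other_licenses: []
  }
}.freeze

def uncompressed_fetch(purl_type, name, version)
  UNCOMPRESSED_ROWS.fetch([purl_type, name, version], ["unknown"])
end

def compressed_fetch(purl_type, name, version)
  row = COMPRESSED_ROWS[[purl_type, name]]
  return ["unknown"] unless row

  row[:other_versions].include?(version) ? row[:other_licenses] : row[:default_licenses]
end

# flag_enabled stands in for the compressed_package_metadata_query feature flag.
def fetch(purl_type, name, version, flag_enabled:)
  if flag_enabled
    compressed_fetch(purl_type, name, version)
  else
    uncompressed_fetch(purl_type, name, version)
  end
end

# Redundant case: both paths should agree for data present in both fixtures.
REDUNDANT_QUERIES = [
  ["npm", "example-pkg", "1.0.0"],
  ["npm", "example-pkg", "2.0.0"]
].freeze

mismatches = REDUNDANT_QUERIES.reject do |query|
  fetch(*query, flag_enabled: true) == fetch(*query, flag_enabled: false)
end
puts(mismatches.empty? ? "redundant case: both paths agree" : "mismatches: #{mismatches.inspect}")

# New instance case: only the compressed path can see this package.
new_only = fetch("npm", "new-only-pkg", "0.1.0", flag_enabled: true)
puts "new instance case: #{new_only.inspect}"
```

In this sketch the redundant comparison only covers versions present in both fixtures. A version missing from the uncompressed fixture would fall back to default_licenses on the compressed side and would therefore differ between the two paths.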