Package metadata sync service reads from GCP bucket named after the PURL type
Summary
When getting package metadata from License DB using the GCP connector, the sync service reads from GCP Buckets named after the PURL types, like `v1/gem`. However, the GCP Buckets created by License DB are named after the registry IDs, like `v1/rubygem`. Because of this mismatch, the backend can't get any data for the following PURL types: `composer`, `gem`, and `golang`.
See !111242 (comment 1272915182)
PURL type | Registry ID | Same? |
---|---|---|
conan | conan | Yes |
gem | rubygem | No |
golang | go | No |
maven | maven | Yes |
npm | npm | Yes |
nuget | nuget | Yes |
composer | packagist | No |
pypi | pypi | Yes |
The GCP Connector was implemented as part of #383797 (closed).
At the moment the bug doesn't have any impact because the sync has not been enabled on gitlab.com, or on any GitLab instance. See [Feature flag] Rollout of package_metadata_sync... (#390836 - closed)
Proposal
- Introduce a map that gives the registry ID for any given PURL type.
- Use that map in the GCP connector to get the prefix of the GCP bucket.
Implementation plan
- Introduce a map that gives the registry ID for any given PURL type. https://gitlab.com/gitlab-org/gitlab/-/blob/504e17ae6952fef5475a2b3c349a34b1e52cba82/ee/app/models/package_metadata/sync_configuration.rb#L8
- Use that map in the GCP connector to get the prefix of the GCP bucket. !106885 (diffs, comment 1280036140)
- Add specs to check that the connector gets files from the correct GCP Bucket. !106885 (diffs, comment 1280023927)
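
A minimal sketch of the mapping described in the plan, assuming the prefix has the form `v1/<registry ID>` as in the examples in the summary. The module, constant, and method names (`PackageMetadataSketch`, `REGISTRY_IDS`, `registry_id`, `bucket_prefix`) are hypothetical and chosen for illustration; they are not the names used in the GitLab codebase.

```ruby
# Hypothetical sketch: names are illustrative, not the actual GitLab code.
module PackageMetadataSketch
  # PURL type => License DB registry ID, copied from the table in this issue.
  REGISTRY_IDS = {
    'conan' => 'conan',
    'gem' => 'rubygem',
    'golang' => 'go',
    'maven' => 'maven',
    'npm' => 'npm',
    'nuget' => 'nuget',
    'composer' => 'packagist',
    'pypi' => 'pypi'
  }.freeze

  # Assumed version segment, based on the `v1/gem` and `v1/rubygem` examples.
  VERSION_PREFIX = 'v1'

  # Returns the registry ID for a PURL type, or raises for unknown types
  # so that a missing mapping fails loudly instead of reading the wrong bucket.
  def self.registry_id(purl_type)
    REGISTRY_IDS.fetch(purl_type.to_s) do
      raise ArgumentError, "unknown PURL type: #{purl_type}"
    end
  end

  # Builds the prefix the connector would read from.
  def self.bucket_prefix(purl_type)
    "#{VERSION_PREFIX}/#{registry_id(purl_type)}"
  end
end
```

With this sketch, `PackageMetadataSketch.bucket_prefix('gem')` evaluates to `"v1/rubygem"` rather than `"v1/gem"`. A spec for the connector could assert the same for `composer` and `golang`, the other two affected types.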
/cc @hacks4oats @ifrenkel
Edited by Oscar Tovar