Can't publish large NuGet packages
Summary
NuGet packages larger than 500MB silently fail to be uploaded to the GitLab.com package registry.
🦀 Context
NuGet packages are uploaded as simple zip files called `package.nupkg`. The GitLab package registry needs at least the package name and version to make a package available for pulling.
To achieve that, once a NuGet package is uploaded, we enqueue a background job (`Packages::Nuget::ExtractionWorker`) that pulls the package file, opens the zip archive and reads the `*.nuspec` file to extract, among other things, the package name and version.
The background job will then either:
- find an existing package with that name and version and append the package file
- update the package with the final name and version
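
The two branches above can be sketched in memory as follows. This is only an illustration of the decision, not the actual worker: `Package`, `PackageFile` and `process_upload` are hypothetical names, and the metadata is assumed to have already been read from the `*.nuspec` file.

```ruby
# Hypothetical in-memory stand-ins for the registry records.
Package = Struct.new(:id, :name, :version, :files)
PackageFile = Struct.new(:id, :package_id, :file_name)

# Attach an uploaded file to the right package once the name and version
# are known. `temp_package` is the placeholder package created at upload.
def process_upload(packages, temp_package, file, name, version)
  existing = packages.find do |pkg|
    !pkg.equal?(temp_package) && pkg.name == name && pkg.version == version
  end

  if existing
    # Branch 1: append the file to the package that already exists.
    temp_package.files.delete(file)
    existing.files << file
    file.package_id = existing.id
    packages.delete(temp_package)
    existing
  else
    # Branch 2: give the placeholder package its final name and version.
    temp_package.name = name
    temp_package.version = version
    temp_package
  end
end
```

In the first branch the file's package id changes; in both branches the stored file ends up needing a new location, which is what the next paragraph describes.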
Both of these operations cause the package file to be moved in object storage. This is due to how the path of a package file is computed: we use the package id, the package file id and the filename of the archive. Updating any one of those requires a move within object storage.
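
To see why an update forces a move, here is a minimal sketch of a key derived from those three values. The layout used by `object_key` is an assumption made for illustration; the real path format may differ.

```ruby
# Hypothetical key layout. What matters is that the key depends on the
# package id, the package file id and the file name.
def object_key(package_id, package_file_id, file_name)
  "packages/#{package_id}/files/#{package_file_id}/#{file_name}"
end

old_key = object_key(42, 7, "package.nupkg")
new_key = object_key(42, 7, "My.Package.1.0.0.nupkg")

puts old_key            # packages/42/files/7/package.nupkg
puts new_key            # packages/42/files/7/My.Package.1.0.0.nupkg
puts old_key == new_key # false
```

Because the key changes as soon as any input changes, the stored object has to follow it.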
Proposal
- Download the file for the metadata extraction. That's the current `#use_open_file` usage. We will still need this.
- Update the metadata but don't download the file twice. Instead, copy it to its new key.
- Destroy the old key.
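
The copy and destroy steps can be sketched against an in-memory stand-in for object storage. `FakeObjectStore` and `move_file` are hypothetical names, and a real provider's copy and delete calls will look different; the point is only the ordering of the steps.

```ruby
# Hypothetical in-memory object store: keys map to file contents.
class FakeObjectStore
  def initialize
    @objects = {}
  end

  def put(key, body)
    @objects[key] = body
  end

  def get(key)
    @objects.fetch(key)
  end

  def exist?(key)
    @objects.key?(key)
  end

  # Copy inside the store: the body is never handed back to the caller.
  def copy(source_key, target_key)
    @objects[target_key] = @objects.fetch(source_key)
  end

  def delete(key)
    @objects.delete(key)
  end
end

# Move a file to its new key without downloading it a second time.
def move_file(store, old_key, new_key)
  return if old_key == new_key

  store.copy(old_key, new_key)
  # Destroy the old key only once the copy is confirmed to exist.
  store.delete(old_key) if store.exist?(new_key)
end
```

Deleting only after the copy is confirmed means a failed copy leaves the original file in place rather than losing it.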
Further details
📊 MR plan
Those changes are quite deep and change how the background job processes NuGet packages. Given the depth of this change, we will need to use a feature flag as an additional safety net.
The changes can be in a single MR.
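
As a rough illustration of the safety net, the job could choose between the current behavior and the new one based on a flag. `FlagStore`, `move_strategy` and the flag name are all hypothetical; GitLab's own feature flag tooling is not reproduced here.

```ruby
# Hypothetical flag store, standing in for real feature flag tooling.
class FlagStore
  def initialize
    @enabled = {}
  end

  def enable(name)
    @enabled[name] = true
  end

  def enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Pick the file move strategy. The flag is off by default, so the
# existing behavior stays in place until the flag is enabled.
def move_strategy(flags)
  if flags.enabled?(:nuget_copy_within_object_storage) # hypothetical flag name
    :copy_then_destroy
  else
    :legacy_move
  end
end
```

Keeping both paths behind one switch allows the new path to be turned off quickly if it misbehaves.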
🔭 Things to consider
During the MR implementation, we will need to test different conditions to make sure that this change works as expected:
Object storage | Existing nuget package |
---|---|
disabled | yes |
disabled | no |
GCP | yes |
GCP | no |
AWS | yes |
AWS | no |
In particular, we saw that if the nuget package already exists, the DELETE request fails and the original file is left behind. We will need to fix that.
Lastly, moving a file within object storage is a common need in package background processing. To make it reusable, it's advised to create a service specifically for that.
Relevant logs and/or screenshots
https://gitlab.com/immersaview/public/packages/chromium-embedded-framework/-/jobs/1025547697
- Snippet of log
pushd NuGet
/builds/immersaview/public/packages/chromium-embedded-framework/NuGet /builds/immersaview/public/packages/chromium-embedded-framework
$ dotnet nuget add source "$CI_SERVER_URL/api/v4/projects/$CI_PROJECT_ID/packages/nuget/index.json" --name gitlab --username gitlab-ci-token --password $CI_JOB_TOKEN --store-password-in-clear-text
Package source with Name: gitlab added successfully.
$ dotnet nuget push *.nupkg --source gitlab
warn : No API Key was provided and no API Key could be found for 'https://gitlab.com/api/v4/projects/24280473/packages/nuget'. To save an API Key for a source use the 'setApiKey' command.
Pushing Imv.External.chromium-embedded-framework.3.3359.1774.20191217.nupkg to 'https://gitlab.com/api/v4/projects/24280473/packages/nuget'...
PUT https://gitlab.com/api/v4/projects/24280473/packages/nuget/
Created https://gitlab.com/api/v4/projects/24280473/packages/nuget/ 26834ms
Your package was pushed.
$ popd
Output of checks
This bug happens on GitLab.com