Duplicate settings might not work for NuGet uploads that have the .nuspec file at the end of the archive
🔥 Problem
When a NuGet package is uploaded to GitLab, we don't receive any information about the package name or the version.
The problem is that we need both to enforce the duplicates settings, and the only way to get them is to read the uploaded zip file.
In !128269 (merged), we implemented a duplicates check during the upload. We managed to do that without reading the entire zip archive: instead, we read the first few bytes to get the first few entries. Since the .nuspec file was expected to be among the first entries, that was enough to get the package name + version and enforce the duplicates settings.
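To make the prefix-based approach concrete, here is a minimal Python sketch. It is only an illustration, not the Ruby code from !128269. It walks the ZIP local file headers found in a byte prefix and returns the root-level .nuspec entry name if one appears before the bytes run out. The function name `find_nuspec_in_prefix` and the `max_entries` limit are hypothetical. The sketch assumes entry sizes are stored in the local headers; when general-purpose flag bit 3 is set, sizes live in a data descriptor after the entry data and the scan gives up.

```python
import struct

# Fixed part of a ZIP local file header (30 bytes, little-endian).
LOCAL_HEADER_FMT = "<4sHHHHHIIIHH"
LOCAL_HEADER_SIZE = struct.calcsize(LOCAL_HEADER_FMT)
LOCAL_HEADER_SIG = b"PK\x03\x04"
DATA_DESCRIPTOR_FLAG = 0x08  # general-purpose bit 3


def find_nuspec_in_prefix(prefix: bytes, max_entries: int = 10):
    """Return the root-level .nuspec entry name if it appears in `prefix`, else None.

    Hypothetical helper: it only inspects the bytes it is given and never
    looks at the central directory at the end of the archive.
    """
    offset = 0
    for _ in range(max_entries):
        if offset + LOCAL_HEADER_SIZE > len(prefix):
            return None  # the prefix ends before the next header
        (sig, _version, flags, _method, _mtime, _mdate, _crc,
         compressed_size, _size, name_len, extra_len) = struct.unpack_from(
            LOCAL_HEADER_FMT, prefix, offset)
        if sig != LOCAL_HEADER_SIG:
            return None  # not a local file header (e.g. reached the central directory)
        name_start = offset + LOCAL_HEADER_SIZE
        if name_start + name_len > len(prefix):
            return None  # the entry name is cut off by the end of the prefix
        name = prefix[name_start:name_start + name_len].decode("utf-8", "replace")
        if name.lower().endswith(".nuspec") and "/" not in name:
            return name
        if flags & DATA_DESCRIPTOR_FLAG:
            return None  # size is stored after the data, so this entry can't be skipped
        # Skip the name, the extra field and the (compressed) entry data.
        offset = name_start + name_len + extra_len + compressed_size
    return None
```

With a small prefix, this finds the .nuspec only when it precedes the large entries, which is exactly the limitation this issue describes.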
In 2024-11-25: Can't upload nuget packages when nu... (gitlab-com/gl-infra/production#18894 - closed), we stumbled upon NuGet packages where the .nuspec file was not located at the beginning of the archive but elsewhere.
Thus, for packages where the .nuspec file is not at the beginning, the duplicates settings can't be enforced during the upload.
🔮 Other aspects
As reported here, the .nuspec file is not consistently placed at the beginning of the archive. The packing behavior might differ between operating systems, in particular on runners.
🚒 Solution
First, we should note from 2024-11-25: Can't upload nuget packages when nu... (gitlab-com/gl-infra/production#18894 - closed) that the number of uploads without the .nuspec file at the beginning of the .zip archive is very low. In other words, the current logic works for the majority of NuGet uploads.
Having said that, we should still support these uploads and enforce the duplicates logic properly.
As stated, we can't enforce the duplicates logic during the upload for these packages (the .nuspec file is not in the first bytes we read from the archive). The only possible solution is thus to run the duplicates logic in the background job that processes NuGet uploads.
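In the background job, reading the whole archive removes the ordering problem: a ZIP reader locates entries through the central directory stored at the end of the archive, so the .nuspec file is found wherever it sits. Below is a minimal Python sketch using the standard `zipfile` and `xml.etree` modules, not GitLab's actual Ruby worker. `extract_name_and_version` is a hypothetical name, and the sketch assumes the .nuspec file sits at the archive root with `id` and `version` elements under `metadata`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    # Strip an XML namespace such as "{http://...nuspec.xsd}id" -> "id".
    return tag.rsplit("}", 1)[-1]


def extract_name_and_version(package_bytes: bytes) -> tuple[str, str]:
    """Hypothetical helper: read package id + version from a .nupkg archive."""
    # ZipFile reads the central directory at the end of the file, so the
    # position of the .nuspec entry inside the archive doesn't matter.
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        nuspecs = [n for n in zf.namelist()
                   if n.lower().endswith(".nuspec") and "/" not in n]
        if not nuspecs:
            raise ValueError("no .nuspec file at the root of the archive")
        root = ET.fromstring(zf.read(nuspecs[0]))

    metadata = next((el for el in root if _local_name(el.tag) == "metadata"), None)
    if metadata is None:
        raise ValueError(".nuspec file has no <metadata> element")
    fields = {_local_name(el.tag): (el.text or "").strip() for el in metadata}
    return fields["id"], fields["version"]
```

The extracted pair can then go through the same duplicates check that the upload path runs.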
For optimization purposes, the background job could receive an indication of whether the duplicates check was already executed during the upload. This way, the job wouldn't need to run the check every time. It's unclear whether this is worth it: we're in a background worker and we extract the package name + version anyway, so skipping a comparison against a set of regexes (even if it ends up running twice) is not a significant saving.
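The duplicates check itself is a lookup plus a regex comparison, which is why running it twice is cheap. The Python sketch below uses hypothetical settings (`duplicates_allowed`, `exception_regex`) and assumes that the exception regex inverts the default for packages whose name or version matches it; GitLab's real setting names and semantics may differ. The `already_checked` parameter models the optional hint discussed above.

```python
import re
from dataclasses import dataclass


@dataclass
class DuplicateSettings:
    # Hypothetical shape of the NuGet duplicate settings.
    duplicates_allowed: bool
    exception_regex: str = ""


def duplicate_rejected(name, version, settings, existing_versions):
    """Return True if uploading name@version must be rejected as a duplicate."""
    if version not in existing_versions:
        return False  # not a duplicate at all
    # Assumption: the regex is matched against the name or the version.
    matches_exception = bool(settings.exception_regex) and (
        re.search(settings.exception_regex, name) is not None
        or re.search(settings.exception_regex, version) is not None
    )
    # Assumption: a matching exception flips the default behavior.
    allowed = settings.duplicates_allowed != matches_exception
    return not allowed


def background_duplicate_check(name, version, settings, existing_versions,
                               already_checked=False):
    # Optional short-circuit; re-running the check would also be cheap since
    # the job extracts name + version anyway.
    if already_checked:
        return False
    return duplicate_rejected(name, version, settings, existing_versions)
```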
Be sure to add specs that use a .zip NuGet package where the .nuspec file is at the end of the archive.
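In GitLab the specs themselves would be RSpec; as a language-neutral illustration, this Python sketch builds such a fixture in memory. A large payload entry comes first and the .nuspec entry last, so the .nuspec lands well past any small prefix read during the upload. The file names, the package id and the 64 KiB padding are arbitrary choices, not taken from the GitLab test suite.

```python
import io
import zipfile

NUSPEC = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">'
    "<metadata><id>DummyProject.Fixture</id><version>1.0.0</version></metadata>"
    "</package>"
)


def build_nupkg_with_trailing_nuspec(padding_bytes: int = 64 * 1024) -> bytes:
    """Hypothetical fixture builder: a .nupkg whose .nuspec is the last entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        # Stored (uncompressed) payload first, so its full size pushes the
        # later entries far from the start of the archive.
        zf.writestr("lib/net8.0/DummyProject.Fixture.dll", b"\x00" * padding_bytes)
        zf.writestr("[Content_Types].xml", "<Types/>")
        # The .nuspec entry goes last, mimicking the problematic uploads.
        zf.writestr("DummyProject.Fixture.nuspec", NUSPEC)
    return buf.getvalue()
```

A spec can then upload these bytes and assert that the duplicates settings are still enforced, this time by the background job.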