Add service to handle disallowing duplicate NuGet package uploads
What does this MR do and why?
In !123783 (merged), a setting for allowing or disallowing duplicate NuGet package uploads was added. In this MR, I use that setting in Packages::Nuget::FindOrCreatePackageService to put it into action. Changes are behind the nuget_duplicates_option feature flag. If the feature flag is disabled for the project's namespace, the default behavior applies, which is to allow duplicates.
How NuGet upload works:
A NuGet package is a compressed file with the extension .nupkg or .snupkg (for symbols). When this file is pushed to GitLab, it contains metadata stored within a .nuspec file that is embedded in the compressed package. To retrieve the package name and version, the .nupkg file needs to be unzipped, and the relevant data extracted from the .nuspec file.
This unzipping process is handled by a background worker to ensure speedy publishing. As a result, users receive an acknowledgment that the package was created, even though it is still being processed and published. Any errors that occur during the background process are visible on the package registry UI page, allowing users to identify and rectify them.
Now, we aim to introduce a feature that allows users to prevent the publishing of duplicate packages. This feature should operate synchronously, meaning the client (NuGet, dotnet, Visual Studio) should receive a 409 status code (Conflict) if an attempt is made to publish a duplicate version. To achieve this, we need to handle the file unzipping synchronously, rather than using the background worker, as we cannot determine the package name and version until they are extracted from the .nuspec file. These extracted values are then used to check for duplicate packages. The package is considered a duplicate if its name & version match the name & version of a published package in the same project.
To summarize:
1. We need to extract the .nuspec file from the package file synchronously in order to get the package name & version. To achieve that efficiently, especially for large packages, we can handle the zip archive in a streaming "mode": we don't download the whole .nupkg file from the object store; instead, we fire a streaming request and fetch the file in chunks. Each small fetched chunk can be unzipped, and once the needed .nuspec file is found, we extract it and stop streaming. The .nuspec file is commonly located at the top level of the archive, so it should be fetched within the first 2 chunks (tested with different-sized packages). If we reach 5 downloaded chunks without finding the .nuspec file, we stop streaming and respond with an error: nuspec file not found.
2. Step 1 is executed only when the user disallows duplicate package uploads. If the setting is true (allowing duplicates), the entire publishing process is performed in the background worker, as before.
3. This new setting does not affect symbol packages; they are handled as before. Symbols are attached to existing matching .nupkg packages. If no matching package exists, the symbols are not published.
4. The --skip-duplicate option should work out of the box, as we now respond with a 409 status code (Conflict) in the case of duplication. The client (NuGet CLI, dotnet CLI) can then proceed with the next package in the push, if any, ignoring those that failed to be published due to duplication.
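The chunked lookup from the first item can be illustrated with a small, self-contained Ruby sketch. It is not the code of this MR: the method names (extract_nuspec, scan_for_nuspec), the NuspecNotFound class, and the chunk source are hypothetical. It also assumes that every zip local file header carries the entry sizes (no data descriptors) and that entries are either stored or deflated. The sketch buffers chunks, walks the local file headers, and stops as soon as a .nuspec entry is complete or the 5-chunk budget is spent.

```ruby
require 'zlib'

# Hypothetical error class for this sketch.
class NuspecNotFound < StandardError; end

LOCAL_HEADER_SIGNATURE = "PK\x03\x04".b
LOCAL_HEADER_SIZE = 30
MAX_CHUNKS = 5 # chunk budget described above

# Walks the zip local file headers in the buffered bytes.
# Returns the .nuspec content, or nil if it is not (yet) fully buffered.
def scan_for_nuspec(buffer)
  offset = 0
  while offset + LOCAL_HEADER_SIZE <= buffer.bytesize
    # Anything else (for example the central directory) means no more entries.
    return nil unless buffer.byteslice(offset, 4) == LOCAL_HEADER_SIGNATURE

    flags, compression, _time, _date, _crc, compressed_size, _size, name_length, extra_length =
      buffer.byteslice(offset + 6, 24).unpack('v4V3v2')
    # Assumption: sizes are in the header (bit 3, "data descriptor", is unset).
    raise ArgumentError, 'data descriptors are not supported' if (flags & 0x08) != 0

    data_start = offset + LOCAL_HEADER_SIZE + name_length + extra_length
    data_end = data_start + compressed_size
    return nil if data_end > buffer.bytesize # wait for the next chunk

    name = buffer.byteslice(offset + LOCAL_HEADER_SIZE, name_length)
    if name.end_with?('.nuspec')
      data = buffer.byteslice(data_start, compressed_size)
      # Method 8 is raw deflate; other entries are treated as stored.
      return compression == 8 ? Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(data) : data
    end
    offset = data_end
  end
  nil
end

# Consumes at most max_chunks chunks and returns the .nuspec content.
def extract_nuspec(chunks, max_chunks: MAX_CHUNKS)
  buffer = String.new(encoding: Encoding::BINARY)
  chunks.each_with_index do |chunk, index|
    break if index >= max_chunks

    buffer << chunk.b
    nuspec = scan_for_nuspec(buffer)
    return nuspec if nuspec
  end
  raise NuspecNotFound, 'nuspec file not found'
end
```

In a real service the chunks would come from a streaming response of the object store rather than from an in-memory array, and stopping the iteration early is what avoids downloading the rest of the package.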
Implementation Details
- Add two new columns nuget_duplicates_allowed & nuget_duplicate_exception_regex to the namespace_package_settings table. The default is the current behavior which allows duplicates. (Done in !123783 (merged))
- Make them updatable by GraphQL, but not added yet to the UI; this should be done in a separate MR for the next milestone. (Done in !123783 (merged))
- Introduce a new service Packages::Nuget::FindOrCreatePackageService which should check for duplication (if needed) then call ExtractionWorker to create the package and the package file.
- Introduce a new service Packages::Nuget::ExtractRemoteMetadataFileService which is responsible for the zip streaming request of the package file.
- Ensure we don't unzip the package file twice if we already checked for duplication.
How to set up and validate locally
- Ensure you have the NuGet CLI installed (see nuget docs for links to installation pages).
- Ensure the object store is enabled in your GDK.
- In a new directory, run nuget spec. A file named Package.nuspec should be created.
- Run nuget pack. A file named Package.1.0.0.nupkg should be created.
- Add a GitLab project as your NuGet source:
nuget source Add -Name localhost -Source "http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/index.json" -UserName <gitlab_username> -Password <personal_access_token>
- Push the package to your project:
nuget push Package.1.0.0.nupkg -Source localhost
- After the package is successfully published, clear the local NuGet cache:
nuget locals all -clear
- In the rails console, enable the nuget_duplicates_option feature flag for the namespace of the project:
Feature.enable(:nuget_duplicates_option, Namespace.find(<namespace_id>))
- Update the namespace package settings nuget_duplicates_allowed using the query below in graphql-explorer:
mutation {
  updateNamespacePackageSettings(input: {
    namespacePath: "<your-namespace-full-path>",
    nugetDuplicatesAllowed: false
  }) {
    packageSettings {
      nugetDuplicatesAllowed
    }
  }
}
- Try to publish the same package again. You should see a 409 response from the server:
Pushing Package.1.0.0.nupkg to 'http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget'...
PUT http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/
Conflict http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/ 6367ms
To skip already published packages, use the option -SkipDuplicate
Response status code does not indicate success: 409 (Conflict).
- Update nuget_duplicates_allowed to be true and try to publish the same package. It should be successfully published.
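If graphql-explorer is not handy, the same mutation can be sent from a script. The following is a minimal sketch, not part of this MR: the helper name build_settings_request is hypothetical, and it assumes the GDK URL used in the steps above, GitLab's /api/graphql endpoint, and a personal access token passed as a Bearer token. It only builds the request so that the payload can be inspected before sending.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds (but does not send) the GraphQL request that toggles the setting.
def build_settings_request(base_url, token, namespace_path, duplicates_allowed)
  query = <<~GRAPHQL
    mutation {
      updateNamespacePackageSettings(input: {
        namespacePath: #{namespace_path.to_json},
        nugetDuplicatesAllowed: #{duplicates_allowed ? 'true' : 'false'}
      }) {
        packageSettings {
          nugetDuplicatesAllowed
        }
      }
    }
  GRAPHQL

  request = Net::HTTP::Post.new(URI.join(base_url, '/api/graphql'))
  request['Content-Type'] = 'application/json'
  request['Authorization'] = "Bearer #{token}"
  request.body = JSON.generate(query: query)
  request
end
```

To send it, something like `Net::HTTP.start('gdk.test', 3000) { |http| http.request(request) }` should work against a local GDK, with the token and namespace path replaced by your own values.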
Test the exception regex:
- Update the package settings as below. With the regex ".*-be.*", duplicates are allowed only for packages whose name or version matches the regex.
mutation {
  updateNamespacePackageSettings(input: {
    namespacePath: "<your-namespace-full-path>",
    nugetDuplicatesAllowed: false,
    nugetDuplicateExceptionRegex: ".*-be.*"
  }) {
    packageSettings {
      nugetDuplicatesAllowed
      nugetDuplicateExceptionRegex
    }
  }
}
- Edit the <version> field in the Package.nuspec file created by nuget spec earlier and make it 2.0.0-beta, for example, then run nuget pack and publish the generated .nupkg file.
- Publish the same package again. It should be published successfully because version 2.0.0-beta matches the regex .*-be.*.
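The rule exercised by these steps can be summarized in a short Ruby sketch. The struct, the method names, and the whole-string anchoring of the regex are assumptions made for illustration, not the service's actual code. It only models the decision described in this MR: a duplicate is published when the setting allows duplicates or when the package name or version matches the exception regex, and is otherwise rejected (409 Conflict).

```ruby
# Hypothetical stand-in for the namespace package settings.
PackageSettings = Struct.new(
  :nuget_duplicates_allowed,
  :nuget_duplicate_exception_regex,
  keyword_init: true
)

# True when the exception regex matches the package name or version.
def exception_match?(pattern, name, version)
  return false if pattern.nil? || pattern.empty?

  regex = Regexp.new("\\A(?:#{pattern})\\z") # assumption: whole-string match
  regex.match?(name) || regex.match?(version)
end

# Returns :publish or :conflict (the latter maps to a 409 response).
def upload_decision(settings, name:, version:, already_published:)
  return :publish unless already_published
  return :publish if settings.nuget_duplicates_allowed
  return :publish if exception_match?(settings.nuget_duplicate_exception_regex, name, version)

  :conflict
end
```

With nugetDuplicatesAllowed set to false and the regex .*-be.*, a second push of Package 1.0.0 maps to :conflict in this sketch, while a second push of 2.0.0-beta maps to :publish.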
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
💾 Database analysis
Related to #293748 (closed)