Investigate adding NuGet to the GitLab Package Registry
Problem to solve
The goal of the Package group is to build a set of features that, within three years, will allow ninety percent of our customers to store all of their packages in GitLab. In order to achieve that goal, we must provide support for the most commonly used package managers for our users. This ever-growing list requires that we are able to add support for new formats in a timely manner.
As a team, we have decided that, in order to de-risk this work, engineers need time to investigate the documentation, APIs and overall feasibility of adding support for a new format.
.NET developers need a mechanism to create, share, and consume packages that contain compiled code and other content in projects that consume these packages. For .NET, the Microsoft-supported mechanism for sharing code is NuGet, which defines how packages for .NET are created, hosted, and consumed, and provides the tools for each of those roles.
This issue is to investigate and familiarize ourselves with the NuGet and .NET documentation so that we can streamline the development process in the subsequent milestone.
Proposal
Review the NuGet MVC proposal and corresponding NuGet/.NET documentation to better understand the requirements and functionality. Go through the process of installing and using NuGet to gain a technical understanding of the effort involved so we have a clear path forward for development.
This work should be time-boxed into no more than a few days. If after a few days the package manager still has a high degree of unknowns, it should be noted and re-evaluated rather than consuming more time. The engineer doing this investigation should share their technical notes and findings in this issue so the engineer(s) that work on the implementation don't have to go through the same process of discovering new information.
Further details
- One option that we are considering as an MVC is to only show the feed for users at the project level, so they can better understand which packages are being used, but not necessarily have their own NuGet registry hosted on GitLab.
NuGet Commands (MVC)
Configuration and Authentication
- nuget config: Gets or sets NuGet configuration values. The user will configure NuGet based on GitLab documentation.
- nuget setapikey: Saves an API key for a given package source when that package source requires a key for access. The user will use their GitLab personal access token for authentication.
Create, publish and consume
- nuget pack: Creates a NuGet package from a .nuspec or project file. When running on Mono, creating a package from a project file is not supported.
- nuget push: Publishes a package to a package source.
- nuget list: Displays packages from a given source.
- nuget delete: Removes or unlists a package from a package source.
NuGet Commands (Beyond the MVC)
- nuget spec: Generates a .nuspec file, using tokens if generating the file from a Visual Studio project.
- nuget update: Updates a project's packages to the latest available versions. Not supported when running on Mono.
- nuget restore: Restores all packages referenced by the package management format in use. When running on Mono, restoring packages using the PackageReference format is not supported.
Competitor Examples
Links / references
- Microsoft NuGet Docs
- NuGet v3 API
- https://api.nuget.org/v3/index.json
- Mono (for running nuget.exe on macOS/Linux)
Investigation Findings:
- Vocabulary: "Feed" seems to be used in different ways, but it always seems to mean what we call the package registry in GitLab. The nuget CLI uses the term "Source".
- There are different clients, but all of them call nuget for the main functions. The nuget configuration file is also shared across the different clients. Example: I added my local dummy Source to nuget and then opened Visual Studio. My local Source was properly listed as a NuGet Feed.
- Packages are searched across all Sources/Feeds, or the pull command can be scoped to a specific Source/Feed.
- The NuGet Server API defines 4+1 services as required: (services as in "nuget services". Nuget services can handle 1 or many urls)
- The Service Index
- The Search Service
- The Push/Delete Service
- The Package Metadata Service
- The Package Content Service
- A total of 9 requests have to be handled. One seems to be optional.
- This translates to 9 Grape API endpoints to implement.
Authentication
- The main authentication mechanism is HTTP basic auth, but the Source/Feed has to be served over HTTPS.
- Should be compatible with GitLab username + token.
- For the Push Service an additional authentication mechanism is available: an API token. This token is sent along with the push request as a custom HTTP header: X-NuGet-ApiKey. This could be used for the CI_JOB_TOKEN (see below).
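The two mechanisms above could be handled on the server side roughly as follows. This is only a sketch: the function name extract_credentials and its return shape are hypothetical, and it does not validate the token against GitLab in any way.

```python
import base64
from typing import Mapping, Optional, Tuple


def extract_credentials(headers: Mapping[str, str]) -> Optional[Tuple[Optional[str], str]]:
    """Return (username, token) for basic auth, (None, api_key) for
    X-NuGet-ApiKey, or None when no usable credentials are present.

    Hypothetical helper: it only parses headers, it authenticates nothing.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    authorization = normalized.get("authorization", "")
    if authorization.lower().startswith("basic "):
        try:
            decoded = base64.b64decode(
                authorization[6:].strip(), validate=True
            ).decode("utf-8")
        except ValueError:  # covers invalid base64 and invalid UTF-8
            return None
        username, separator, token = decoded.partition(":")
        if separator and token:
            return username, token
        return None

    # Fallback for the Push Service: the API key header.
    api_key = normalized.get("x-nuget-apikey")
    if api_key:
        return None, api_key
    return None
```

Basic auth is checked first here; whether the API key should take precedence for push requests is an open design choice.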
Variables used in this comment
For a given package MyCompany.MyPackage.1.0.0Beta.nupkg, here is what the API docs use:
- api_base_url: The base url. For this API, it should be something like /api/v4/projects/40/packages/nuget
- XX_service_url: The service url for service XX as returned by the Service Index.
- ID: MyCompany.MyPackage
- Version: 1.0.0Beta
- LOWER_ID: mycompany.mypackage
- LOWER_VERSION: 1.0.0beta
The Service Index
- https://docs.microsoft.com/en-us/nuget/api/service-index
- Used as entry point when a private Source/Feed is added.
- Aggressively cached by clients
GET {api_base_url}/index.json
- Simple json describing which services are available and what are their urls.
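As an illustration, the Service Index response could be assembled like this. The @type values are the ones I recall from the NuGet v3 documentation and should be double-checked there; the sub-paths (query, metadata, download) are purely hypothetical, and the function name is made up for this sketch.

```python
import json


def build_service_index(api_base_url: str) -> dict:
    """Build a minimal Service Index document.

    Assumptions: the sub-paths below are hypothetical, and the @type
    strings should be verified against the NuGet v3 API docs.
    """
    base = api_base_url.rstrip("/")
    resources = [
        ("SearchQueryService", f"{base}/query"),
        ("PackagePublish/2.0.0", f"{base}/"),  # the push log shows PUT on the base url
        ("RegistrationsBaseUrl", f"{base}/metadata/"),
        ("PackageBaseAddress/3.0.0", f"{base}/download/"),
    ]
    return {
        "version": "3.0.0",
        "resources": [
            {"@id": url, "@type": resource_type}
            for resource_type, url in resources
        ],
    }


if __name__ == "__main__":
    index = build_service_index("/api/v4/projects/40/packages/nuget")
    print(json.dumps(index, indent=2))
```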
The Search Service
- https://docs.microsoft.com/en-us/nuget/api/search-query-service-resource
- Used by Visual Studio (search is not available in nuget).
GET {search_service_url}?q={QUERY}&skip={SKIP}&take={TAKE}&prerelease={PRERELEASE}&semVerLevel={SEMVERLEVEL}
- Parameters:
- q: the search term. The server implementation is free to apply this filter on any field. Example: the search term can be applied to the name, or to the name and the description of a package, etc.
- skip: pagination parameter. This is the offset. Defaults to 0.
- take: pagination parameter. This is the number of entries per page. The server implementation can impose a max value.
- prerelease: whether to include prerelease packages or not.
- semVerLevel: whether to include SemVer 2.0.0 packages.
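A sketch of how these query parameters could be translated into an offset/limit pair for the backend. The names SearchParams, parse_search_params and MAX_TAKE are hypothetical, and so are the cap of 100 and the default take of 20.

```python
from dataclasses import dataclass
from typing import Mapping

MAX_TAKE = 100      # hypothetical server-side cap on the page size
DEFAULT_TAKE = 20   # assumed default, to be checked against the NuGet docs


@dataclass(frozen=True)
class SearchParams:
    term: str
    offset: int
    limit: int
    include_prerelease: bool
    include_semver2: bool


def parse_search_params(query: Mapping[str, str]) -> SearchParams:
    """Translate NuGet search query parameters into offset/limit values."""
    skip = max(int(query.get("skip", 0)), 0)
    take = min(max(int(query.get("take", DEFAULT_TAKE)), 0), MAX_TAKE)
    return SearchParams(
        term=query.get("q", ""),
        offset=skip,
        limit=take,
        include_prerelease=str(query.get("prerelease", "false")).lower() == "true",
        include_semver2=query.get("semVerLevel") == "2.0.0",
    )
```

Non-numeric skip or take values raise ValueError in this sketch; a real endpoint would presumably answer with a 400 instead.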
The Push/Delete Service
- https://docs.microsoft.com/en-us/nuget/api/package-publish-resource
- Can be authenticated through a custom http header.
PUT {push_service_url}
- The NuGet (compressed) archive is sent without any additional parameters.
- ⚠ This means that this request has no metadata available. In short, we get a compressed archive and that's it.
- ⚠ The backend will need to extract and inspect this compressed archive. Since this can take some time to do, this extraction should be done in a background job.
Log excerpt:
$ nuget push Bananas.1.0.0.nupkg -Source "local"
WARNING: No API Key was provided and no API Key could be found for 'http://localhost:4000/api/v4/projects/40/packages/nuget'. To save an API Key for a source use the 'setApiKey' command.
Pushing Bananas.1.0.0.nupkg to 'http://localhost:4000/api/v4/projects/40/packages/nuget'...
PUT http://localhost:4000/api/v4/projects/40/packages/nuget/
Created http://localhost:4000/api/v4/projects/40/packages/nuget/ 38ms
Your package was pushed.
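To give an idea of what inspecting the pushed archive involves: a .nupkg is a zip archive with a .nuspec XML file at its root. The sketch below reads a few fields and the dependencies from it. The function name and the returned dictionary shape are hypothetical, and a real job would need hardened XML parsing and size limits, which are left out here.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def read_nuspec_metadata(nupkg_bytes: bytes) -> dict:
    """Extract id, version, description and dependencies from a .nupkg.

    Hypothetical helper: no protection against malicious archives or XML.
    """
    with zipfile.ZipFile(io.BytesIO(nupkg_bytes)) as archive:
        # The .nuspec file is expected at the root of the archive.
        nuspec_names = [
            name for name in archive.namelist()
            if name.lower().endswith(".nuspec") and "/" not in name
        ]
        if not nuspec_names:
            raise ValueError("no .nuspec file found at the archive root")
        root = ET.fromstring(archive.read(nuspec_names[0]))

    def local(tag: str) -> str:
        # Drop the XML namespace, which differs between nuspec schema versions.
        return tag.rsplit("}", 1)[-1]

    metadata = next((child for child in root if local(child.tag) == "metadata"), None)
    if metadata is None:
        raise ValueError("no <metadata> element in the .nuspec file")

    # Simple text fields are the children without nested elements.
    fields = {
        local(child.tag): (child.text or "").strip()
        for child in metadata if len(child) == 0
    }
    dependencies = [
        {"id": element.get("id"), "version": element.get("version")}
        for element in metadata.iter() if local(element.tag) == "dependency"
    ]
    return {
        "id": fields.get("id"),
        "version": fields.get("version"),
        "description": fields.get("description"),
        "dependencies": dependencies,
    }
```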
DELETE {push_service_url}/{ID}/{VERSION}
- Interpretation left to the server implementation
- NuGet.org and GitHub unlist the package, but the archives are still available for pulling.
- Can be a hard delete.
⚠ This can easily break project builds.
POST {push_service_url}/{ID}/{VERSION}
- Relist a previously unlisted package
- No request body
The Package Metadata Service
- https://docs.microsoft.com/en-us/nuget/api/registration-base-url-resource
- The most complex endpoint
- Not only metadata is listed but also ⚠ dependencies
GET {metadata_service_url}/{LOWER_ID}/index.json
- bulk listing of all metadata for all available versions
- the json for the metadata is quite complex (https://docs.microsoft.com/en-us/nuget/api/registration-base-url-resource#sample-response-1)
- can be broken into pages. (I didn't find an example for this :/)
- Uses the LOWER_ID and LOWER_VERSION
GET {metadata_service_url}/{LOWER_ID}/{VERSION}.json
- similar response as above but for a given version
- undocumented but implemented by GitHub
- not sure that this url is used by nuget or Visual Studio (I didn't see this request fired)
The Package Content Service
- https://docs.microsoft.com/en-us/nuget/api/package-base-address-resource
- Service to get the archive but also some metadata (see below)
- Uses the LOWER_ID and LOWER_VERSION
GET {content_service_url}/{LOWER_ID}/index.json
- Simple json with a single array: all available versions.
GET {content_service_url}/{LOWER_ID}/{LOWER_VERSION}/{LOWER_ID}.{LOWER_VERSION}.nupkg
- The archive of the given package
- Yes, LOWER_ID and LOWER_VERSION are duplicated in the url 🤔
GET {content_service_url}/{LOWER_ID}/{LOWER_VERSION}/{LOWER_ID}.nuspec
- The nuspec file of the archive
- Not called during my tests but documented
- Not implemented by GitHub (404 Not Found)
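The three Package Content urls can be derived from ID and Version as sketched below. The function name is hypothetical, and the sketch only lowercases the values: any version normalization that the NuGet docs may require is ignored.

```python
def package_content_urls(content_service_url: str, package_id: str, version: str) -> dict:
    """Build the Package Content Service urls for one package version.

    Simplification: ID and version are only lowercased, not normalized.
    """
    base = content_service_url.rstrip("/")
    lower_id = package_id.lower()
    lower_version = version.lower()
    return {
        "versions": f"{base}/{lower_id}/index.json",
        "nupkg": f"{base}/{lower_id}/{lower_version}/{lower_id}.{lower_version}.nupkg",
        "nuspec": f"{base}/{lower_id}/{lower_version}/{lower_id}.nuspec",
    }


if __name__ == "__main__":
    urls = package_content_urls("/download", "MyCompany.MyPackage", "1.0.0Beta")
    for name, url in urls.items():
        print(name, url)
```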
Proposed MVC Scope
- Being able to add a project NuGet registry url to nuget in an authenticated manner using GitLab username + token.
- Being able to push a NuGet package (nuget push)
- Being able to pull a NuGet package (nuget install)
- Being able to delete a NuGet package (nuget delete)
- Ideally no changes on the FE side for the first version
The 5 endpoints should be enough to implement the MVC Scope for the nuget, dotnet and Visual Studio clients.
Naming convention
Since the MVC scope is for the project level, I don't think we need a naming convention. For future versions (if we implement the instance level access for example), we can enforce the use of . as a separator to identify a Group: MyGroup.MyPackage.
Pain points
- The metadata of a package will need to be available in the database as it is heavily used by the Package Metadata Service.
- The upload is a simple archive upload. No parameters about the metadata.
- It looks like we will need to open the archive and extract the metadata to have it available within the GitLab database. This should be done in a background job.
- Good news: the dependency models implemented in !14263 (closed) can be reused.
- The Search Service uses its own set of pagination parameters (skip and take). Those will need to be translated into what we use in the code.
Opportunities
- We could implement this as an internal API as suggested in #35798
- The Metadata extraction job could be implemented in a generic way to be easily ported to other package types.
Proposed MRs breakdown
1. API Skeleton + The Service Index + authentication + feature flag (weight 1)
2. Workhorse updates to handle the upload url (weight 1)
3. The Push/Delete Service (weight 2)
4. NuGet package metadata extraction job + db changes to support metadata and dependencies information (weight 3)
5. The Package Metadata Service (weight 2)
6. The Search Service (weight 2)
7. The Package Content Service (weight 1)
The critical path would be: MRs 1 and 2 -> MR 3 -> MRs 4, 5, 6 and 7.
Work estimation
From my initial estimation:
- I don't think that it's feasible in one milestone with only 1 BE.
- With 2 BEs, it seems doable but tight.
- Looking at the critical path above, we see that we will need at least the Service Index and the upload MRs merged (MRs 1, 2 and 3) to be able to parallelize the remaining work.
- Since there are still some unknowns (for example, the db table design to properly store the metadata), a feature flag would be a wise path (see Questions).
Test Scenarios
- Add Feed, push and pull from nuget (macOS + Windows)
- Add Feed, pull and search from Visual Studio (macOS + Windows)
- Push package from the dotnet command (macOS + Windows)
- (Bonus: have a quick check if chocolatey packages work with the MVC (Windows))
I should be able to test the API endpoints on my Windows machine if necessary.
Future changes
- CI_JOB_TOKEN job authentication so that a CI job can push a package
- FE extracted metadata + dependencies integration (this might need some BE effort on the packages API)
- Metadata extraction job can lead to several interesting features:
- Files within the archive available for scanners (Vulnerabilities, Virus Scanner, others)
- Open this extraction to other package types. For example, what if we expose the dependencies of an NPM package in the UI?
References
- https://docs.microsoft.com/en-us/nuget/api/overview
- https://www.nuget.org
- UX on nuget for a given package (for @icamacho): https://www.nuget.org/packages/Castle.Core/
- Dummy server to inspect NuGet requests locally: https://gitlab.com/10io/nuget-api-server-faker
From the chocolatey demo
- https://chocolatey.org
- UX on chocolatey for a given package (for @icamacho): https://chocolatey.org/packages/Firefox (notice the Virus Scan Results)
- Several users mentioned Sleet: https://github.com/emgarten/Sleet
- Takes a collection of nupkg files, organizes them and generates static json files for the endpoints.
- With this structure of files, users can then host them on Azure or S3 and thus have a serverless feed.