Add nuget dependencies support to the metadata extraction
What does this MR do?
Add dependencies extraction to the GitLab NuGet repository (https://docs.gitlab.com/ee/user/packages/nuget_repository/)
See #42680 (closed)
Details
NuGet packages are uploaded as a .nupkg
file (actually a zip
archive file). This file is then processed by a worker (Packages::Nuget::ExtractionWorker
) to get the package name and version from the .nuspec
file. This MR will add dependencies to this extraction.
The .nuspec
file is an XML file (documented here: https://docs.microsoft.com/en-us/nuget/reference/nuspec) that is standardized by an XSD schema: https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/compiler/resources/nuspec.xsd
Reading the XSD, we can see that the .nuspec
file has a <dependencies>
tag and its children have two forms. Here is an example:
<dependencies>
<dependency id="Moqi" version="2.5.6" include="Runtime,Compile" />
<group targetFramework=".NETStandard2.0">
<dependency id="Test.Dependency" version="2.3.7" exclude="Build,Analyzers" include="Runtime,Compile" />
<dependency id="Newtonsoft.Json" version="12.0.3" exclude="Build,Analyzers" />
</group>
<dependency id="Castle.Core" />
</dependencies>
In short, we either have a <dependency>
element or a <group>
with many <dependency>
elements embedded. Note that we will need to extract not only the dependency name and version but also the include
, exclude
and targetFramework
attributes.
Inspecting the XSD file, we notice that these attributes are all optional, even version
is optional. In particular, this statement:
<dependencies>
<dependency id="Moqi" />
</dependencies>
is valid.
Persisting dependencies in GitLab
We already have Packages::DependencyLink
and Packages::Dependency
that allow us to persist package dependencies. The only issue here is that these models will only handle the dependency name and version.
To be able to support the include
, exclude
and targetFramework
attributes, a new model/table has been added and linked to Packages::DependencyLink
: Packages::NugetDependencyLinkMetadatum
. This new model follows what has been done at the Packages::Package
level (see Packages::MavenMetadatum
for an example).
Now we need to integrate this Packages::NugetDependencyLinkMetadatum
during dependencies creation. This creation is handled by the Packages::CreateDependencyService
. This service has some specificity: all the inserts are grouped and done through bulk inserts. This is to avoid n
inserts, n
being the number of the dependencies for the given package. Also, if a Packages::Dependency
is re used accross packages, the Packages::Dependency
row is re-used (for more details about this service, check !20549 (merged)).
We want to keep this same behavior (use bulk inserts) for Packages::NugetDependencyLinkMetadatum
. To deal with that, we introduce a new nuget flavoured service: Packages::Nuget::CreateDependencyService
. This service will use the shared service Packages::CreateDependencyService
to insert Packages::DependencyLink
s and Packages::Dependency
s.
Once, we have the Packages::DependencyLink
s, it's just a matter of looping through them, locate the nuget metadata (the include
, exclude
and targetFramework
attributes) and build a rows
structure. This structure is then used to bulk insert all the rows at once.
Follow ups
There are a few aspects that this MR doesn't tackle to keep it small:
- Those dependencies are persisted by this MR but we need to do something with them
🙂 . We will update thenuget metadata endpoint
to return them. Currently, it returns an empty array (see https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/api/helpers/packages/nuget/metadata_presenter_helpers.rb#L48) and this leads to issues under specific conditions, see #211926 (closed) - The upload endpoint accepts multiple uploads for the same package name + version. The GitLab NuGet repository will simply append the newest files to the existing
Packages::Package
. This part has not been updated to support dependencies changes. In short, imagine that we push for packagep
with dependencies ona
,b
andc
. Later on, we push the same package name + version but the.nuspec
file references only dependenciesa
andc
-> Dependencyb
has to be cleaned up. A follow up issue has been created: #216005.
Screenshots
On the nuget
CLI:
nuget push Dependencies.App.4.2.3.nupkg -source local
WARNING: No API Key was provided and no API Key could be found for 'http://gitlab.local:8000/api/v4/projects/1/packages/nuget'. To save an API Key for a source use the 'setApiKey' command.
Pushing Dependencies.App.4.2.3.nupkg to 'http://gitlab.local:8000/api/v4/projects/1/packages/nuget'...
PUT http://gitlab.local:8000/api/v4/projects/1/packages/nuget/
Created http://gitlab.local:8000/api/v4/projects/1/packages/nuget/ 35479ms
Your package was pushed.
After the worker has gone through the package, we get this:
> Packages::Package.last.dependency_links
=> [#<Packages::DependencyLink:0x00007fb0ab3057b0 id: 112, package_id: 244, dependency_id: 44, dependency_type: "dependencies">,
#<Packages::DependencyLink:0x00007fb0ab4db490 id: 113, package_id: 244, dependency_id: 45, dependency_type: "dependencies">,
#<Packages::DependencyLink:0x00007fb0ab4db2b0 id: 114, package_id: 244, dependency_id: 46, dependency_type: "dependencies">]
> Packages::Package.last.dependency_links.first.nuget_metadatum
=> #<Packages::NugetDependencyLinkMetadatum:0x00007fb0aa4a50f8 dependency_link_id: 112, include: nil, exclude: "Build,Analyzers", target_framework: ".NETCoreApp3.1">
> Packages::Package.last.dependency_links.second.nuget_metadatum
=> #<Packages::NugetDependencyLinkMetadatum:0x00007fb0aa344ad8 dependency_link_id: 113, include: nil, exclude: "Build,Analyzers", target_framework: ".NETCoreApp3.1">
> Packages::Package.last.dependency_links.third.nuget_metadatum
=> #<Packages::NugetDependencyLinkMetadatum:0x00007fb0aa37f160 dependency_link_id: 114, include: nil, exclude: "Build,Analyzers", target_framework: ".NETCoreApp3.1">
Database review
Migration Up
$ rails db:migrate
== 20200424135319 CreateNugetDependencyLinkMetadata: migrating ================
-- table_exists?(:packages_nuget_dependency_link_metadata)
-> 0.0005s
-- create_table(:packages_nuget_dependency_link_metadata, {:id=>false})
-> 0.0228s
-- transaction_open?()
-> 0.0000s
-- execute("ALTER TABLE packages_nuget_dependency_link_metadata\nADD CONSTRAINT packages_nuget_dependency_link_metadata_target_framework_constraint\nCHECK ( char_length(target_framework) <= 255 )\nNOT VALID;\n")
-> 0.0008s
-- execute("SET statement_timeout TO 0")
-> 0.0001s
-- execute("ALTER TABLE packages_nuget_dependency_link_metadata VALIDATE CONSTRAINT packages_nuget_dependency_link_metadata_target_framework_constraint;")
-> 0.0004s
-- execute("RESET ALL")
-> 0.0001s
== 20200424135319 CreateNugetDependencyLinkMetadata: migrated (0.0294s) =======
Migration Down
$ rails db:rollback
== 20200424135319 CreateNugetDependencyLinkMetadata: reverting ================
-- drop_table(:packages_nuget_dependency_link_metadata)
-> 0.0092s
== 20200424135319 CreateNugetDependencyLinkMetadata: reverted (0.0093s) =======
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team