NuGet - The Search Service
What does this MR do?
Implements the nuget search service API. See https://docs.microsoft.com/en-us/nuget/api/search-query-service-resource.
This API is meant to be used by Visual Studio.
See #36505 (closed)
This MR is part of the nuget MVC (See the epic &2271 (closed)). This MR is gated behind this feature flag (scoped by project): nuget_package_registry
. This MR doesn't need documentation or a change log entry.
Technical notes
- The search API(used by Visual Studio) doesn't match how GitLab represent packages.
- The search API expects to return a json structure as an array:
- Each element is a package with its name and several other attributes
- For each element, there is a
versions
array which holds all the available versions.
- This structure is paginated, meaning that clients will send some parameters indicating which page the API should return.
- Note that the pagination is on the package names, not the versions.
- On GitLab, each package name + version is a
Packages::Package
model instance.- In short, we have one row/element per package name + version combination.
- For these reasons, the search query is implemented as custom SQL query.
- We don't want to run a query for package names and for each result, run a query to get the available versions. This would lead to 1 + n queries where n is the page size.
- This MR uses the approach where a single query fetches everything (name + versions) in one single query. This is possible with SQL using
PARTITION
. - To avoid ~performance issues, this query has hard limits (see below).
- At the presenter level, we will need to restructure this result into a proper structure expected by the API entities. This means that the presenter will need to execute several
.select
on the results array.
- The idea behind this custom query is as follows:
- We first build a query on package names matching the search term. This query is paginated. (query A)
- Using (query A) as a subquery we partition the rows using the package name and we limit to the 10 most recent rows per package name. (query B)
- Finally, we select the proper columns from (query B) so that the query return rows that ActiveRecord can instantiate into
Packages::Package
. Note that we could also avoid this step and useselect_all
that would return an array of hashes without instantiating the rows in model instances.
- With the limitations put in place, we have at most 300 rows returned (30 package names * 10 versions each)
Database analysis
Since this feature is behind a feature flag, it's impossible to find a project on GitLab.com with a big list of nuget packages. Having said that, we used a script to generate ~300 nuget packages on https://gitlab.com/issue-reproduce/packages/nuget-package/-/packages. The script used 30 random package names and generated 10 random versions for each package.
The custom query uses 2 parameters that are used in (query A):
-
search term
. The search term entered by the user. Note that if this is blank, the API should work on the whole list of the available nuget packages. -
prerelease
. Boolean flag indicate if we include or exclude the pre release versions. Pre release versions are defined by having-
in the version string. See https://docs.microsoft.com/en-us/nuget/concepts/package-versioning#pre-release-versions.
We have 4 queries to analyze:
- search term present or blank
- prerelease true or false
Query search term blank / prerelease version true
SQL query:
SELECT
partition_subquery. "id",
partition_subquery. "project_id",
partition_subquery. "created_at",
partition_subquery. "updated_at",
partition_subquery. "name",
partition_subquery. "version",
partition_subquery. "package_type"
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY "packages_packages"."name" ORDER BY "packages_packages"."created_at" DESC) AS row_number,
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" = 15835259
AND "packages_packages"."package_type" = 4
AND "packages_packages"."version" IS NOT NULL
AND "packages_packages"."name" != 'NuGet.Temporary.Package'
AND "packages_packages"."name" IN ( SELECT DISTINCT
"packages_packages"."name"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" = 15835259
AND "packages_packages"."package_type" = 4
AND "packages_packages"."version" IS NOT NULL
AND "packages_packages"."name" != 'NuGet.Temporary.Package'
ORDER BY
name ASC
LIMIT 20 OFFSET 0)) partition_subquery
WHERE (partition_subquery.row_number <= 10)
Explain plan: https://gitlab.com/snippets/1940494
Query search term blank / prerelease version false
SQL query:
SELECT
partition_subquery. "id",
partition_subquery. "project_id",
partition_subquery. "created_at",
partition_subquery. "updated_at",
partition_subquery. "name",
partition_subquery. "version",
partition_subquery. "package_type"
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY "packages_packages"."name" ORDER BY "packages_packages"."created_at" DESC) AS row_number,
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" = 15835259
AND "packages_packages"."package_type" = 4
AND "packages_packages"."version" IS NOT NULL
AND "packages_packages"."name" != 'NuGet.Temporary.Package'
AND "packages_packages"."name" IN ( SELECT DISTINCT
"packages_packages"."name"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" = 15835259
AND "packages_packages"."package_type" = 4
AND "packages_packages"."version" IS NOT NULL
AND "packages_packages"."name" != 'NuGet.Temporary.Package'
AND NOT ("packages_packages"."version" ILIKE '%-%')
ORDER BY
name ASC
LIMIT 20 OFFSET 0)) partition_subquery
WHERE (partition_subquery.row_number <= 10)
Explain plan: https://gitlab.com/snippets/1940496
Query search term "pack" / prerelease version true
SQL query:
SELECT
partition_subquery. "id",
partition_subquery. "project_id",
partition_subquery. "created_at",
partition_subquery. "updated_at",
partition_subquery. "name",
partition_subquery. "version",
partition_subquery. "package_type"
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY "packages_packages"."name" ORDER BY "packages_packages"."created_at" DESC) AS row_number,
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" = 15835259
AND "packages_packages"."package_type" = 4
AND "packages_packages"."version" IS NOT NULL
AND "packages_packages"."name" != 'NuGet.Temporary.Package'
AND "packages_packages"."name" IN ( SELECT DISTINCT
"packages_packages"."name"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" = 15835259
AND "packages_packages"."package_type" = 4
AND "packages_packages"."version" IS NOT NULL
AND "packages_packages"."name" != 'NuGet.Temporary.Package'
AND "packages_packages"."name" ILIKE '%pack%'
ORDER BY
name ASC
LIMIT 20 OFFSET 0)) partition_subquery
WHERE (partition_subquery.row_number <= 10)
Explain plan: https://gitlab.com/snippets/1940497
Query search term "pack" / prerelease version false
SQL query:
SELECT
partition_subquery. "id",
partition_subquery. "project_id",
partition_subquery. "created_at",
partition_subquery. "updated_at",
partition_subquery. "name",
partition_subquery. "version",
partition_subquery. "package_type"
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY "packages_packages"."name" ORDER BY "packages_packages"."created_at" DESC) AS row_number,
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" = 15835259
AND "packages_packages"."package_type" = 4
AND "packages_packages"."version" IS NOT NULL
AND "packages_packages"."name" != 'NuGet.Temporary.Package'
AND "packages_packages"."name" IN ( SELECT DISTINCT
"packages_packages"."name"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" = 15835259
AND "packages_packages"."package_type" = 4
AND "packages_packages"."version" IS NOT NULL
AND "packages_packages"."name" != 'NuGet.Temporary.Package'
AND NOT ("packages_packages"."version" ILIKE '%-%')
AND "packages_packages"."name" ILIKE '%pack%'
ORDER BY
name ASC
LIMIT 20 OFFSET 0)) partition_subquery
WHERE (partition_subquery.row_number <= 10)
Explain plan: https://gitlab.com/snippets/1940498
Screenshots
Here is an example of the json returned by the API:
{
"totalHits": 10,
"data": [
{
"@type": "Package",
"authors": "",
"id": "NugetPackage1",
"title": "NugetPackage1",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.1",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage1/1.0.1.json",
"version": "1.0.1",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage10",
"title": "NugetPackage10",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.10",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage10/1.0.10.json",
"version": "1.0.10",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage2",
"title": "NugetPackage2",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.2",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage2/1.0.2.json",
"version": "1.0.2",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage3",
"title": "NugetPackage3",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.3",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage3/1.0.3.json",
"version": "1.0.3",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage4",
"title": "NugetPackage4",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.4",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage4/1.0.4.json",
"version": "1.0.4",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage5",
"title": "NugetPackage5",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.5",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage5/1.0.5.json",
"version": "1.0.5",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage6",
"title": "NugetPackage6",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.6",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage6/1.0.6.json",
"version": "1.0.6",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage7",
"title": "NugetPackage7",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.7",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage7/1.0.7.json",
"version": "1.0.7",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage8",
"title": "NugetPackage8",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.8",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage8/1.0.8.json",
"version": "1.0.8",
"downloads": 0
}
]
},
{
"@type": "Package",
"authors": "",
"id": "NugetPackage9",
"title": "NugetPackage9",
"summary": "",
"totalDownloads": 0,
"verified": true,
"version": "1.0.9",
"versions": [
{
"@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage9/1.0.9.json",
"version": "1.0.9",
"downloads": 0
}
]
}
]
}
Here are some screenshots of the search UI in visual studio working with the API endpoints of this MR.
Note that the metadata of the package (the right side panel) is almost empty. That is expected since, for now, nuget packages processed by GitLab have only their package name and version read.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team