Skip to content

NuGet - The Search Service

David Fernandez requested to merge 36505-nuget-the-search-service into master

What does this MR do?

Implements the nuget search service API. See https://docs.microsoft.com/en-us/nuget/api/search-query-service-resource.

This API is meant to be used by Visual Studio.

See #36505 (closed)

This MR is part of the nuget MVC (See the epic &2271 (closed)). This MR is gated behind this feature flag (scoped by project): nuget_package_registry. This MR doesn't need documentation or a change log entry.

Technical notes

  • The search API(used by Visual Studio) doesn't match how GitLab represent packages.
  • The search API expects to return a json structure as an array:
    • Each element is a package with its name and several other attributes
    • For each element, there is a versions array which holds all the available versions.
  • This structure is paginated, meaning that clients will send some parameters indicating which page the API should return.
    • Note that the pagination is on the package names, not the versions.
  • On GitLab, each package name + version is a Packages::Package model instance.
    • In short, we have one row/element per package name + version combination.
  • For these reasons, the search query is implemented as custom SQL query.
    • We don't want to run a query for package names and for each result, run a query to get the available versions. This would lead to 1 + n queries where n is the page size.
    • This MR uses the approach where a single query fetches everything (name + versions) in one single query. This is possible with SQL using PARTITION.
    • To avoid ~performance issues, this query has hard limits (see below).
    • At the presenter level, we will need to restructure this result into a proper structure expected by the API entities. This means that the presenter will need to execute several .select on the results array.
  • The idea behind this custom query is as follows:
    • We first build a query on package names matching the search term. This query is paginated. (query A)
    • Using (query A) as a subquery we partition the rows using the package name and we limit to the 10 most recent rows per package name. (query B)
    • Finally, we select the proper columns from (query B) so that the query return rows that ActiveRecord can instantiate into Packages::Package. Note that we could also avoid this step and use select_all that would return an array of hashes without instantiating the rows in model instances.
  • With the limitations put in place, we have at most 300 rows returned (30 package names * 10 versions each)

Database analysis

Since this feature is behind a feature flag, it's impossible to find a project on GitLab.com with a big list of nuget packages. Having said that, we used a script to generate ~300 nuget packages on https://gitlab.com/issue-reproduce/packages/nuget-package/-/packages. The script used 30 random package names and generated 10 random versions for each package.

The custom query uses 2 parameters that are used in (query A):

We have 4 queries to analyze:

  • search term present or blank
  • prerelease true or false

Query search term blank / prerelease version true

SQL query:

SELECT
    partition_subquery. "id",
    partition_subquery. "project_id",
    partition_subquery. "created_at",
    partition_subquery. "updated_at",
    partition_subquery. "name",
    partition_subquery. "version",
    partition_subquery. "package_type"
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY "packages_packages"."name" ORDER BY "packages_packages"."created_at" DESC) AS row_number,
        "packages_packages".*
    FROM
        "packages_packages"
    WHERE
        "packages_packages"."project_id" = 15835259
        AND "packages_packages"."package_type" = 4
        AND "packages_packages"."version" IS NOT NULL
        AND "packages_packages"."name" != 'NuGet.Temporary.Package'
        AND "packages_packages"."name" IN ( SELECT DISTINCT
                "packages_packages"."name"
            FROM
                "packages_packages"
            WHERE
                "packages_packages"."project_id" = 15835259
                AND "packages_packages"."package_type" = 4
                AND "packages_packages"."version" IS NOT NULL
                AND "packages_packages"."name" != 'NuGet.Temporary.Package'
            ORDER BY
                name ASC
            LIMIT 20 OFFSET 0)) partition_subquery
WHERE (partition_subquery.row_number <= 10)

Explain plan: https://gitlab.com/snippets/1940494

Query search term blank / prerelease version false

SQL query:

SELECT
    partition_subquery. "id",
    partition_subquery. "project_id",
    partition_subquery. "created_at",
    partition_subquery. "updated_at",
    partition_subquery. "name",
    partition_subquery. "version",
    partition_subquery. "package_type"
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY "packages_packages"."name" ORDER BY "packages_packages"."created_at" DESC) AS row_number,
        "packages_packages".*
    FROM
        "packages_packages"
    WHERE
        "packages_packages"."project_id" = 15835259
        AND "packages_packages"."package_type" = 4
        AND "packages_packages"."version" IS NOT NULL
        AND "packages_packages"."name" != 'NuGet.Temporary.Package'
        AND "packages_packages"."name" IN ( SELECT DISTINCT
                "packages_packages"."name"
            FROM
                "packages_packages"
            WHERE
                "packages_packages"."project_id" = 15835259
                AND "packages_packages"."package_type" = 4
                AND "packages_packages"."version" IS NOT NULL
                AND "packages_packages"."name" != 'NuGet.Temporary.Package'
                AND NOT ("packages_packages"."version" ILIKE '%-%')
            ORDER BY
                name ASC
            LIMIT 20 OFFSET 0)) partition_subquery
WHERE (partition_subquery.row_number <= 10)

Explain plan: https://gitlab.com/snippets/1940496

Query search term "pack" / prerelease version true

SQL query:

SELECT
    partition_subquery. "id",
    partition_subquery. "project_id",
    partition_subquery. "created_at",
    partition_subquery. "updated_at",
    partition_subquery. "name",
    partition_subquery. "version",
    partition_subquery. "package_type"
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY "packages_packages"."name" ORDER BY "packages_packages"."created_at" DESC) AS row_number,
        "packages_packages".*
    FROM
        "packages_packages"
    WHERE
        "packages_packages"."project_id" = 15835259
        AND "packages_packages"."package_type" = 4
        AND "packages_packages"."version" IS NOT NULL
        AND "packages_packages"."name" != 'NuGet.Temporary.Package'
        AND "packages_packages"."name" IN ( SELECT DISTINCT
                "packages_packages"."name"
            FROM
                "packages_packages"
            WHERE
                "packages_packages"."project_id" = 15835259
                AND "packages_packages"."package_type" = 4
                AND "packages_packages"."version" IS NOT NULL
                AND "packages_packages"."name" != 'NuGet.Temporary.Package'
                AND "packages_packages"."name" ILIKE '%pack%'
            ORDER BY
                name ASC
            LIMIT 20 OFFSET 0)) partition_subquery
WHERE (partition_subquery.row_number <= 10)

Explain plan: https://gitlab.com/snippets/1940497

Query search term "pack" / prerelease version false

SQL query:

SELECT
    partition_subquery. "id",
    partition_subquery. "project_id",
    partition_subquery. "created_at",
    partition_subquery. "updated_at",
    partition_subquery. "name",
    partition_subquery. "version",
    partition_subquery. "package_type"
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY "packages_packages"."name" ORDER BY "packages_packages"."created_at" DESC) AS row_number,
        "packages_packages".*
    FROM
        "packages_packages"
    WHERE
        "packages_packages"."project_id" = 15835259
        AND "packages_packages"."package_type" = 4
        AND "packages_packages"."version" IS NOT NULL
        AND "packages_packages"."name" != 'NuGet.Temporary.Package'
        AND "packages_packages"."name" IN ( SELECT DISTINCT
                "packages_packages"."name"
            FROM
                "packages_packages"
            WHERE
                "packages_packages"."project_id" = 15835259
                AND "packages_packages"."package_type" = 4
                AND "packages_packages"."version" IS NOT NULL
                AND "packages_packages"."name" != 'NuGet.Temporary.Package'
                AND NOT ("packages_packages"."version" ILIKE '%-%')
                AND "packages_packages"."name" ILIKE '%pack%'
            ORDER BY
                name ASC
            LIMIT 20 OFFSET 0)) partition_subquery
WHERE (partition_subquery.row_number <= 10)

Explain plan: https://gitlab.com/snippets/1940498

Screenshots

Here is an example of the json returned by the API:

{
  "totalHits": 10,
  "data": [
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage1",
      "title": "NugetPackage1",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.1",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage1/1.0.1.json",
          "version": "1.0.1",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage10",
      "title": "NugetPackage10",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.10",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage10/1.0.10.json",
          "version": "1.0.10",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage2",
      "title": "NugetPackage2",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.2",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage2/1.0.2.json",
          "version": "1.0.2",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage3",
      "title": "NugetPackage3",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.3",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage3/1.0.3.json",
          "version": "1.0.3",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage4",
      "title": "NugetPackage4",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.4",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage4/1.0.4.json",
          "version": "1.0.4",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage5",
      "title": "NugetPackage5",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.5",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage5/1.0.5.json",
          "version": "1.0.5",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage6",
      "title": "NugetPackage6",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.6",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage6/1.0.6.json",
          "version": "1.0.6",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage7",
      "title": "NugetPackage7",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.7",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage7/1.0.7.json",
          "version": "1.0.7",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage8",
      "title": "NugetPackage8",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.8",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage8/1.0.8.json",
          "version": "1.0.8",
          "downloads": 0
        }
      ]
    },
    {
      "@type": "Package",
      "authors": "",
      "id": "NugetPackage9",
      "title": "NugetPackage9",
      "summary": "",
      "totalDownloads": 0,
      "verified": true,
      "version": "1.0.9",
      "versions": [
        {
          "@id": "https://gitlab.local:3443/api/v4/projects/1/packages/nuget/metadata/NugetPackage9/1.0.9.json",
          "version": "1.0.9",
          "downloads": 0
        }
      ]
    }
  ]
}

Here are some screenshots of the search UI in visual studio working with the API endpoints of this MR.

Screenshot_2020-02-05_at_22.48.27

Screenshot_2020-02-05_at_22.48.35

Note that the metadata of the package (the right side panel) is almost empty. That is expected since, for now, nuget packages processed by GitLab have only their package name and version read.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Adam Hegyi

Merge request reports

Loading