Skip to content

Create a service to generate npm packages metadata on the group and instance level

Context

Performance can be slow when installing an npm package at the namespace level, when there are many projects and packages under the namespace. The DB query can be complex because it has to pull in data from multiple tables: namespaces, project_authorizations, and of course packages. The slowness happens here

Solution

One solution suggested in this thread is to move the expensive filters into the inner query and do name filtering only in the outer query. This MR implements that.

This MR also changes how projects inside a namespace are retrieved (🧵). The new implementation is limited to the finder class introduced in this MR. For this MR, the helper methods are reimplemented in the finder - see note. To limit the blast radius of the MR, the changes are limited to the NPM metadata endpoint only. Changes to the helper method - which will apply the performance improvement to the other endpoints and other package formats - will be deferred to a follow-up issue: #505645

Implementation Details

  1. Introduce Packages::Npm::GenerateMetadataForNamespaceService inheriting from GenerateMetadataService
  2. Create Packages::Npm::PackagesForBatchFinder using the by_any_overlap_with_traversal_ids scope for efficient project finding. Some methods in app/finders/concerns/packages/finder_helper.rb are overridden to implement the project finding changes.

Query Analysis

Summary

To make it easier to compare the changes between the master branch and this MR, I labeled the relevant Ruby lines as (1), (2) and (3):

app/services/packages/npm/generate_metadata_service.rb

(1)        packages.each_batch do |batch|
(2)          relation = preload_needed_relations(filter_packages(batch), only_dist_tags)

(3)          relation.each do |package|
               build_tags(package)
               store_latest_version(package.version)
               next if only_dist_tags

               build_versions(package)
             end
           end

As discussed in this thread, the key idea is to filter packages by name first, before filtering for the packages visible to user.

Original queries on master branch

app/services/packages/npm/generate_metadata_service.rb

(1)
packages.each_batch do |batch|
Query Analysis
SELECT
  "packages_packages"."id"
FROM
  "packages_packages"
WHERE
  "packages_packages"."project_id" IN (
    SELECT
      "projects"."id"
    FROM
      "projects"
      LEFT JOIN project_features ON projects.id = project_features.project_id
    WHERE
      "projects"."namespace_id" IN (
        SELECT
          namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
        FROM
          "namespaces"
        WHERE
          "namespaces"."type" = 'Group'
          AND (traversal_ids @ > ('{<redacted>}'))
      )
      AND (
        EXISTS (
          SELECT
            1
          FROM
            "project_authorizations"
          WHERE
            "project_authorizations"."user_id" = redacted
            AND (project_authorizations.project_id = projects.id)
        )
        OR projects.visibility_level IN (0, 10, 20)
      )
      AND (
        "project_features"."package_registry_access_level" > 0
        OR "project_features"."package_registry_access_level" IS NULL
      )
  )
  AND "packages_packages"."status" IN (0, 1)
  AND "packages_packages"."package_type" = 2
  AND "packages_packages"."name" = '<redacted>'
ORDER BY
  "packages_packages"."id" ASC
LIMIT
  1 OFFSET 1000

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33485/commands/103148

(2)
relation = preload_needed_relations(filter_packages(batch), only_dist_tags)
Query Analysis
SELECT
  "packages_dependencies"."id"
FROM
  "packages_dependencies"
WHERE
  "packages_dependencies"."id" IN (
    SELECT
      "packages_dependency_links"."dependency_id"
    FROM
      "packages_dependency_links"
    WHERE
      "packages_dependency_links"."package_id" IN (
        SELECT
          "packages_packages"."id"
        FROM
          "packages_packages"
        WHERE
          "packages_packages"."project_id" IN (
            SELECT
              "projects"."id"
            FROM
              "projects"
              LEFT JOIN project_features ON projects.id = project_features.project_id
            WHERE
              "projects"."namespace_id" IN (
                SELECT
                  namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
                FROM
                  "namespaces"
                WHERE
                  "namespaces"."type" = 'Group'
                  AND (traversal_ids @ > ('{redacted}'))
              )
              AND (
                EXISTS (
                  SELECT
                    1
                  FROM
                    "project_authorizations"
                  WHERE
                    "project_authorizations"."user_id" = 1
                    AND (project_authorizations.project_id = projects.id)
                )
                OR projects.visibility_level IN (0, 10, 20)
              )
              AND (
                "project_features"."package_registry_access_level" > 0
                OR "project_features"."package_registry_access_level" IS NULL
              )
          )
          AND "packages_packages"."status" IN (0, 1)
          AND "packages_packages"."package_type" = 2
          AND "packages_packages"."name" = 'redacted'
          AND "packages_packages"."id" >= redacted
      )
  )
  AND 1 = 1
ORDER BY
  "packages_dependencies"."id" ASC
LIMIT
  1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33727/commands/103711

Query Analysis
WITH "dependency_links_cte" AS MATERIALIZED (
  SELECT
    package_id,
    dependency_type,
    ARRAY_AGG(dependency_id) as dependency_ids
  FROM
    "packages_dependency_links"
  WHERE
    "packages_dependency_links"."package_id" IN (
      SELECT
        "packages_packages"."id"
      FROM
        "packages_packages"
      WHERE
        "packages_packages"."project_id" IN (
          SELECT
            "projects"."id"
          FROM
            "projects"
            LEFT JOIN project_features ON projects.id = project_features.project_id
          WHERE
            "projects"."namespace_id" IN (
              SELECT
                namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
              FROM
                "namespaces"
              WHERE
                "namespaces"."type" = 'Group'
                AND (traversal_ids @ > ('{redacted}'))
            )
            AND (
              EXISTS (
                SELECT
                  1
                FROM
                  "project_authorizations"
                WHERE
                  "project_authorizations"."user_id" = redacted
                  AND (project_authorizations.project_id = projects.id)
              )
              OR projects.visibility_level IN (0, 10, 20)
            )
            AND (
              "project_features"."package_registry_access_level" > 0
              OR "project_features"."package_registry_access_level" IS NULL
            )
        )
        AND "packages_packages"."status" IN (0, 1)
        AND "packages_packages"."package_type" = 2
        AND "packages_packages"."name" = 'redacted'
        AND "packages_packages"."id" >= redacted
    )
  GROUP BY
    "packages_dependency_links"."package_id",
    "packages_dependency_links"."dependency_type"
)
SELECT
  "package_id"
FROM
  "dependency_links_cte" "packages_dependency_links"
GROUP BY
  "package_id"
ORDER BY
  "package_id" ASC
LIMIT
  1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33727/commands/103713

(3)

relation.each do |package|
Query Analysis
SELECT
    "packages_packages".*
FROM
    "packages_packages"
WHERE
    "packages_packages"."project_id" IN (
        SELECT
            "projects"."id"
        FROM
            "projects"
            LEFT JOIN project_features ON projects.id = project_features.project_id
        WHERE
            "projects"."namespace_id" IN (
                SELECT
                    namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
                FROM
                    "namespaces"
                WHERE
                    "namespaces"."type" = 'Group'
                    AND (traversal_ids @ > ('{redacted}'))
            )
            AND (
                EXISTS (
                    SELECT
                        1
                    FROM
                        "project_authorizations"
                    WHERE
                        "project_authorizations"."user_id" = redacted
                        AND (project_authorizations.project_id = projects.id)
                )
                OR projects.visibility_level IN (0, 10, 20)
            )
            AND (
                "project_features"."package_registry_access_level" > 0
                OR "project_features"."package_registry_access_level" IS NULL
            )
    )
    AND "packages_packages"."status" IN (0, 1)
    AND "packages_packages"."package_type" = 2
    AND "packages_packages"."id" IN (redacted)

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33997/commands/104599

Queries on MR branch

app/services/packages/npm/generate_metadata_service.rb

(1)
packages.each_batch do |batch|
Query Analysis
SELECT
    "packages_packages"."id"
FROM
    "packages_packages"
WHERE
    "packages_packages"."package_type" = 2
    AND "packages_packages"."status" IN (0, 1)
    AND "packages_packages"."name" = 'redacted'
    AND "packages_packages"."id" >= redacted
ORDER BY
    "packages_packages"."id" ASC
LIMIT
    1 OFFSET 1000

WITH current package_name_index index

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33997/commands/104577

AFTER running

EXEC DROP INDEX package_name_index;
EXEC CREATE INDEX package_name_id_index ON packages_packages(name, id);

WITH package_name_id_index index https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33997/commands/104581

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33784/commands/103952

(2)
relation = preload_needed_relations(filter_packages(batch), only_dist_tags)
Query 1
SELECT
    "packages_packages"."id"
FROM
    "packages_packages"
WHERE
    "packages_packages"."package_type" = 2
    AND "packages_packages"."status" IN (0, 1)
    AND "packages_packages"."name" = 'redacted'
    AND "packages_packages"."id" >= redacted

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34072/commands/104917

Query 2
SELECT
    "packages_dependencies"."id"
FROM
    "packages_dependencies"
WHERE
    "packages_dependencies"."id" IN (
        SELECT
            "packages_dependency_links"."dependency_id"
        FROM
            "packages_dependency_links"
        WHERE
            "packages_dependency_links"."package_id" IN (
                SELECT
                    "packages_packages"."id"
                FROM
                    "packages_packages"
                WHERE
                    "packages_packages"."project_id" IN (
                        SELECT
                            "projects"."id"
                        FROM
                            "projects"
                            INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
                            LEFT JOIN project_features ON projects.id = project_features.project_id
                        WHERE
                            (
                                namespaces.traversal_ids::bigint[] && ARRAY[redacted]::bigint[]
                            )
                            AND (
                                EXISTS (
                                    SELECT
                                        1
                                    FROM
                                        "project_authorizations"
                                    WHERE
                                        "project_authorizations"."user_id" = redacted
                                        AND (project_authorizations.project_id = projects.id)
                                )
                                OR projects.visibility_level IN (0, 10, 20)
                            )
                            AND (
                                "project_features"."package_registry_access_level" > 0
                                OR "project_features"."package_registry_access_level" IS NULL
                            )
                    )
                    AND "packages_packages"."id" IN (redacted)
            )
    )
    AND 1 = 1
ORDER BY
    "packages_dependencies"."id" ASC
LIMIT
    1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34072/commands/104863

Query 3
WITH "dependency_links_cte" AS MATERIALIZED (
    SELECT
        package_id,
        dependency_type,
        ARRAY_AGG(dependency_id) as dependency_ids
    FROM
        "packages_dependency_links"
    WHERE
        "packages_dependency_links"."package_id" IN (
            SELECT
                "packages_packages"."id"
            FROM
                "packages_packages"
            WHERE
                "packages_packages"."project_id" IN (
                    SELECT
                        "projects"."id"
                    FROM
                        "projects"
                        INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
                        LEFT JOIN project_features ON projects.id = project_features.project_id
                    WHERE
                        (
                            namespaces.traversal_ids ::bigint[] && ARRAY [redacted]::bigint[]
                        )
                        AND (
                            EXISTS (
                                SELECT
                                    1
                                FROM
                                    "project_authorizations"
                                WHERE
                                    "project_authorizations"."user_id" = redacted
                                    AND (project_authorizations.project_id = projects.id)
                            )
                            OR projects.visibility_level IN (0, 10, 20)
                        )
                        AND (
                            "project_features"."package_registry_access_level" > 0
                            OR "project_features"."package_registry_access_level" IS NULL
                        )
                )
                AND "packages_packages"."id" IN (redacted)
        )
    GROUP BY
        "packages_dependency_links"."package_id",
        "packages_dependency_links"."dependency_type"
)
SELECT
    "package_id"
FROM
    "dependency_links_cte" "packages_dependency_links"
GROUP BY
    "package_id"
ORDER BY
    "package_id" ASC
LIMIT
    1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34072/commands/104865

(3)

relation.each do |package|
Query Analysis
SELECT
    "packages_packages".*
FROM
    "packages_packages"
WHERE
    "packages_packages"."project_id" IN (
        SELECT
            "projects"."id"
        FROM
            "projects"
            INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
            LEFT JOIN project_features ON projects.id = project_features.project_id
        WHERE
            (
                namespaces.traversal_ids::bigint[] && ARRAY [redacted]::bigint[]
            )
            AND (
                EXISTS (
                    SELECT
                        1
                    FROM
                        "project_authorizations"
                    WHERE
                        "project_authorizations"."user_id" = redacted
                        AND (project_authorizations.project_id = projects.id)
                )
                OR projects.visibility_level IN (0, 10, 20)
            )
            AND (
                "project_features"."package_registry_access_level" > 0
                OR "project_features"."package_registry_access_level" IS NULL
            )
    )
    AND "packages_packages"."id" IN (redacted)

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34072/commands/104861

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

NA. No UI changes 🌈

How to set up and validate locally

Installing a package should work just like before, for instance, group and project levels.

Related to #463822

Edited by Radamanthus Batnag

Merge request reports

Loading