Create a service to generate npm packages metadata on the group and instance level
Context
Performance can be slow when installing an npm package at the namespace level, when there are many projects and packages under the namespace. The DB query can be complex because it has to pull in data from multiple tables: namespaces, project_authorizations, and of course packages. The slowness happens here
Solution
One solution suggested in this thread is to move the expensive filters into the inner query and do name filtering only in the outer query. This MR implements that.
This MR also changes how projects inside a namespace are retrieved (
Implementation Details
- Introduce
Packages::Npm::GenerateMetadataForNamespaceService
inheriting fromGenerateMetadataService
- Create
Packages::Npm::PackagesForBatchFinder
using theby_any_overlap_with_traversal_ids
scope for efficient project finding. Some methods inapp/finders/concerns/packages/finder_helper.rb
are overridden to implement the project finding changes.
Query Analysis
Summary
To make it easier to compare the changes between the master
branch and this MR, I labeled the relevant Ruby lines as (1)
, (2)
and (3)
:
app/services/packages/npm/generate_metadata_service.rb
(1) packages.each_batch do |batch|
(2) relation = preload_needed_relations(filter_packages(batch), only_dist_tags)
(3) relation.each do |package|
build_tags(package)
store_latest_version(package.version)
next if only_dist_tags
build_versions(package)
end
end
As discussed in this thread, the key idea is to filter packages by name first, before filtering for the packages visible to user.
master
branch
Original queries on app/services/packages/npm/generate_metadata_service.rb
(1)
packages.each_batch do |batch|
Query Analysis
SELECT
"packages_packages"."id"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @ > ('{<redacted>}'))
)
AND (
EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = redacted
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0, 10, 20)
)
AND (
"project_features"."package_registry_access_level" > 0
OR "project_features"."package_registry_access_level" IS NULL
)
)
AND "packages_packages"."status" IN (0, 1)
AND "packages_packages"."package_type" = 2
AND "packages_packages"."name" = '<redacted>'
ORDER BY
"packages_packages"."id" ASC
LIMIT
1 OFFSET 1000
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33485/commands/103148
(2)
relation = preload_needed_relations(filter_packages(batch), only_dist_tags)
Query Analysis
SELECT
"packages_dependencies"."id"
FROM
"packages_dependencies"
WHERE
"packages_dependencies"."id" IN (
SELECT
"packages_dependency_links"."dependency_id"
FROM
"packages_dependency_links"
WHERE
"packages_dependency_links"."package_id" IN (
SELECT
"packages_packages"."id"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @ > ('{redacted}'))
)
AND (
EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = 1
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0, 10, 20)
)
AND (
"project_features"."package_registry_access_level" > 0
OR "project_features"."package_registry_access_level" IS NULL
)
)
AND "packages_packages"."status" IN (0, 1)
AND "packages_packages"."package_type" = 2
AND "packages_packages"."name" = 'redacted'
AND "packages_packages"."id" >= redacted
)
)
AND 1 = 1
ORDER BY
"packages_dependencies"."id" ASC
LIMIT
1
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33727/commands/103711
Query Analysis
WITH "dependency_links_cte" AS MATERIALIZED (
SELECT
package_id,
dependency_type,
ARRAY_AGG(dependency_id) as dependency_ids
FROM
"packages_dependency_links"
WHERE
"packages_dependency_links"."package_id" IN (
SELECT
"packages_packages"."id"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @ > ('{redacted}'))
)
AND (
EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = redacted
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0, 10, 20)
)
AND (
"project_features"."package_registry_access_level" > 0
OR "project_features"."package_registry_access_level" IS NULL
)
)
AND "packages_packages"."status" IN (0, 1)
AND "packages_packages"."package_type" = 2
AND "packages_packages"."name" = 'redacted'
AND "packages_packages"."id" >= redacted
)
GROUP BY
"packages_dependency_links"."package_id",
"packages_dependency_links"."dependency_type"
)
SELECT
"package_id"
FROM
"dependency_links_cte" "packages_dependency_links"
GROUP BY
"package_id"
ORDER BY
"package_id" ASC
LIMIT
1
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33727/commands/103713
(3)
relation.each do |package|
Query Analysis
SELECT
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @ > ('{redacted}'))
)
AND (
EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = redacted
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0, 10, 20)
)
AND (
"project_features"."package_registry_access_level" > 0
OR "project_features"."package_registry_access_level" IS NULL
)
)
AND "packages_packages"."status" IN (0, 1)
AND "packages_packages"."package_type" = 2
AND "packages_packages"."id" IN (redacted)
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33997/commands/104599
Queries on MR branch
app/services/packages/npm/generate_metadata_service.rb
(1)
packages.each_batch do |batch|
Query Analysis
SELECT
"packages_packages"."id"
FROM
"packages_packages"
WHERE
"packages_packages"."package_type" = 2
AND "packages_packages"."status" IN (0, 1)
AND "packages_packages"."name" = 'redacted'
AND "packages_packages"."id" >= redacted
ORDER BY
"packages_packages"."id" ASC
LIMIT
1 OFFSET 1000
WITH current package_name_index
index
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33997/commands/104577
AFTER running
EXEC DROP INDEX package_name_index;
EXEC CREATE INDEX package_name_id_index ON packages_packages(name, id);
WITH package_name_id_index
index
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33997/commands/104581
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33784/commands/103952
(2)
relation = preload_needed_relations(filter_packages(batch), only_dist_tags)
Query 1
SELECT
"packages_packages"."id"
FROM
"packages_packages"
WHERE
"packages_packages"."package_type" = 2
AND "packages_packages"."status" IN (0, 1)
AND "packages_packages"."name" = 'redacted'
AND "packages_packages"."id" >= redacted
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34072/commands/104917
Query 2
SELECT
"packages_dependencies"."id"
FROM
"packages_dependencies"
WHERE
"packages_dependencies"."id" IN (
SELECT
"packages_dependency_links"."dependency_id"
FROM
"packages_dependency_links"
WHERE
"packages_dependency_links"."package_id" IN (
SELECT
"packages_packages"."id"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE
(
namespaces.traversal_ids::bigint[] && ARRAY[redacted]::bigint[]
)
AND (
EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = redacted
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0, 10, 20)
)
AND (
"project_features"."package_registry_access_level" > 0
OR "project_features"."package_registry_access_level" IS NULL
)
)
AND "packages_packages"."id" IN (redacted)
)
)
AND 1 = 1
ORDER BY
"packages_dependencies"."id" ASC
LIMIT
1
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34072/commands/104863
Query 3
WITH "dependency_links_cte" AS MATERIALIZED (
SELECT
package_id,
dependency_type,
ARRAY_AGG(dependency_id) as dependency_ids
FROM
"packages_dependency_links"
WHERE
"packages_dependency_links"."package_id" IN (
SELECT
"packages_packages"."id"
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE
(
namespaces.traversal_ids ::bigint[] && ARRAY [redacted]::bigint[]
)
AND (
EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = redacted
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0, 10, 20)
)
AND (
"project_features"."package_registry_access_level" > 0
OR "project_features"."package_registry_access_level" IS NULL
)
)
AND "packages_packages"."id" IN (redacted)
)
GROUP BY
"packages_dependency_links"."package_id",
"packages_dependency_links"."dependency_type"
)
SELECT
"package_id"
FROM
"dependency_links_cte" "packages_dependency_links"
GROUP BY
"package_id"
ORDER BY
"package_id" ASC
LIMIT
1
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34072/commands/104865
(3)
relation.each do |package|
Query Analysis
SELECT
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE
(
namespaces.traversal_ids::bigint[] && ARRAY [redacted]::bigint[]
)
AND (
EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = redacted
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0, 10, 20)
)
AND (
"project_features"."package_registry_access_level" > 0
OR "project_features"."package_registry_access_level" IS NULL
)
)
AND "packages_packages"."id" IN (redacted)
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34072/commands/104861
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
NA. No UI changes
How to set up and validate locally
Installing a package should work just like before, for instance, group and project levels.
Related to #463822