Add hidden status to packages
What does this MR do?
Adds a new column status
to the packages_packages
table. The column is an enum that will hold various statuses that a given package could have. To start with, we are adding the :default
, :hidden
, and :processing
statuses.
Only :default
packages are returned in the API index endpoints and in the list pages of the UI (populated from these APIs) unless other statuses are specifically specified using the status
param.
The first package type to make use of this new column and :hidden
status is the Generic Package type. This MR allows the status
param to be used when creating Generic packages.
The processing
status is not yet used in this MR, but will be used in the implementation of RubyGems packages (currently in progress) when we process the incoming gem and extract metadata.
Database
Migrations
Up migrations
== 20210204152257 AddStatusToPackagesPackages: migrating ======================
-- add_column(:packages_packages, :status, :smallint, {:default=>0, :null=>false})
-> 0.0073s
== 20210204152257 AddStatusToPackagesPackages: migrated (0.0075s) =============
== 20210209171525 AddStatusIndexToPackagesPackages: migrating =================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:packages_packages, [:project_id, :status], {:name=>"index_packages_packages_on_project_id_and_status", :algorithm=>:concurrently})
-> 0.0103s
-- execute("SET statement_timeout TO 0")
-> 0.0098s
-- add_index(:packages_packages, [:project_id, :status], {:name=>"index_packages_packages_on_project_id_and_status", :algorithm=>:concurrently})
-> 0.1259s
-- execute("RESET ALL")
-> 0.0011s
== 20210209171525 AddStatusIndexToPackagesPackages: migrated (0.2294s) ========
Down migrations
== 20210209171525 AddStatusIndexToPackagesPackages: reverting =================
-- transaction_open?()
-> 0.0000s
-- indexes(:packages_packages)
-> 0.0056s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"index_packages_packages_on_project_id_and_status"})
-> 0.0028s
-- execute("RESET ALL")
-> 0.0005s
== 20210209171525 AddStatusIndexToPackagesPackages: reverted (0.0103s) ========
== 20210204152257 AddStatusToPackagesPackages: reverting ======================
-- remove_column(:packages_packages, :status, :smallint, {:default=>0, :null=>false})
-> 0.0066s
== 20210204152257 AddStatusToPackagesPackages: reverted (0.0209s) =============
Scope queries:
Both the packages_finder
and group_packages_finder
will always apply either the displayable
scope, or with_status
, and will always be looking at packages within a given project_id
or set of project_ids
. Most packages will have the default status: status = 0
, so adding an index where status = 0
does not really improve the query time since we will still have a large number of packages to scan or iterate over to apply any other filters. Adding a general index on [:project_id, :status]
does improve queries for non-default statuses. The finders paginate with a limit of 20, these example queries are the "base query" of the finder (no params included). This is the query that will run if you visit the package registry page for a project or group. Adding params to filter further, such as by package name, will result in using existing indexes such as (project_id, package_type)
and (project_id, name, version, package_type)
.
This thread also contains some details of how various other indexes were tried and how we ended up adding (project_id, status)
.
GroupPackagesFinder
This finder contains a pre-existing low performing query. See #321590 (closed), which has been opened to look into investigating and optimizing this query.
Here we look at a group that contains a project with a large number of packages (20000+) to get an idea of the timings on the higher end.
Base query:
SELECT "packages_packages".*
FROM "packages_packages"
WHERE "packages_packages"."project_id" IN
(SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN
(WITH RECURSIVE "base_and_descendants" AS (
(SELECT "namespaces".*
FROM "namespaces"
WHERE "namespaces"."id" = 1)
UNION
(SELECT "namespaces".*
FROM "namespaces",
"base_and_descendants"
WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id"
FROM "base_and_descendants" AS "namespaces") )
AND ("packages_packages"."package_type" != 4
OR "packages_packages"."name" != 'NuGet.Temporary.Package')
AND "packages_packages"."status" = 0
ORDER BY created_at ASC
LIMIT 20 OFFSET 0
Where status = 0 (default):
The index does not effect when status = 0
Explain: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2270/commands/7037
Where status = 1:
Explain before: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2261/commands/7024
Explain after: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2270/commands/7051
PackagesFinder
The PackagesFinder
works within the scope of a single project. Most of the time these queries are within the performance guidelines as most projects have less than 1000 packages, only a few have greater than 10000. To see the extreme case here and how an index may or may not help, we look at a project that has ~20000 packages with a few thousand updated to non-default statuses.
Base query:
SELECT "packages_packages".*
FROM "packages_packages"
WHERE "packages_packages"."project_id" = 15288597
AND ("packages_packages"."package_type" != 4
OR "packages_packages"."name" != 'NuGet.Temporary.Package')
AND "packages_packages"."status" = 0 -- this is the aspect of the query that has changed
ORDER BY created_at ASC
LIMIT 20 OFFSET 0
With status = 0 (default)
The index does not effect when status = 0
Explain: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2270/commands/7041
With status = 1
Explain before: - https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2270/commands/7043
Explain after: - https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2270/commands/7055
Since we are sorting by created_at
by default, when we have a large enough number of packages of varying statuses, the planner will not use the new index, but will use the (project_id, created_at)
index instead. If we only have a few packages with a non-default package, the planner will make the choice to use the new index, resulting in a faster query:
Explain with smaller non-default packages and new index: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2304/commands/7150
Screenshots (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Related to #290741 (closed)