Break repository excess calculation into smaller, simpler queries [RUN ALL RSPEC] [RUN AS-IF-FOSS]
What does this MR do?
The repository execess calculation's query involves a subquery of getting all nested groups. This resulted in some slow query during one incident (but it is not the cause).
We want to separate the split nested groups query as a separate query, which should be more resilient when things go wrong.
The more detailed analysis can be found here: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12769#note_525534422
From research, one of the larger group contains 5000 sub-groups. A feature flag is added so we can test if this is okay, or if we need to further split the query.
Related issue #323985 (closed)
Database query
Query example: group.total_repository_size_excess
before change
SELECT
SUM(
(
(
"project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"
) - "projects"."repository_size_limit"
)
)
FROM
"projects"
INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE
"projects"."namespace_id" IN (
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT
"namespaces".*
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."id" = 22
)
UNION
(
SELECT
"namespaces".*
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."parent_id" = "base_and_descendants"."id"
)
)
SELECT
id
FROM
"base_and_descendants" AS "namespaces"
)
AND (
"project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"
) > "projects"."repository_size_limit"
AND "projects"."repository_size_limit" != 0
after change
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT
"namespaces".*
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."id" = 22
)
UNION
(
SELECT
"namespaces".*
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."parent_id" = "base_and_descendants"."id"
)
)
SELECT
id
FROM
"base_and_descendants" AS "namespaces"
SELECT
SUM(
(
(
"project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"
) - "projects"."repository_size_limit"
)
)
FROM
"projects"
INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE
"projects"."namespace_id" = 22
AND (
"project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"
) > "projects"."repository_size_limit"
AND "projects"."repository_size_limit" != 0
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. -
I have not included a changelog entry because the change does not affect functionality.
-
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team