Use `Namespace#all_projects` for NPM package finder
In https://gitlab.com/gitlab-com/gl-infra/production/-/issues/3894, we
found that query plans using self_and_descendants
as a subquery in an
IN ()
clause may trigger a PostgreSQL v11 query planner bug, resulting
in a significant performance degradation.
!56078 (merged) works around
this issue by moving the clause into a INNER JOIN, but finders may have
to take advantage of this by calling Group#all_projects
instead of
self_and_descendants
.
This is a similar fix to !56346 (merged).
Relates to #324220 (closed)
Even though the query plans don't show much of a change, the INNER JOIN should ensure a nested join is used instead of a hashed join, which iterates over millions of rows.
Before
SELECT
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT
"namespaces".*
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."id" = 4249178
)
UNION
(
SELECT
"namespaces".*
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."parent_id" = "base_and_descendants"."id"
)
)
SELECT
"id"
FROM
"base_and_descendants" AS "namespaces"
)
)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2324.92..7992.55 rows=151843 width=84) (actual time=107.215..765.456 rows=304274 loops=1)
-> HashAggregate (cost=2324.49..2359.50 rows=3501 width=4) (actual time=107.109..120.345 rows=28554 loops=1)
Group Key: projects.id
-> Nested Loop (cost=1582.13..2315.74 rows=3501 width=4) (actual time=48.231..95.613 rows=28554 loops=1)
-> HashAggregate (cost=1581.69..1583.50 rows=181 width=4) (actual time=48.205..49.811 rows=4857 loops=1)
Group Key: namespaces.id
-> CTE Scan on base_and_descendants namespaces (cost=1575.81..1579.43 rows=181 width=4) (actual time=0.029..46.235 rows=4857 loops=1)
CTE base_and_descendants
-> Recursive Union (cost=0.43..1575.81 rows=181 width=344) (actual time=0.027..40.122 rows=4857 loops=1)
-> Index Scan using index_namespaces_on_type_and_id_partial on namespaces namespaces_1 (cost=0.43..3.45 rows=1 width=344) (actual time=0.018..0.019 rows=1 loops=1)
Index Cond: (((type)::text = 'Group'::text) AND (id = 4249178))
-> Nested Loop (cost=0.56..156.87 rows=18 width=344) (actual time=0.038..3.670 rows=607 loops=8)
-> WorkTable Scan on base_and_descendants (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.114 rows=607 loops=8)
-> Index Scan using index_namespaces_on_parent_id_and_id on namespaces namespaces_2 (cost=0.56..15.65 rows=2 width=344) (actual time=0.003..0.005 rows=1 loops=4857)
Index Cond: (parent_id = base_and_descendants.id)
Filter: ((type)::text = 'Group'::text)
-> Index Only Scan using index_projects_on_namespace_id_and_id on projects (cost=0.44..3.86 rows=19 width=8) (actual time=0.005..0.009 rows=6 loops=4857)
Index Cond: (namespace_id = namespaces.id)
Heap Fetches: 5029
-> Index Scan using index_packages_packages_on_project_id_and_status on packages_packages (cost=0.42..1.18 rows=43 width=84) (actual time=0.004..0.021 rows=11 loops=28554)
Index Cond: (project_id = projects.id)
Planning Time: 1.724 ms
Execution Time: 781.531 ms
(23 rows)
!56078 (merged))
After (withoutSELECT
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT
"namespaces".*
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."id" = 4249178
)
UNION
(
SELECT
"namespaces".*
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."parent_id" = "base_and_descendants"."id"
)
)
SELECT
id
FROM
"base_and_descendants" AS "namespaces"
)
)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2324.92..7992.55 rows=151843 width=84) (actual time=127.359..780.839 rows=304274 loops=1)
-> HashAggregate (cost=2324.49..2359.50 rows=3501 width=4) (actual time=127.262..137.346 rows=28554 loops=1)
Group Key: projects.id
-> Nested Loop (cost=1582.13..2315.74 rows=3501 width=4) (actual time=65.504..114.611 rows=28554 loops=1)
-> HashAggregate (cost=1581.69..1583.50 rows=181 width=4) (actual time=65.473..67.080 rows=4857 loops=1)
Group Key: namespaces.id
-> CTE Scan on base_and_descendants namespaces (cost=1575.81..1579.43 rows=181 width=4) (actual time=0.039..62.636 rows=4857 loops=1)
CTE base_and_descendants
-> Recursive Union (cost=0.43..1575.81 rows=181 width=344) (actual time=0.036..53.686 rows=4857 loops=1)
-> Index Scan using index_namespaces_on_type_and_id_partial on namespaces namespaces_1 (cost=0.43..3.45 rows=1 width=344) (actual time=0.026..0.026 rows=1 loops=1)
Index Cond: (((type)::text = 'Group'::text) AND (id = 4249178))
-> Nested Loop (cost=0.56..156.87 rows=18 width=344) (actual time=0.050..4.793 rows=607 loops=8)
-> WorkTable Scan on base_and_descendants (cost=0.00..0.20 rows=10 width=4) (actual time=0.001..0.129 rows=607 loops=8)
-> Index Scan using index_namespaces_on_parent_id_and_id on namespaces namespaces_2 (cost=0.56..15.65 rows=2 width=344) (actual time=0.005..0.007 rows=1 loops=4857)
Index Cond: (parent_id = base_and_descendants.id)
Filter: ((type)::text = 'Group'::text)
-> Index Only Scan using index_projects_on_namespace_id_and_id on projects (cost=0.44..3.86 rows=19 width=8) (actual time=0.005..0.009 rows=6 loops=4857)
Index Cond: (namespace_id = namespaces.id)
Heap Fetches: 5053
-> Index Scan using index_packages_packages_on_project_id_and_status on packages_packages (cost=0.42..1.18 rows=43 width=84) (actual time=0.004..0.021 rows=11 loops=28554)
Index Cond: (project_id = projects.id)
Planning Time: 2.790 ms
Execution Time: 796.068 ms
(23 rows)
!56078 (merged))
After (withSELECT
"packages_packages".*
FROM
"packages_packages"
WHERE
"packages_packages"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
INNER JOIN (
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT
"namespaces".*
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."id" = 4249178
)
UNION
(
SELECT
"namespaces".*
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."type" = 'Group'
AND "namespaces"."parent_id" = "base_and_descendants"."id"
)
)
SELECT
"id"
FROM
"base_and_descendants" AS "namespaces"
) namespaces ON namespaces.id = projects.namespace_id
)
Query plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2322.66..7990.29 rows=151843 width=84) (actual time=122.447..791.721 rows=304276 loops=1)
-> HashAggregate (cost=2322.23..2357.24 rows=3501 width=4) (actual time=122.346..133.759 rows=28554 loops=1)
Group Key: projects.id
-> Nested Loop (cost=1576.25..2313.48 rows=3501 width=4) (actual time=0.065..107.634 rows=28554 loops=1)
-> CTE Scan on base_and_descendants namespaces (cost=1575.81..1579.43 rows=181 width=4) (actual time=0.032..55.440 rows=4857 loops=1)
CTE base_and_descendants
-> Recursive Union (cost=0.43..1575.81 rows=181 width=344) (actual time=0.029..47.679 rows=4857 loops=1)
-> Index Scan using index_namespaces_on_type_and_id_partial on namespaces namespaces_1 (cost=0.43..3.45 rows=1 width=344) (actual time=0.021..0.022 rows=1 loops=1)
Index Cond: (((type)::text = 'Group'::text) AND (id = 4249178))
-> Nested Loop (cost=0.56..156.87 rows=18 width=344) (actual time=0.042..4.369 rows=607 loops=8)
-> WorkTable Scan on base_and_descendants (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.162 rows=607 loops=8)
-> Index Scan using index_namespaces_on_parent_id_and_id on namespaces namespaces_2 (cost=0.56..15.65 rows=2 width=344) (actual time=0.004..0.006 rows=1 loops=4857)
Index Cond: (parent_id = base_and_descendants.id)
Filter: ((type)::text = 'Group'::text)
-> Index Only Scan using index_projects_on_namespace_id_and_id on projects (cost=0.44..3.86 rows=19 width=8) (actual time=0.005..0.010 rows=6 loops=4857)
Index Cond: (namespace_id = namespaces.id)
Heap Fetches: 5111
-> Index Scan using index_packages_packages_on_project_id_and_status on packages_packages (cost=0.42..1.18 rows=43 width=84) (actual time=0.004..0.021 rows=11 loops=28554)
Index Cond: (project_id = projects.id)
Planning Time: 1.636 ms
Execution Time: 806.014 ms
(21 rows)
Edited by Stan Hu