Skip to content

Use `Namespace#all_projects` for NPM package finder

Stan Hu requested to merge sh-fix-npm-package-finder-all-projects into master

In https://gitlab.com/gitlab-com/gl-infra/production/-/issues/3894, we found that query plans using self_and_descendants as a subquery in an IN () clause may trigger a PostgreSQL v11 query planner bug, resulting in a significant performance degradation.

!56078 (merged) works around this issue by moving the clause into a INNER JOIN, but finders may have to take advantage of this by calling Group#all_projects instead of self_and_descendants.

This is a similar fix to !56346 (merged).

Relates to #324220 (closed)

Even though the query plans don't show much of a change, the INNER JOIN should ensure a nested join is used instead of a hashed join, which iterates over millions of rows.

Before

SELECT
  "packages_packages".*
FROM
  "packages_packages"
WHERE
  "packages_packages"."project_id" IN (
    SELECT
      "projects"."id"
    FROM
      "projects"
    WHERE
      "projects"."namespace_id" IN (
        WITH RECURSIVE "base_and_descendants" AS (
          (
            SELECT
              "namespaces".*
            FROM
              "namespaces"
            WHERE
              "namespaces"."type" = 'Group'
              AND "namespaces"."id" = 4249178
          )
          UNION
          (
            SELECT
              "namespaces".*
            FROM
              "namespaces",
              "base_and_descendants"
            WHERE
              "namespaces"."type" = 'Group'
              AND "namespaces"."parent_id" = "base_and_descendants"."id"
          )
        )
        SELECT
          "id"
        FROM
          "base_and_descendants" AS "namespaces"
      )
  )
                                                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2324.92..7992.55 rows=151843 width=84) (actual time=107.215..765.456 rows=304274 loops=1)
   ->  HashAggregate  (cost=2324.49..2359.50 rows=3501 width=4) (actual time=107.109..120.345 rows=28554 loops=1)
         Group Key: projects.id
         ->  Nested Loop  (cost=1582.13..2315.74 rows=3501 width=4) (actual time=48.231..95.613 rows=28554 loops=1)
               ->  HashAggregate  (cost=1581.69..1583.50 rows=181 width=4) (actual time=48.205..49.811 rows=4857 loops=1)
                     Group Key: namespaces.id
                     ->  CTE Scan on base_and_descendants namespaces  (cost=1575.81..1579.43 rows=181 width=4) (actual time=0.029..46.235 rows=4857 loops=1)
                           CTE base_and_descendants
                             ->  Recursive Union  (cost=0.43..1575.81 rows=181 width=344) (actual time=0.027..40.122 rows=4857 loops=1)
                                   ->  Index Scan using index_namespaces_on_type_and_id_partial on namespaces namespaces_1  (cost=0.43..3.45 rows=1 width=344) (actual time=0.018..0.019 rows=1 loops=1)
                                         Index Cond: (((type)::text = 'Group'::text) AND (id = 4249178))
                                   ->  Nested Loop  (cost=0.56..156.87 rows=18 width=344) (actual time=0.038..3.670 rows=607 loops=8)
                                         ->  WorkTable Scan on base_and_descendants  (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.114 rows=607 loops=8)
                                         ->  Index Scan using index_namespaces_on_parent_id_and_id on namespaces namespaces_2  (cost=0.56..15.65 rows=2 width=344) (actual time=0.003..0.005 rows=1 loops=4857)
                                               Index Cond: (parent_id = base_and_descendants.id)
                                               Filter: ((type)::text = 'Group'::text)
               ->  Index Only Scan using index_projects_on_namespace_id_and_id on projects  (cost=0.44..3.86 rows=19 width=8) (actual time=0.005..0.009 rows=6 loops=4857)
                     Index Cond: (namespace_id = namespaces.id)
                     Heap Fetches: 5029
   ->  Index Scan using index_packages_packages_on_project_id_and_status on packages_packages  (cost=0.42..1.18 rows=43 width=84) (actual time=0.004..0.021 rows=11 loops=28554)
         Index Cond: (project_id = projects.id)
 Planning Time: 1.724 ms
 Execution Time: 781.531 ms
(23 rows)

After (without !56078 (merged))

SELECT
  "packages_packages".*
FROM
  "packages_packages"
WHERE
  "packages_packages"."project_id" IN (
    SELECT
      "projects"."id"
    FROM
      "projects"
    WHERE
      "projects"."namespace_id" IN (
        WITH RECURSIVE "base_and_descendants" AS (
          (
            SELECT
              "namespaces".*
            FROM
              "namespaces"
            WHERE
              "namespaces"."type" = 'Group'
              AND "namespaces"."id" = 4249178
          )
          UNION
          (
            SELECT
              "namespaces".*
            FROM
              "namespaces",
              "base_and_descendants"
            WHERE
              "namespaces"."type" = 'Group'
              AND "namespaces"."parent_id" = "base_and_descendants"."id"
          )
        )
        SELECT
          id
        FROM
          "base_and_descendants" AS "namespaces"
      )
  )
                                                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2324.92..7992.55 rows=151843 width=84) (actual time=127.359..780.839 rows=304274 loops=1)
   ->  HashAggregate  (cost=2324.49..2359.50 rows=3501 width=4) (actual time=127.262..137.346 rows=28554 loops=1)
         Group Key: projects.id
         ->  Nested Loop  (cost=1582.13..2315.74 rows=3501 width=4) (actual time=65.504..114.611 rows=28554 loops=1)
               ->  HashAggregate  (cost=1581.69..1583.50 rows=181 width=4) (actual time=65.473..67.080 rows=4857 loops=1)
                     Group Key: namespaces.id
                     ->  CTE Scan on base_and_descendants namespaces  (cost=1575.81..1579.43 rows=181 width=4) (actual time=0.039..62.636 rows=4857 loops=1)
                           CTE base_and_descendants
                             ->  Recursive Union  (cost=0.43..1575.81 rows=181 width=344) (actual time=0.036..53.686 rows=4857 loops=1)
                                   ->  Index Scan using index_namespaces_on_type_and_id_partial on namespaces namespaces_1  (cost=0.43..3.45 rows=1 width=344) (actual time=0.026..0.026 rows=1 loops=1)
                                         Index Cond: (((type)::text = 'Group'::text) AND (id = 4249178))
                                   ->  Nested Loop  (cost=0.56..156.87 rows=18 width=344) (actual time=0.050..4.793 rows=607 loops=8)
                                         ->  WorkTable Scan on base_and_descendants  (cost=0.00..0.20 rows=10 width=4) (actual time=0.001..0.129 rows=607 loops=8)
                                         ->  Index Scan using index_namespaces_on_parent_id_and_id on namespaces namespaces_2  (cost=0.56..15.65 rows=2 width=344) (actual time=0.005..0.007 rows=1 loops=4857)
                                               Index Cond: (parent_id = base_and_descendants.id)
                                               Filter: ((type)::text = 'Group'::text)
               ->  Index Only Scan using index_projects_on_namespace_id_and_id on projects  (cost=0.44..3.86 rows=19 width=8) (actual time=0.005..0.009 rows=6 loops=4857)
                     Index Cond: (namespace_id = namespaces.id)
                     Heap Fetches: 5053
   ->  Index Scan using index_packages_packages_on_project_id_and_status on packages_packages  (cost=0.42..1.18 rows=43 width=84) (actual time=0.004..0.021 rows=11 loops=28554)
         Index Cond: (project_id = projects.id)
 Planning Time: 2.790 ms
 Execution Time: 796.068 ms
(23 rows)

After (with !56078 (merged))

SELECT
  "packages_packages".*
FROM
  "packages_packages"
WHERE
  "packages_packages"."project_id" IN (
    SELECT
      "projects"."id"
    FROM
      "projects"
      INNER JOIN (
        WITH RECURSIVE "base_and_descendants" AS (
          (
            SELECT
              "namespaces".*
            FROM
              "namespaces"
            WHERE
              "namespaces"."type" = 'Group'
              AND "namespaces"."id" = 4249178
          )
          UNION
          (
            SELECT
              "namespaces".*
            FROM
              "namespaces",
              "base_and_descendants"
            WHERE
              "namespaces"."type" = 'Group'
              AND "namespaces"."parent_id" = "base_and_descendants"."id"
          )
        )
        SELECT
          "id"
        FROM
          "base_and_descendants" AS "namespaces"
      ) namespaces ON namespaces.id = projects.namespace_id
  )

Query plan:

                                                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2322.66..7990.29 rows=151843 width=84) (actual time=122.447..791.721 rows=304276 loops=1)
   ->  HashAggregate  (cost=2322.23..2357.24 rows=3501 width=4) (actual time=122.346..133.759 rows=28554 loops=1)
         Group Key: projects.id
         ->  Nested Loop  (cost=1576.25..2313.48 rows=3501 width=4) (actual time=0.065..107.634 rows=28554 loops=1)
               ->  CTE Scan on base_and_descendants namespaces  (cost=1575.81..1579.43 rows=181 width=4) (actual time=0.032..55.440 rows=4857 loops=1)
                     CTE base_and_descendants
                       ->  Recursive Union  (cost=0.43..1575.81 rows=181 width=344) (actual time=0.029..47.679 rows=4857 loops=1)
                             ->  Index Scan using index_namespaces_on_type_and_id_partial on namespaces namespaces_1  (cost=0.43..3.45 rows=1 width=344) (actual time=0.021..0.022 rows=1 loops=1)
                                   Index Cond: (((type)::text = 'Group'::text) AND (id = 4249178))
                             ->  Nested Loop  (cost=0.56..156.87 rows=18 width=344) (actual time=0.042..4.369 rows=607 loops=8)
                                   ->  WorkTable Scan on base_and_descendants  (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.162 rows=607 loops=8)
                                   ->  Index Scan using index_namespaces_on_parent_id_and_id on namespaces namespaces_2  (cost=0.56..15.65 rows=2 width=344) (actual time=0.004..0.006 rows=1 loops=4857)
                                         Index Cond: (parent_id = base_and_descendants.id)
                                         Filter: ((type)::text = 'Group'::text)
               ->  Index Only Scan using index_projects_on_namespace_id_and_id on projects  (cost=0.44..3.86 rows=19 width=8) (actual time=0.005..0.010 rows=6 loops=4857)
                     Index Cond: (namespace_id = namespaces.id)
                     Heap Fetches: 5111
   ->  Index Scan using index_packages_packages_on_project_id_and_status on packages_packages  (cost=0.42..1.18 rows=43 width=84) (actual time=0.004..0.021 rows=11 loops=28554)
         Index Cond: (project_id = projects.id)
 Planning Time: 1.636 ms
 Execution Time: 806.014 ms
(21 rows)
Edited by Stan Hu

Merge request reports

Loading