Skip to content

Optimize `standardRoles.membersCount` query

mo khan requested to merge mokhax/498688/slow-query-performance into master

What does this MR do and why?

The Roles and Permissions page is failing to load for large group hierarchies due to a poorly performing SQL query. The analysis of the poorly performing query can be found here.

This MR attempts to optimize this query to work efficiently for large group hierarchies with a lot of members. The group id used in the examples below have 5052258 members records.

I tried optimizing this query using a several different options like an OR query, UNION ALL query, Recursive CTE, Window Function, and I found that executing two separate queries (1 for group memberships + 1 for project memberships) and merging the results in memory seemed to provide the best combination of speed and low memory.

Before:

SELECT
  "members"."access_level",
  COUNT(*) AS members_count
FROM
  (
    (
      SELECT
        "members"."id",
        "members"."access_level",
        "members"."source_id",
        "members"."source_type",
        "members"."user_id",
        "members"."notification_level",
        "members"."type",
        "members"."created_at",
        "members"."updated_at",
        "members"."created_by_id",
        "members"."invite_email",
        "members"."invite_token",
        "members"."invite_accepted_at",
        "members"."requested_at",
        "members"."expires_at",
        "members"."ldap",
        "members"."override",
        "members"."invite_email_success",
        "members"."state",
        "members"."member_namespace_id",
        "members"."member_role_id",
        "members"."expiry_notified_at",
        "members"."request_accepted_at",
        "members"."last_activity_on"
      FROM
        "members"
      WHERE
        "members"."source_type" = 'Namespace'
        AND "members"."source_id" IN (
          SELECT
            "namespaces"."id"
          FROM
            "namespaces"
          WHERE
            "namespaces"."type" = 'Group'
            AND (traversal_ids @> ('{70960922}'))
        )
    )
    UNION
    (
      SELECT
        "members"."id",
        "members"."access_level",
        "members"."source_id",
        "members"."source_type",
        "members"."user_id",
        "members"."notification_level",
        "members"."type",
        "members"."created_at",
        "members"."updated_at",
        "members"."created_by_id",
        "members"."invite_email",
        "members"."invite_token",
        "members"."invite_accepted_at",
        "members"."requested_at",
        "members"."expires_at",
        "members"."ldap",
        "members"."override",
        "members"."invite_email_success",
        "members"."state",
        "members"."member_namespace_id",
        "members"."member_role_id",
        "members"."expiry_notified_at",
        "members"."request_accepted_at",
        "members"."last_activity_on"
      FROM
        "members"
      WHERE
        "members"."source_type" = 'Project'
        AND "members"."source_id" IN (
          SELECT
            "projects"."id"
          FROM
            "projects"
          WHERE
            "projects"."namespace_id" IN (
              SELECT
                namespaces.traversal_ids [array_length(namespaces.traversal_ids, 1)] AS id
              FROM
                "namespaces"
              WHERE
                "namespaces"."type" = 'Group'
                AND (traversal_ids @> ('{70960922}'))
            )
        )
    )
  ) members
WHERE
  "members"."member_role_id" IS NULL
GROUP BY
  "members"."access_level";
Time: 3.023 min
  - planning: 6.193 ms
  - execution: 3.023 min
    - I/O read: 2.560 min
    - I/O write: 1.673 s

Shared buffers:
  - hits: 986002 (~7.50 GiB) from the buffer pool
  - reads: 1298012 (~9.90 GiB) from the OS file cache, including disk I/O
  - dirtied: 136392 (~1.00 GiB)
  - writes: 65579 (~512.30 MiB)

Temp buffers:
  - reads: 128412 (~1003.20 MiB)
  - writes: 187185 (~1.40 GiB)

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32671/commands/100775

After: (using two queries with results merged in memory)

WITH "namespace_ids" AS (
  SELECT traversal_ids [array_length(traversal_ids, 1)] AS id
  FROM "namespaces"
  WHERE (traversal_ids @> ('{70960922}'))
)
SELECT
  COUNT(*) AS "count_all",
  "members"."access_level" AS "members_access_level"
FROM "members"
INNER JOIN "namespace_ids" ON "members"."source_id" = "namespace_ids"."id" AND "members"."source_type" = 'Namespace'
WHERE "members"."member_role_id" IS NULL
GROUP BY "members"."access_level"
Time: 340.397 ms
  - planning: 45.190 ms
  - execution: 295.207 ms
    - I/O read: 703.703 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 52503 (~410.20 MiB) from the buffer pool
  - reads: 19104 (~149.30 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32916/commands/101482

WITH "namespace_ids" AS (
  SELECT traversal_ids [array_length(traversal_ids, 1)] AS id
  FROM "namespaces"
  WHERE (traversal_ids @> ('{70960922}'))
),
"project_ids" AS (
  SELECT "projects"."id"
  FROM "projects"
  INNER JOIN "namespace_ids" ON "projects"."namespace_id" = "namespace_ids"."id"
  WHERE "projects"."marked_for_deletion_at" IS NULL
    AND "projects"."pending_delete" = FALSE
)
SELECT COUNT(*) AS "count_all", "members"."access_level" AS "members_access_level"
FROM "members"
INNER JOIN "project_ids" ON "members"."source_id" = "project_ids"."id" AND "members"."source_type" = 'Project'
WHERE "members"."member_role_id" IS NULL
GROUP BY "members"."access_level"
Time: 12.586 s
  - planning: 106.385 ms
  - execution: 12.480 s
    - I/O read: 31.014 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1296360 (~9.90 GiB) from the buffer pool
  - reads: 1437312 (~11.00 GiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32916/commands/101483

#498688

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Before After

How to set up and validate locally

Edited by mo khan

Merge request reports

Loading