Resolve "Improve linear descendant scopes" (!69535) · Merge requests · GitLab.org / GitLab

Alex Pooley requested to merge 340019-improve-linear-descendant-scopes into master Sep 03, 2021

What does this MR do?

This MR is a solution to the problem that certain types of linear queries that search for descendants of multiple groups can get slow:

SELECT namespaces.*
FROM
  (SELECT DISTINCT on(namespaces.id) namespaces.*
   FROM namespaces,
     (SELECT namespaces.id
      FROM namespaces
      INNER JOIN members ON namespaces.id = members.source_id
      WHERE members.type = 'GroupMember'
        AND members.source_type = 'Namespace'
        AND namespaces.type = 'Group'
        AND members.user_id = 1614863
        AND members.requested_at IS NULL
        AND (access_level >= 10)
        AND members.access_level IN (40, 50)) base
   WHERE (namespaces.traversal_ids @> ARRAY[base.id])) namespaces

Time: 32.523 s  
  - planning: 1.212 ms  
  - execution: 32.522 s  
    - I/O read: 9.312 s  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 882179 (~6.70 GiB) from the buffer pool  
  - reads: 11958 (~93.40 MiB) from the OS file cache, including disk I/O  
  - dirtied: 1049 (~8.20 MiB)  
  - writes: 0

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6330/commands/21637

We can vastly improve performance by moving to a BTree index over the GIN index.

CREATE INDEX index_btree_namespaces_traversal_ids ON namespaces using btree (traversal_ids);

-- Given array [1,2,3,4,5], concatenate the first part of the array [1,2,3,4]
-- with the last element in the array (5) after being incremented ([6]).
--
-- [1,2,3,4,5] => [1,2,3,4,6]
CREATE OR REPLACE FUNCTION next_traversal_ids_sibling(traversal_ids INT[]) RETURNS INT[]
AS $$
BEGIN
  return traversal_ids[1:array_length(traversal_ids, 1)-1] ||
  ARRAY[traversal_ids[array_length(traversal_ids, 1)]+1];
END;
$$
LANGUAGE plpgsql
IMMUTABLE
RETURNS NULL ON NULL INPUT;

WITH cte AS (
    SELECT namespaces.traversal_ids,
            LEAD (traversal_ids, 1) OVER (ORDER BY traversal_ids ASC) next_traversal_ids
    FROM namespaces
    INNER JOIN members ON namespaces.id = members.source_id
    WHERE members.type = 'GroupMember'
    AND members.source_type = 'Namespace'
    AND namespaces.type = 'Group'
    AND members.user_id = 1614863
    AND members.requested_at IS NULL
    AND (access_level >= 10)
    AND members.access_level IN (40, 50)
)
SELECT n.*
FROM namespaces n, cte
WHERE cte.traversal_ids <= n.traversal_ids
  AND (cte.next_traversal_ids IS NULL OR cte.next_traversal_ids > n.traversal_ids)
  AND next_traversal_ids_sibling(cte.traversal_ids) > n.traversal_ids

Warm cache:

Time: 128.863 ms  
  - planning: 1.247 ms  
  - execution: 127.616 ms  
    - I/O read: 0.000 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 65810 (~514.10 MiB) from the buffer pool  
  - reads: 0 from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6911/commands/24457.

Cold cache:

Time: 4.938 s  
  - planning: 4.837 ms  
  - execution: 4.933 s  
    - I/O read: 4.728 s  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 54413 (~425.10 MiB) from the buffer pool  
  - reads: 11453 (~89.50 MiB) from the OS file cache, including disk I/O  
  - dirtied: 781 (~6.10 MiB)  
  - writes: 0

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6911/commands/24456.

NOTE: The majority of the cold query plan is taken by the base CTE portion of the query.

The idea behind this fix is easiest to understand through an example. Given this hierarchy:

graph TD;
    gitlab-->backend;
    backend-->create;
    backend-->manage;
    create-->source;
    manage-->access;
    gitlab-->frontend;

The value of the access namespaces.traversal_ids column is [gitlab, backend, manage, access]. This is an ordered set of ancestor IDs.

Using the GIN index all descendants of backend are all namespaces that contain [gitlab, backend] in the namespaces.traversal_ids array column.

Using the BTree index the namespaces.traversal_ids column is ordered. Here is an example set of arrays to understand the sort order:

[1]
[1,2,3]
[1,2,3,4]
[10]
[10,5]
[123]

All descendants of backend are all namespaces that are greater than [gitlab, backend] and also less than [gitlab, frontend].

There is a further quirk to the query we need to make where we might need to find descendants for multiple groups. This set of groups to search can be redundant which can dramatically increase query time. For instance, in the above example, you may receive a query to find all descendants for the namespaces gitlab, backend, manage. The gitlab hierarchy is a superset of the backend and manage hierarchy, but the database will retrieve all hierarchies anyway. To reduce the redundant fetch we order the source groups to search by their traversal_ids and only read up to that traversal_ids value. This ensures we don't read more than we should.

Migrations

> rails db:migrate:down VERSION=20211012015903
== 20211012015903 NextTraversalIdsSiblingFunction: reverting ==================
-- execute("DROP FUNCTION next_traversal_ids_sibling(traversal_ids INT[])")
   -> 0.0109s
== 20211012015903 NextTraversalIdsSiblingFunction: reverted (0.0110s) =========


> rails db:migrate:up VERSION=20211012015903
== 20211012015903 NextTraversalIdsSiblingFunction: migrating ==================
-- execute("CREATE OR REPLACE FUNCTION next_traversal_ids_sibling(traversal_ids INT[]) RETURNS INT[]\nAS $$\nBEGIN\n  return traversal_ids[1:array_length(traversal_ids, 1)-1] ||\n  ARRAY[traversal_ids[array_length(traversal_ids, 1)]+1];\nEND;\n$$\nLANGUAGE plpgsql\nIMMUTABLE\nRETURNS NULL ON NULL INPUT;\n")
   -> 0.0162s
== 20211012015903 NextTraversalIdsSiblingFunction: migrated (0.0162s) =========

> rails db:migrate:down VERSION=20211012051221
== 20211012051221 AddIndexBtreeNamespacesTraversalIds: reverting ==============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:namespaces, :traversal_ids, {:name=>"index_btree_namespaces_traversal_ids", :algorithm=>:concurrently})
   -> 0.0092s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:namespaces, {:name=>"index_btree_namespaces_traversal_ids", :algorithm=>:concurrently, :column=>:traversal_ids})
   -> 0.0134s
-- execute("RESET statement_timeout")
   -> 0.0008s
== 20211012051221 AddIndexBtreeNamespacesTraversalIds: reverted (0.0317s) =====

> rails db:migrate:up VERSION=20211012051221
== 20211012051221 AddIndexBtreeNamespacesTraversalIds: migrating ==============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:namespaces, :traversal_ids, {:using=>:btree, :name=>"index_btree_namespaces_traversal_ids", :algorithm=>:concurrently})
   -> 0.0097s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:namespaces, :traversal_ids, {:using=>:btree, :name=>"index_btree_namespaces_traversal_ids", :algorithm=>:concurrently})
   -> 0.0127s
-- execute("RESET statement_timeout")
   -> 0.0006s
== 20211012051221 AddIndexBtreeNamespacesTraversalIds: migrated (0.0260s) =====

How to setup and validate locally (strongly suggested)

Feature.enable :use_traversal_ids
Feature.enable :traversal_ids_btree
Execute the following in the Rails console with the proper id depending on the scope testing:

Group.where(id: [1, 2]).self_and_descendants

Does this MR meet the acceptance criteria?

Conformity

I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
[-] I have added/updated documentation, or it's not needed. (Is documentation required?)
[-] I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
I have self-reviewed this MR per code review guidelines.
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
I have followed the style guides.
This change is backwards compatible across updates, or this does not apply.

Availability and Testing

I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
[-] I have tested this MR in all supported browsers, or it's not needed.
[-] I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

[-] Label as security and @ mention @gitlab-com/gl-security/appsec
[-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
[-] Security reports checked/validated by a reviewer from the AppSec team

Closes #340019 (closed)

Edited Oct 14, 2021 by Alex Pooley

Resolve "Improve linear descendant scopes"