Resolve "Improve linear descendant scopes"
What does this MR do?
This MR is a solution to the problem that certain types of linear queries that search for descendants of multiple groups can get slow:
SELECT namespaces.*
FROM
(SELECT DISTINCT on(namespaces.id) namespaces.*
FROM namespaces,
(SELECT namespaces.id
FROM namespaces
INNER JOIN members ON namespaces.id = members.source_id
WHERE members.type = 'GroupMember'
AND members.source_type = 'Namespace'
AND namespaces.type = 'Group'
AND members.user_id = 1614863
AND members.requested_at IS NULL
AND (access_level >= 10)
AND members.access_level IN (40, 50)) base
WHERE (namespaces.traversal_ids @> ARRAY[base.id])) namespaces
Time: 32.523 s
- planning: 1.212 ms
- execution: 32.522 s
- I/O read: 9.312 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 882179 (~6.70 GiB) from the buffer pool
- reads: 11958 (~93.40 MiB) from the OS file cache, including disk I/O
- dirtied: 1049 (~8.20 MiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6330/commands/21637
We can vastly improve performance by moving to a BTree index over the GIN index.
CREATE INDEX index_btree_namespaces_traversal_ids ON namespaces using btree (traversal_ids);
-- Given array [1,2,3,4,5], concatenate the first part of the array [1,2,3,4]
-- with the last element in the array (5) after being incremented ([6]).
--
-- [1,2,3,4,5] => [1,2,3,4,6]
CREATE OR REPLACE FUNCTION next_traversal_ids_sibling(traversal_ids INT[]) RETURNS INT[]
AS $$
BEGIN
return traversal_ids[1:array_length(traversal_ids, 1)-1] ||
ARRAY[traversal_ids[array_length(traversal_ids, 1)]+1];
END;
$$
LANGUAGE plpgsql
IMMUTABLE
RETURNS NULL ON NULL INPUT;
WITH cte AS (
SELECT namespaces.traversal_ids,
LEAD (traversal_ids, 1) OVER (ORDER BY traversal_ids ASC) next_traversal_ids
FROM namespaces
INNER JOIN members ON namespaces.id = members.source_id
WHERE members.type = 'GroupMember'
AND members.source_type = 'Namespace'
AND namespaces.type = 'Group'
AND members.user_id = 1614863
AND members.requested_at IS NULL
AND (access_level >= 10)
AND members.access_level IN (40, 50)
)
SELECT n.*
FROM namespaces n, cte
WHERE cte.traversal_ids <= n.traversal_ids
AND (cte.next_traversal_ids IS NULL OR cte.next_traversal_ids > n.traversal_ids)
AND next_traversal_ids_sibling(cte.traversal_ids) > n.traversal_ids
Warm cache:
Time: 128.863 ms
- planning: 1.247 ms
- execution: 127.616 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 65810 (~514.10 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6911/commands/24457.
Cold cache:
Time: 4.938 s
- planning: 4.837 ms
- execution: 4.933 s
- I/O read: 4.728 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 54413 (~425.10 MiB) from the buffer pool
- reads: 11453 (~89.50 MiB) from the OS file cache, including disk I/O
- dirtied: 781 (~6.10 MiB)
- writes: 0
Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6911/commands/24456.
NOTE: The majority of the cold query plan is taken by the base CTE portion of the query.
The idea behind this fix is easiest to understand through an example. Given this hierarchy:
graph TD;
gitlab-->backend;
backend-->create;
backend-->manage;
create-->source;
manage-->access;
gitlab-->frontend;
The value of the access
namespaces.traversal_ids
column is [gitlab, backend, manage, access]
. This is an ordered set of ancestor IDs.
Using the GIN index all descendants of backend
are all namespaces that contain [gitlab, backend]
in the namespaces.traversal_ids
array column.
Using the BTree index the namespaces.traversal_ids
column is ordered. Here is an example set of arrays to understand the sort order:
[1]
[1,2,3]
[1,2,3,4]
[10]
[10,5]
[123]
All descendants of backend
are all namespaces that are greater than [gitlab, backend]
and also less than [gitlab, frontend]
.
There is a further quirk to the query we need to make where we might need to find descendants for multiple groups. This set of groups to search can be redundant which can dramatically increase query time. For instance, in the above example, you may receive a query to find all descendants for the namespaces gitlab
, backend
, manage
. The gitlab
hierarchy is a superset of the backend
and manage
hierarchy, but the database will retrieve all hierarchies anyway. To reduce the redundant fetch we order the source groups to search by their traversal_ids
and only read up to that traversal_ids
value. This ensures we don't read more than we should.
Migrations
> rails db:migrate:down VERSION=20211012015903
== 20211012015903 NextTraversalIdsSiblingFunction: reverting ==================
-- execute("DROP FUNCTION next_traversal_ids_sibling(traversal_ids INT[])")
-> 0.0109s
== 20211012015903 NextTraversalIdsSiblingFunction: reverted (0.0110s) =========
> rails db:migrate:up VERSION=20211012015903
== 20211012015903 NextTraversalIdsSiblingFunction: migrating ==================
-- execute("CREATE OR REPLACE FUNCTION next_traversal_ids_sibling(traversal_ids INT[]) RETURNS INT[]\nAS $$\nBEGIN\n return traversal_ids[1:array_length(traversal_ids, 1)-1] ||\n ARRAY[traversal_ids[array_length(traversal_ids, 1)]+1];\nEND;\n$$\nLANGUAGE plpgsql\nIMMUTABLE\nRETURNS NULL ON NULL INPUT;\n")
-> 0.0162s
== 20211012015903 NextTraversalIdsSiblingFunction: migrated (0.0162s) =========
> rails db:migrate:down VERSION=20211012051221
== 20211012051221 AddIndexBtreeNamespacesTraversalIds: reverting ==============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:namespaces, :traversal_ids, {:name=>"index_btree_namespaces_traversal_ids", :algorithm=>:concurrently})
-> 0.0092s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- remove_index(:namespaces, {:name=>"index_btree_namespaces_traversal_ids", :algorithm=>:concurrently, :column=>:traversal_ids})
-> 0.0134s
-- execute("RESET statement_timeout")
-> 0.0008s
== 20211012051221 AddIndexBtreeNamespacesTraversalIds: reverted (0.0317s) =====
> rails db:migrate:up VERSION=20211012051221
== 20211012051221 AddIndexBtreeNamespacesTraversalIds: migrating ==============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:namespaces, :traversal_ids, {:using=>:btree, :name=>"index_btree_namespaces_traversal_ids", :algorithm=>:concurrently})
-> 0.0097s
-- execute("SET statement_timeout TO 0")
-> 0.0006s
-- add_index(:namespaces, :traversal_ids, {:using=>:btree, :name=>"index_btree_namespaces_traversal_ids", :algorithm=>:concurrently})
-> 0.0127s
-- execute("RESET statement_timeout")
-> 0.0006s
== 20211012051221 AddIndexBtreeNamespacesTraversalIds: migrated (0.0260s) =====
How to setup and validate locally (strongly suggested)
Feature.enable :use_traversal_ids
Feature.enable :traversal_ids_btree
- Execute the following in the Rails console with the proper id depending on the scope testing:
Group.where(id: [1, 2]).self_and_descendants
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) - [-] I have added/updated documentation, or it's not needed. (Is documentation required?)
- [-] I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
-
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) - [-] I have tested this MR in all supported browsers, or it's not needed.
- [-] I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Closes #340019 (closed)