Move traversal_ids update to the end of transaction (!79964) · Merge requests · GitLab.org / GitLab

Alexandru Croitor requested to merge move-traversal-id-update-to-before-commit into master Feb 04, 2022

What does this MR do and why?

This changes moves the synching of traversal_ids from after_create callback into before_commit callback.

This change is related to a namespaces locking issue that was very prominent when creating project namespaces during project creation.

Context

When a namespaces is created we also need to build its traversal_ids collection. This is an optimisation used for hierarchy related queries. In order to build a valid traversal_ids collection(list of ancestor ids for a given namespace node in the namespaces hierarchy) we need to make sure the ancestors for newly created namespace do not change in the meantime(while we build the traversal_ids collection). To achieve that we lock the root node in the hierarchy, thus locking the hierarchy for the duration of the traversal_ids update, see https://gitlab.com/gitlab-org/gitlab/blob/efa468aea9c8ef9a3bb32f76eb5e7f7ab858e9fd/app/models/namespace/traversal_hierarchy.rb#L29-L29

To obtain the lock on the root namespace we run a select on the given record with FOR UPDATE lock and then run the update statement itself, e.g.

BEGIN
-- select the root node and lock the record
SELECT * FROM namespaces WHERE id = 1 FOR UPDATE;
-- update namespace within the hierarchy.
update namespaces set traversal_ids = '{1,2}' where id = 2;
COMMIT
-- Lock is released

Normally this is a fairly fast, however in the context of creating project namespace as part of the project creation transaction the locking and update queries get quite far apart, making the lock stay for a longer time, perhaps few seconds vs few ms. So in context of concurrent project creation within same hierarchy we can hit race conditions where the hierarchy is locked while a project is attempted to be created.

see logs for details

Solution

There is not 100% solution, but we can minimise the duration of the lock, thus allow for a bigger concurrent projects creation throughput. The idea would be to keep the locking and update statement as close to each other and as close to the end of the transaction as possible, so that we minimise the duration of the lock.

That is where before_commit callback is very helpful as that allows having the locking and update of traversal_ids close to the end of the transaction.

How to set up and validate locally

To test locally

Download ruby script mention here !79964 (comment 836139089). It runs 10 threads that create 200 projects.
Make sure you have statement_timeout and lock_timeout set in postgresql
- in gdk you can set it in postgresql.conf that can be found in <your-gdk-dirdctory>/postgresql/data/postgresql.conf. I've used statement_timeout 15000 and lock_timeout 5000 numbers are in milliseconds.
run the script on master code and on this MR code, you should see locks appear on master code and no locks of way fewer locsk for this MR see !79964 (comment 836139089)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

re #351323 (closed)

Edited Feb 09, 2022 by Alexandru Croitor

Move traversal_ids update to the end of transaction

What does this MR do and why?

Context

Solution

How to set up and validate locally

MR acceptance checklist

Merge request reports