Move traversal_ids update to the end of transaction
What does this MR do and why?
This changes moves the synching of traversal_ids from after_create
callback into before_commit
callback.
This change is related to a namespaces locking issue that was very prominent when creating project namespaces during project creation.
Context
When a namespaces is created we also need to build its traversal_ids
collection. This is an optimisation used for hierarchy related queries. In order to build a valid traversal_ids
collection(list of ancestor ids for a given namespace node in the namespaces hierarchy) we need to make sure the ancestors for newly created namespace do not change in the meantime(while we build the traversal_ids collection). To achieve that we lock the root node in the hierarchy, thus locking the hierarchy for the duration of the traversal_ids
update, see https://gitlab.com/gitlab-org/gitlab/blob/efa468aea9c8ef9a3bb32f76eb5e7f7ab858e9fd/app/models/namespace/traversal_hierarchy.rb#L29-L29
To obtain the lock on the root namespace we run a select on the given record with FOR UPDATE
lock and then run the update statement itself, e.g.
BEGIN
-- select the root node and lock the record
SELECT * FROM namespaces WHERE id = 1 FOR UPDATE;
-- update namespace within the hierarchy.
update namespaces set traversal_ids = '{1,2}' where id = 2;
COMMIT
-- Lock is released
Normally this is a fairly fast, however in the context of creating project namespace as part of the project creation transaction the locking and update queries get quite far apart, making the lock stay for a longer time, perhaps few seconds vs few ms. So in context of concurrent project creation within same hierarchy we can hit race conditions where the hierarchy is locked while a project is attempted to be created.
see logs for details
Solution
There is not 100% solution, but we can minimise the duration of the lock, thus allow for a bigger concurrent projects creation throughput. The idea would be to keep the locking and update statement as close to each other and as close to the end of the transaction as possible, so that we minimise the duration of the lock.
That is where before_commit
callback is very helpful as that allows having the locking and update of traversal_ids close to the end of the transaction.
How to set up and validate locally
To test locally
- Download ruby script mention here !79964 (comment 836139089). It runs 10 threads that create 200 projects.
- Make sure you have
statement_timeout
andlock_timeout
set in postgresql- in gdk you can set it in
postgresql.conf
that can be found in<your-gdk-dirdctory>/postgresql/data/postgresql.conf
. I've usedstatement_timeout 15000
andlock_timeout 5000
numbers are in milliseconds.
- in gdk you can set it in
- run the script on master code and on this MR code, you should see locks appear on master code and no locks of way fewer locsk for this MR see !79964 (comment 836139089)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.