Add option to order locks on add_concurrent_foreign_key
What does this MR do?
During deployment of !65795 (merged), the migration failed due to deadlocks with another process. Upon investigation, the conflicting process creates an event
, and then creates push_event_payloads
for that event. Unfortunately, the migration first failed when adding the foreign key, and then again when trying to obtain the explicit locks on the two tables.
A possible solution to this problem is to reverse the order in which the locks are required, to better match application locking order. For the explicit lock statement, this can be easily done. This MR handles the other piece of that, by modifying the add_concurrent_foreign_key
helper to first explicit lock the tables in a certain order before creating the foreign key.
How to setup and validate locally
Example of how this works:
PG session 1
BEGIN;
LOCK TABLE events in SHARE ROW EXCLUSIVE MODE;
PG session 2
BEGIN;
ALTER TABLE push_event_payloads
ADD CONSTRAINT testfk
FOREIGN (event_id_convert_to_bigint) REFERENCES events (id) NOT VALID;
At this point the ALTER TABLE
will hang, waiting for the events
lock. In a third session you can verify the locks:
SELECT pg_class.relname, pg_locks.pid, pg_locks.mode
FROM pg_locks
LEFT JOIN pg_class
ON pg_class.oid = pg_locks.relation
WHERE locktype = 'relation'
AND relname IN ('events', 'push_event_payloads');
relname | pid | mode
---------------------+-------+-----------------------
events | 94100 | ShareRowExclusiveLock
push_event_payloads | 13124 | ShareRowExclusiveLock
events | 13124 | ShareRowExclusiveLock
Now, if we go back to session 1:
PG session 1
LOCK TABLE push_event_payloads in SHARE ROW EXCLUSIVE MODE;
ERROR: deadlock detected
DETAIL: Process 94100 waits for ShareRowExclusiveLock on relation 12549384 of database 12544519; blocked by process 13124.
Process 13124 waits for ShareRowExclusiveLock on relation 12547458 of database 12544519; blocked by process 94100.
HINT: See server log for query details.
Time: 1001.650 ms (00:01.002)
We get a deadlock. But if we repeat the same experiment, but in session 2 before we create the foreign key, we:
LOCK TABLE events, push_event_payloads IN SHARE ROW EXCLUSIVE MODE;
That transaction will hang until the transaction session completes. We can still get a lock timeout, but we use with_lock_retries
which will retry that operation again.
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.