upgrades in row-level locks can deadlock - Mailing list pgsql-hackers
From: Oleksii Kliukin
Subject: upgrades in row-level locks can deadlock
Msg-id: B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com
Responses: Re: upgrades in row-level locks can deadlock
List: pgsql-hackers
Hello,

I have recently observed a deadlock on one of our production servers related to locking only a single row in a job table. There were two functions involved in the deadlock. The first one acquires a "for key share" lock on the row that represents the job it works on and subsequently updates it with the job's end time (we need multiple jobs to operate on a single row concurrently, which is why the lock is "for key share"). The other function starts by acquiring the "for update" lock on the job row and then performs actions that should not run in parallel with other jobs.

The deadlock can be easily reproduced with the following statements. The queries run against a table job (id integer primary key, name text) with a single row of (1, 'a'):

X1: select id from job where name = 'a' for key share;
Y:  select id from job where name = 'a' for update; -- starts waiting for X1
X2: select id from job where name = 'a' for key share;
X1: update job set name = 'b' where id = 1;
X2: update job set name = 'c' where id = 1; -- starts waiting for X1
X1: rollback;

At this point, Y is terminated by the deadlock detector:

deadlock detected
Process 53937 waits for ShareLock on transaction 488; blocked by process 53953.
Process 53953 waits for ExclusiveLock on tuple (0,1) of relation 16386 of database 12931; blocked by process 53937.
Process 53937: select id from job where name = 'a' for update;
Process 53953: update job set name = 'c' where id = 1;

The deadlock is between X2 and Y. Y waits for X2 to finish, as X2 holds a "key share" lock, which is incompatible with the "for update" lock Y attempts to acquire. On the other hand, X2 needs to acquire the row lock to perform its update, and that is a two-phase process: first get the tuple lock, then wait for conflicting transactions to finish, releasing the tuple lock afterward. X2 tries to acquire the tuple lock, but it is owned by Y. PostgreSQL detects the deadlock and terminates Y.

Such a deadlock only occurs when three or more sessions locking the same row are present and the lock is upgraded in at least one of them. With only two sessions the upgrade does not go through the lock manager, as there are no conflicts with the locks stored in the tuple.

That gave me an idea of how to change PostgreSQL to avoid deadlocking under the condition above. When we detect from the multixact that the lock being acquired is an upgrade of one we already hold, we can avoid acquiring the tuple lock; however, we should still wait for the multixact members that hold conflicting locks, to avoid acquiring incompatible ones. The patch is attached. I had to tweak heap_update and heap_delete alongside heap_lock_tuple, as they acquire row locks as well. I am not very happy with overloading DoesMultiXactIdConflict with the additional task of checking whether the current transaction id is among the multixact members; perhaps it is worth having a separate function for this. We can figure that out if we agree that this is a problem worth solving, and on the solution.

The other possible objection is related to the statement in README.tuplock that we need to go through the lock manager to avoid starving waiting exclusive-lockers. Since this patch omits the tuple lock only when a lock upgrade happens, it limits the starvation condition to the cases where a lock compatible with the one the waiting process asks for is acquired first and then upgraded to an incompatible one. Since under the present behavior the same operation would likely deadlock and cancel the exclusive waiter altogether, I don't see this as a strong objection.
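For reference, here is a minimal setup for the reproduction above, plus a sketch of the two-session case for contrast (this is my own illustration of why two sessions alone do not deadlock; as in the listing above, each session is assumed to run inside an open transaction):

create table job (id integer primary key, name text);
insert into job values (1, 'a');

X1: select id from job where name = 'a' for key share;
Y:  select id from job where name = 'a' for update; -- waits for X1 while holding the tuple lock
X1: update job set name = 'b' where id = 1; -- xmax is X1's own xid, so the upgrade skips the lock manager and nothing deadlocks
X1: rollback; -- Y acquires its "for update" lock and proceeds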
Cheers, Oleksii