Re: How do concurrent inserts work? - Mailing list pgsql-novice
| From | Kevin Grittner |
|---|---|
| Subject | Re: How do concurrent inserts work? |
| Date | |
| Msg-id | 208824206.1367943.1419864105933.JavaMail.yahoo@jws100191.mail.ne1.yahoo.com |
| In response to | How do concurrent inserts work? (Yaroslav <ladayaroslav@yandex.ru>) |
| List | pgsql-novice |
Yaroslav <ladayaroslav@yandex.ru> wrote:

> Error told me that there IS such row, but now I see there ISN'T?!
>
> Can you enlighten me?

The most important issue here is that if there are concurrent serializable transactions, the effects of those which successfully commit must be consistent with some serial (one-at-a-time) order of execution. Within that transaction you see the concurrent insert (in the form of the duplicate key error), which means your transaction must have logically run *after* the other transaction; yet when you try to select the row you don't see it, which means your transaction must have logically run *before* the other transaction. If your top-level transaction is allowed to commit, that is indeed a failure to conform to the SQL standard and to what (in my experience) most people who rely on serializable transactions expect. If it rolls back, then the invariant that the effects on the database are consistent with some one-at-a-time execution of the successful transactions holds, and ACID properties (and conformance to the standard) are maintained.

As Tom mentioned, the argument on the bug report you cited is about which error it is more useful to generate -- the one indicating that the transaction failed due to the actions of one or more concurrent transactions, or the one indicating that there was a duplicate key. I'll try not to misrepresent the other position (although I admit to having a strong opinion).

The argument in favor of the serialization failure is that most software using that isolation level traps serialization failures at a low level and is able to restart the transaction from the beginning (a sketch of such a retry loop follows at the end of this message). In your example, the new run of the transaction will see the duplicate row, so it will be indistinguishable from your first case (where the row exists before your transaction starts). This retry-and-see-a-consistent-state approach is preferred in some shops because it avoids the need to write any special code to handle race conditions between concurrent transactions.

The argument in favor of the duplicate key error is that its detail shows you what the duplicate key values are. A secondary argument is that any software framework which handles serialization failures by retrying such transactions from the beginning should probably also do one or more (but a finite number of) retries for a duplicate key error, since it might indicate a problem with a concurrent transaction. In your case there could be special coding to handle the duplicate key, and since the contradictory visibility indications make it clear that it is the result of a race condition, you might be able to write special edge-condition code to handle it in a way that might be faster than restarting the transaction.

In almost all software I've worked with in the past few decades, a serialization failure (whether in the form of a deadlock, an MVCC write conflict, or something else) is not reported back to the application code. The transaction retry is automatic, so the effect from the user's PoV and the application software's PoV is identical to the transaction having been temporarily blocked. No special coding to handle race conditions is needed, desirable, or (by management fiat) allowed. I have sometimes seen special handling of duplicate keys, but since the error doesn't tell you whether the duplicate came from a race condition, these have most often just been left as user-visible errors.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
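[Editor's sketch of the retry pattern discussed above. This is not code from the thread; it is a minimal illustration in Python using psycopg2, and the DSN, the table `t(k int primary key)`, and the retry limit are all assumptions made for the example. The whole transaction body lives inside the loop, so a serialization failure (SQLSTATE 40001), a deadlock (40P01), or a possibly race-induced duplicate key (23505) causes the transaction to be rolled back and rerun from the beginning, a finite number of times.]

```python
# Minimal sketch: retry a SERIALIZABLE transaction on serialization failure,
# deadlock, or duplicate-key errors. Assumes psycopg2 and a hypothetical
# table t(k int primary key); names and limits are illustrative only.
import psycopg2
from psycopg2 import errorcodes

MAX_RETRIES = 3  # a finite number of retries, as discussed in the message


def insert_if_absent(dsn, key):
    conn = psycopg2.connect(dsn)
    try:
        conn.set_session(isolation_level="SERIALIZABLE")
        for attempt in range(MAX_RETRIES + 1):
            try:
                with conn.cursor() as cur:
                    # The whole transaction body is inside the loop, so a
                    # retry really does restart it from the beginning.
                    cur.execute("SELECT 1 FROM t WHERE k = %s", (key,))
                    if cur.fetchone() is None:
                        cur.execute("INSERT INTO t (k) VALUES (%s)", (key,))
                conn.commit()
                return
            except psycopg2.Error as e:
                conn.rollback()
                retryable = e.pgcode in (
                    errorcodes.SERIALIZATION_FAILURE,  # 40001
                    errorcodes.DEADLOCK_DETECTED,      # 40P01
                    errorcodes.UNIQUE_VIOLATION,       # 23505, may be a race
                )
                if not retryable or attempt == MAX_RETRIES:
                    raise
                # On the rerun the other transaction's row is visible, so the
                # SELECT finds it and the INSERT is skipped -- the same outcome
                # as if the row had existed before this transaction started.
    finally:
        conn.close()
```

Keeping the retry below the application code (here, inside one function) is what makes the failure invisible to callers: to the application and the user it looks as though the transaction had merely been blocked for a moment, which is the behavior described in the last paragraph above.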