Re: BUG #12330: ACID is broken for unique constraints - Mailing list pgsql-hackers
From: Kevin Grittner
Subject: Re: BUG #12330: ACID is broken for unique constraints
Msg-id: 2012416048.1469125.1419890005044.JavaMail.yahoo@jws100170.mail.ne1.yahoo.com
In response to: Re: BUG #12330: ACID is broken for unique constraints (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-hackers
Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Dec 29, 2014 at 10:53 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> The semantics are so imprecise that Tom argued that we should
>> document that transactions should be retried from the start when
>> you get the duplicate key error, since it *might* have been caused
>> by a race condition.

> That sounds off to me also. In terms of a classic uniqueness
> constraint (say, an identifying user name), every violation is
> technically a race condition -- whether or not the transactions
> overlap in time is completely irrelevant.

That is completely bogus. The point is that if you return a
serialization failure the transaction should be immediately retried
from the beginning (including, in many cases, acquiring key values).
If the error was caused by concurrent insertion of a duplicate key
where the transaction does *not* acquire the value within the
transaction, *then* you get the duplicate key error on the retry.

> If the transactions touching off the error happen to overlap or not
> is an accident of timing and irrelevant; a serialization error
> suggests that the transaction should be retried when in fact it
> shouldn't be, particularly just to get the *actual* error. What if
> the transaction is non-trivial? Why do we want to bother our users
> about those details at all?

Where serializable transactions are used to manage race conditions
the users typically do not see them. The application code does not
see them. There is normally some sort of framework (possibly using
dependency injection, although not necessarily) which catches these
and retries the transaction from the start without notifying the user
or letting the application software know that it happened. There is
normally some server-side logging so that high volumes of rollbacks
can be identified and fixed.

In a real-world situation where this was used for 100 production
databases running millions of transactions per day, I saw about 10 or
20 serialization failures per day across the whole set of database
servers. While I have certainly heard of workloads where it didn't
work out that well, Dan Ports found that many common benchmarks
perform about that well. Quoting from the peer-reviewed paper
presented in Istanbul[1]:

| SSI outperforms S2PL for all transaction mixes, and does so by a
| significant margin when the fraction of read-only transactions is
| high. On these workloads, there are more rw-conflicts between
| concurrent transactions, so locking imposes a larger performance
| penalty. (The 100%-read-only workload is a special case; there are
| no lock conflicts under S2PL, and SSI has no overhead because all
| snapshots are safe.) The 150-warehouse configuration (Figure 5b)
| behaves similarly, but the differences are less pronounced: on this
| disk-bound benchmark, CPU overhead is not a factor, and improved
| concurrency has a limited benefit. Here, the performance of SSI
| is indistinguishable from that of SI. Transactions rarely need to
| be retried; in all cases, the serialization failure rate was under
| 0.25%.

> Consider the 'idiomatic upsert' as it exists in the documentation (!):

That documentation should probably be updated to indicate which
isolation levels need that code. If you are relying on serializable
transactions that dance is unnecessary and pointless. The rule when
relying on serializable transactions is that you write the code to
behave correctly if the transaction executing it is running by
itself. Period. No special handling for race conditions.
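To make that concrete, here is a minimal sketch of the kind of
framework-level retry wrapper described above. It is purely
illustrative (Python with psycopg2; the function name, callback
convention, and retry limit are all invented here, not anything that
ships with PostgreSQL):

    # Illustrative sketch only: a framework-level retry wrapper.
    # The transaction callback is written as if it runs alone;
    # serialization failures (SQLSTATE 40001) cause the whole
    # transaction to be retried from the start, and every other error
    # (including a genuine duplicate key) surfaces to the caller.
    import psycopg2
    from psycopg2 import errorcodes

    def run_serializable(conn, txn_fn, max_retries=10):
        conn.set_session(isolation_level='SERIALIZABLE')
        for _ in range(max_retries):
            try:
                with conn.cursor() as cur:
                    result = txn_fn(cur)  # plain single-user logic
                conn.commit()
                return result
            except psycopg2.Error as e:
                conn.rollback()
                if e.pgcode == errorcodes.SERIALIZATION_FAILURE:
                    continue  # race detected by the engine; retry
                raise
        raise RuntimeError("gave up after %d serialization failures"
                           % max_retries)

With something along those lines in place the behavior is exactly
what the preceding paragraphs describe: the user never sees the
serialization failure, and a duplicate key error only appears if the
retried transaction really does try to insert a value it obtained
outside the transaction.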
Detecting race conditions is the job of the database engine and
retrying affected transactions is the job of the framework.
Absolutely nobody who understands serializable transactions would use
that idiom inside of serializable transactions.

> By changing the error code, for decades worth of dealing with this
> problem, you've just converted a server-side loop to a full round
> trip, and, if the user does not automatically retry serialization
> failures, broken his/her code.

If they are not automatically retrying on serialization failures,
they should probably not be using serializable transactions. That is
rather the point. No need for table locks. No need for SELECT FOR
UPDATE. No need to special-case for concurrent transactions. "Just do
it." Let the framework retry as needed.

I have no problem with there being people who choose not to use this
approach. What I'm asking is that they not insist on PostgreSQL being
needlessly crippled for those who *do* use it this way.

> It's impossible to fix the round trip issue, at least provably,
> because there is no way to know for sure that the serialization
> failure is coming from this exact insertion, or say, a dependent
> trigger (aside: the idiomatic example ought to be checking the
> table name!) such that your loop (either here or from application)
> would execute a bazillion times until some other transaction clears.

Until the inserting transaction commits, it would block the other
insertion attempt, so the above is simply wrong. We do our best in
our serializable implementation to avoid a situation where a
transaction retry can hit the same set of conflicting transactions.
Doing so is *extremely* rare except in the case of a prepared
transaction which is left hanging for a long time. There's no getting
around the possibility of some serializable transaction head-banging
in that case, I'm afraid; but otherwise, the above is not generally
an issue. Now, Andres showed that using serializable transactions to
manage access to a table that is being used as a queue is a pessimal
case; I can see using some form of explicit locking and a lower
isolation level for accessing a queue.

> OK, this is a mostly academic detail, but the picture is not so
> clear as you're saying, I think; you're travelling at high speed in
> uncertain waters.

Hardly. Alan Fekete has published a series of papers on the subject
over more than a decade (starting, as far as I'm aware, with one
funded by the National Science Foundation). This work was building on
the work of Atul Adya of MIT. One finding I remember from a paper
Alan Fekete co-authored was that adding sufficient explicit locks to
software using snapshot isolation to make it safe *in the general
case* dragged performance and concurrency down to the levels of S2PL.
(I'm sorry I don't have the citation for that -- there were so many
papers he authored or co-authored on this topic that it would
probably take hours to find that one.) Alan Fekete was one of the
authors of the paper which got me started looking at this
technique[1]. (That paper won "best paper" at ACM SIGMOD 2008, BTW.)
We incorporated enhancements to the paper which were in Michael
Cahill's doctoral work[2], as well as further enhancements that Dan
Ports of MIT and I came up with during development (which were
subsequently reviewed by Michael Cahill and Alan Fekete). Practical
aspects of the implementation incorporate suggestions (and sometimes
code) from others in the PostgreSQL community -- see Section 10
of [3].
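Coming back to the practical side for a moment, here is an
illustrative callback for the wrapper sketched earlier, showing what
"no special-casing for concurrent transactions" means in application
code. The schema and the business rule are invented for this example;
the point is only that the check-then-update logic is written for the
single-user case, with no SELECT FOR UPDATE and no explicit lock, and
that any write skew between overlapping withdrawals is reported by
the engine as a serialization failure and retried by the framework:

    # Invented example: keep the combined balance of an owner's
    # accounts non-negative using only single-user logic. Under
    # SERIALIZABLE, two overlapping withdrawals that would jointly
    # violate the rule cause one of them to fail with SQLSTATE 40001,
    # which the wrapper above retries transparently.
    def withdraw(cur, account_id, amount):
        cur.execute(
            "SELECT sum(balance) FROM accounts"
            " WHERE owner = (SELECT owner FROM accounts WHERE id = %s)",
            (account_id,))
        (total,) = cur.fetchone()
        if total is None or total < amount:
            raise ValueError("insufficient combined funds")
        cur.execute(
            "UPDATE accounts SET balance = balance - %s WHERE id = %s",
            (amount, account_id))

    # usage: run_serializable(conn, lambda cur: withdraw(cur, 42, 100))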
> The key point here is that OP issued a SELECT first, and he's
> chaining DML decisions to the output of that select. He's expecting
> that SELECT to be protected via ACID, but it isn't and can't be
> unless you're prepared to predicate lock every row selected.

Yes, that is what happens when you run at the serializable
transaction isolation level.

> What he wants is for the database to bounce his transaction because
> the select lied to him, but that can't be done obviously.

It can and is done. Please review the cited papers and/or look at the
Wiki page of practical examples I have created:

https://wiki.postgresql.org/wiki/SSI

>> I'm curious how heavily you use serializable transactions, because
>> I have trouble believing that those who rely on them as their
>> primary (or only) strategy for dealing with race conditions under
>> high concurrency would take that position.

> I don't use them much, admittedly. That said, I don't use them as
> race condition guards.

Well, that is their entire and only purpose, and they are of no use
at all unless used consistently for all transactions (or all but
carefully considered exceptional circumstances, like queues).

> I use locks or other techniques to manage the problem.

I think most people on this list do. In my experience, it is only in
the large shops with tens of thousands of transaction types and
dozens of programmers modifying and adding new transaction types
while others run ad hoc queries that serializable transactions become
important. My experience on this list has shown me that unless you
have worked in a shop where the above are true, and which has
therefore moved to the "write each transaction so that it does the
right thing when run by itself, and our framework will cover all
concurrency issues with serializable transactions and retries on
serialization failures" approach, you will not really understand why
consistent return of a serialization failure indication for problems
caused by concurrent transactions is so important.

> I tend to build out applications on top of functions and the
> inability to set isolation mode inside a function confounds me
> from using anything but 'read committed'.

Hey, no problem -- just set default_transaction_isolation =
'serializable' in your postgresql.conf file and never override
isolation level and you'll never have to use an explicit table lock
or SELECT FOR UPDATE again. While you're at it, set
default_transaction_read_only = on and only override it for
transactions that (might) need to update and you'll have dodged the
worst of the performance problems from serializable transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008.
    Serializable isolation for snapshot databases.
    In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international
    conference on Management of data, pages 729-738,
    New York, NY, USA. ACM.
    http://doi.acm.org/10.1145/1376616.1376690

[2] Michael James Cahill. 2009.
    Serializable Isolation for Snapshot Databases.
    Sydney Digital Theses.
    University of Sydney, School of Information Technologies.
    http://hdl.handle.net/2123/5353

[3] Dan R. K. Ports and Kevin Grittner. 2012.
    Serializable Snapshot Isolation in PostgreSQL.
    Proceedings of the VLDB Endowment, Vol. 5, No. 12.
    The 38th International Conference on Very Large Data Bases,
    August 27th - 31st 2012, Istanbul, Turkey.
    http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf