Re: BUG #12330: ACID is broken for unique constraints - Mailing list pgsql-hackers
| From | Merlin Moncure |
|---|---|
| Subject | Re: BUG #12330: ACID is broken for unique constraints |
| Date | |
| Msg-id | CAHyXU0zgC30NYYS-X_=pxpvkwct81JWd98_cxJV_dR9UQem92w@mail.gmail.com |
| In response to | Re: BUG #12330: ACID is broken for unique constraints (Nikita Volkov <nikita.y.volkov@mail.ru>) |
| Responses | Re: BUG #12330: ACID is broken for unique constraints |
| List | pgsql-hackers |
On Mon, Dec 29, 2014 at 10:47 AM, Nikita Volkov <nikita.y.volkov@mail.ru> wrote:
>> [combining replies -- nikita, better not to top-post (FYI)]

[combining replies again]

> I'm sorry. I don't know what you mean. I just replied to an email.

http://www.idallen.com/topposting.html

>> To prove your statement, you need to demonstrate how a transaction left
>> the database in a bad state given concurrent activity without counting
>> failures.
>
> 1. Transaction A looks up a row by ID 1 and gets an empty result.
> 2. Concurrent transaction B inserts a row with ID 1.
> 3. Transaction A goes on with the presumption that a row with ID 1 does not
>    exist, because a transaction is supposed to be isolated and because it has
>    made sure that the row does not exist. With this presumption it confidently
>    inserts a row with ID 1 only to get Postgres report a duplicate key.

Wat?  Your understanding of isolation is incorrect.  Transaction A does not
go on with anything -- it's guaranteed to fail in this case.  The only
debatable point here is how exactly it fails.  Again, isolation's job is to
protect the data.

On Mon, Dec 29, 2014 at 10:53 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> The semantics are so imprecise that Tom argued that we should
> document that transactions should be retried from the start when
> you get the duplicate key error, since it *might* have been caused
> by a race condition.

That sounds off to me also.  In terms of a classic uniqueness constraint
(say, an identifying user name), every violation is technically a race
condition -- whether or not the transactions overlap in time is an accident
of timing and completely irrelevant.  A serialization error suggests that
the transaction should be retried when in fact it shouldn't be; retrying
would only surface the *actual* error.  What if the transaction is
non-trivial?  Why do we want to bother our users with those details at all?
Consider the 'idiomatic upsert' as it exists in the documentation (!):

    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- XXX merlin's note: if any dependent table throws a UV,
        -- say, via a trigger, this code will loop endlessly
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;

By changing the error code, you've just converted decades' worth of code
that deals with this problem from a server-side loop into a full round trip,
and, if the user does not automatically retry serialization failures, broken
it.  It's impossible to fix the round-trip issue, at least provably, because
there is no way to know for sure whether the serialization failure is coming
from this exact insertion or from, say, a dependent trigger (aside: the
idiomatic example ought to be checking the table name!), such that your loop
(either here or in the application) would execute a bazillion times until
some other transaction clears.  OK, this is a mostly academic detail, but
the picture is not as clear as you're saying, I think; you're travelling at
high speed in uncertain waters.
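On the aside above about checking the table name: a minimal sketch of what
that could look like, assuming PostgreSQL 9.3 or later (for GET STACKED
DIAGNOSTICS ... TABLE_NAME) and the same hypothetical db(a, b) table as the
docs example; the function name merge_db_checked is made up for illustration:

    -- Variant of the docs' upsert loop that re-raises unique violations
    -- that did not come from the table being upserted into (e.g. ones
    -- raised by a trigger on a dependent table), instead of looping.
    CREATE OR REPLACE FUNCTION merge_db_checked(key int, data text)
    RETURNS void AS
    $$
    DECLARE
        v_table text;
    BEGIN
        LOOP
            -- first try to update the key
            UPDATE db SET b = data WHERE a = key;
            IF found THEN
                RETURN;
            END IF;
            -- not there, so try to insert the key
            BEGIN
                INSERT INTO db(a, b) VALUES (key, data);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- find out which table actually raised the violation
                GET STACKED DIAGNOSTICS v_table = TABLE_NAME;
                IF v_table IS DISTINCT FROM 'db' THEN
                    RAISE;  -- not ours: retrying the UPDATE won't help
                END IF;
                -- ours: loop and try the UPDATE again
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;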
The key point here is that OP issued a SELECT first, and he's chaining DML
decisions to the output of that SELECT.  He's expecting that SELECT to be
protected via ACID, but it isn't and can't be unless you're prepared to
predicate-lock every row selected.  What he wants is for the database to
bounce his transaction because the SELECT lied to him, but that obviously
can't be done.

> I'm curious how heavily you use serializable transactions, because
> I have trouble believing that those who rely on them as their
> primary (or only) strategy for dealing with race conditions under
> high concurrency would take that position.

I don't use them much, admittedly.  That said, I don't use them as race
condition guards.  I use locks or other techniques to manage the problem.
I tend to build out applications on top of functions, and the inability to
set the isolation mode inside a function effectively confines me to 'read
committed'.

merlin
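Regarding the "locks or other techniques" mentioned above: purely as an
illustration (not necessarily the technique the author has in mind), a
check-then-insert sequence can be serialized per key with a
transaction-scoped advisory lock. The users table, the username column, and
the use of hashtext() to derive the lock key are assumptions for this sketch:

    -- Sketch: take an advisory lock on the logical key before checking,
    -- so a concurrent transaction doing the same check on the same key
    -- blocks here until this transaction commits or rolls back.
    BEGIN;
    SELECT pg_advisory_xact_lock(hashtext('username:nikita'));
    SELECT 1 FROM users WHERE username = 'nikita';
    -- If the SELECT returned no row, the insert below cannot collide with
    -- a concurrent inserter of the same username, provided every writer
    -- takes the same advisory lock first.
    INSERT INTO users (username) VALUES ('nikita');
    COMMIT;  -- the advisory lock is released automatically at commit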