Thread: Re: BUG #12330: ACID is broken for unique constraints

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

26 December 2014, 15:24:11

"nikita.y.volkov@mail.ru" <nikita.y.volkov@mail.ru> wrote:

> Executing concurrent transactions inserting the same value of a
> unique key fails with the "duplicate key" error under code
> "23505" instead of any of transaction conflict errors with a
> "40***" code.

This is true, and can certainly be inconvenient when using
serializable transactions to simplify handling of race conditions,
because you can't just test for a SQLSTATE of '40001' or '40P01' to
indicate the need to retry the transaction.  You have two
reasonable ways to avoid duplicate keys if the values are synthetic
and automatically generated.  One is to use a SEQUENCE object to
generate the values.  The other (really only recommended if gaps in
the sequence are a problem) is to have the serializable transaction
update a row to "claim" the number.

Otherwise you need to consider errors related to duplicates as
possibly being caused by a concurrent transaction.  You may want to
do one transaction retry in such cases, and fail if an identical
error is seen.  Keep in mind that these errors do not allow
serialization anomalies to appear in the committed data, so are
arguably not violations of ACID principles -- more of a wart on the
otherwise clean technique of using serializable transactions to
simplify application programming under concurrency.

Thinking about it just now I think we might be able to generate a
write conflict instead of a duplicate key error for this case by
checking the visibility information for the duplicate row.  It
might not even have a significant performance impact, since we need
to check visibility information to generate the duplicate key
error.  That would still leave similar issues (where similar
arguments can be made) relating to foreign keys; but those can
largely be addressed already by declaring the constraints to be
DEFERRED -- and anyway, that would be a separate fix.

I'm moving this discussion to the -hackers list so that I can ask
other developers:

Are there any objections to generating a write conflict instead of
a duplicate key error if the duplicate key was added by a
concurrent transaction?  Only for transactions at isolation level
REPEATABLE READ or higher?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #12330: ACID is broken for unique constraints

From

Marko Tiikkaja

Date:

26 December 2014, 15:44:34

On 2014-12-26 17:23, Kevin Grittner wrote:
> Are there any objections to generating a write conflict instead of
> a duplicate key error if the duplicate key was added by a
> concurrent transaction?  Only for transactions at isolation level
> REPEATABLE READ or higher?

Is it possible to distinguish between an actual write conflict and a 
completely unrelated UNIQUE constraint ending up violated for some reason?

.marko

Re: BUG #12330: ACID is broken for unique constraints

From

Tom Lane

Date:

26 December 2014, 15:51:24

Kevin Grittner <kgrittn@ymail.com> writes:
> Are there any objections to generating a write conflict instead of
> a duplicate key error if the duplicate key was added by a
> concurrent transaction?

Yes.  This will deliver a less meaningful error code, *and* break
existing code that is expecting the current behavior.
        regards, tom lane

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

26 December 2014, 17:16:22

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:

>> Are there any objections to generating a write conflict instead
>> of a duplicate key error if the duplicate key was added by a
>> concurrent transaction?
>
> Yes.  This will deliver a less meaningful error code,

That depends entirely on whether you care more about whether the
problem was created by a concurrent transaction or exactly how that
concurrent transaction created the problem.  For those using
serializable transactions to manage concurrency the former is at
least an order of magnitude more important.  This is not the first
time getting a constraint violation SQLSTATE for the actions of a
concurrent serializable transaction has been reported as a bug.
Going from memory, I think this might be about the fifth time users
have reported it as a bug or potential bug on these lists.

People using serializable transactions normally run all queries
through common code with will retry the transaction from the start
if there is a SQLSTATE starting with '40' (or perhaps picking out
the specific codes '40001' and '40P01').  Not doing so for *some*
types of errors generated by concurrent transactions reduces the
application's level of user-friendliness, and may introduce
surprising bugs.  In this particular case the OP wants to do one
thing if a row with a paricular value for a unique index exists,
and something different if it doesn't.  If we generate the write
conflict for the case that it is concurrently added, it can retry
the transaction and do one or the other; if we don't pay attention
to that, they need weird heuristics for "the third case".  That
really is not more meaningful or useful.

> *and* break existing code that is expecting the current behavior.

Possibly, but my experience is more that failure to behave the way
I suggest is biting people and causing them a lot of extra work and
pain.  I would be fine with limiting the new behavior to
serializable transactions, since that seems to be where people want
this behavior.  It would bring us closer to "the transaction will
run as though it were the only transaction running or roll back
with a serialization failure" without having to add caveats and
exceptions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #12330: ACID is broken for unique constraints

From

Tom Lane

Date:

26 December 2014, 17:31:44

Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yes.  This will deliver a less meaningful error code,

> That depends entirely on whether you care more about whether the
> problem was created by a concurrent transaction or exactly how that
> concurrent transaction created the problem.

Just for starters, a 40XXX error report will fail to provide the
duplicated key's value.  This will be a functional regression,
on top of breaking existing code.

I think an appropriate response to these complaints is to fix the
documentation to point out that duplicate-key violations may also
be worthy of retries.  (I sort of thought it did already, actually,
but I see no mention of the issue in chapter 13.)
        regards, tom lane

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

26 December 2014, 18:38:20

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Just for starters, a 40XXX error report will fail to provide the
> duplicated key's value.  This will be a functional regression,

Not if, as is normally the case, the transaction is retried from
the beginning on a serialization failure.  Either the code will
check for a duplicate (as in the case of the OP on this thread) and
they won't see the error, *or* the the transaction which created
the duplicate key will have committed before the start of the retry
and you will get the duplicate key error.

> I think an appropriate response to these complaints is to fix the
> documentation to point out that duplicate-key violations may also
> be worthy of retries.

> but I see no mention of the issue in chapter 13.)

I agree that's the best we can do for stable branches, and worth
doing.

It would be interesting to hear from others who have rely on 
serializable transactions in production environments about what 
makes sense to them.  This is probably the wrong list to find such 
people directly; but I seem to recall Josh Berkus has a lot of 
clients who do.  Josh?  Any opinion on this thread?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #12330: ACID is broken for unique constraints

From

Nikita Volkov

Date:

26 December 2014, 19:28:21

I'll repost my (OP) case, for the references to it to make more sense to the others.

Having the following table:

CREATE TABLE "song_artist" (
"song_id" INT8 NOT NULL,
"artist_id" INT8 NOT NULL,
PRIMARY KEY ("song_id", "artist_id")
);

Even trying to protect from this with a select, won't help to get away from
the error, because at the beginning of the transaction the key does not
exist yet.

BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE;
INSERT INTO song_artist (song_id, artist_id)
SELECT 1, 2
WHERE NOT EXISTS (SELECT * FROM song_artist WHERE song_id=1 AND
artist_id=2);
COMMIT;

2014-12-26 21:38 GMT+03:00 Kevin Grittner <kgrittn@ymail.com>:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Just for starters, a 40XXX error report will fail to provide the
> duplicated key's value. This will be a functional regression,

Not if, as is normally the case, the transaction is retried from
the beginning on a serialization failure. Either the code will
check for a duplicate (as in the case of the OP on this thread) and
they won't see the error, *or* the the transaction which created
the duplicate key will have committed before the start of the retry
and you will get the duplicate key error.

> I think an appropriate response to these complaints is to fix the
> documentation to point out that duplicate-key violations may also
> be worthy of retries.

> but I see no mention of the issue in chapter 13.)

I agree that's the best we can do for stable branches, and worth
doing.

It would be interesting to hear from others who have rely on
serializable transactions in production environments about what
makes sense to them. This is probably the wrong list to find such
people directly; but I seem to recall Josh Berkus has a lot of
clients who do. Josh? Any opinion on this thread?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #12330: ACID is broken for unique constraints

From

Peter Geoghegan

Date:

26 December 2014, 20:05:30

On Fri, Dec 26, 2014 at 7:23 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Are there any objections to generating a write conflict instead of
> a duplicate key error if the duplicate key was added by a
> concurrent transaction?  Only for transactions at isolation level
> REPEATABLE READ or higher?

This independently occurred to me as perhaps preferable over a year
ago. The INSERT...ON CONFLICT IGNORE feature that my patch adds will
throw such an error for REPEATABLE READ and higher modes rather than
proceeding on the basis of having ignored something from a concurrent
transaction.

+1 from me for doing this in the master branch, if it isn't too
invasive (I think it might be; where is estate available from to work
with close to duplicate checking for B-Trees?). It only makes sense to
do what you propose if the users expects to check that there is no
duplicate ahead of time, and only ever fail with a serialization
failure (not a duplicate violation) from concurrent conflicts. That is
a bit narrow; the OP can only reasonably expect to not see a duplicate
violation on retry because he happens to be checking...most clients
won't, and will "waste" a retry.

The proposed ON CONFLICT IGNORE feature would do this checking for him
correctly, FWIW, but in typical cases there is no real point in
retrying the xact -- you'll just get another duplicate violation
error.

In any case I think exclusion violations should be covered, too.
-- 
Peter Geoghegan

Re: BUG #12330: ACID is broken for unique constraints

From

Merlin Moncure

Date:

27 December 2014, 05:56:24

On Fri, Dec 26, 2014 at 12:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Just for starters, a 40XXX error report will fail to provide the
>> duplicated key's value.  This will be a functional regression,
>
> Not if, as is normally the case, the transaction is retried from
> the beginning on a serialization failure.  Either the code will
> check for a duplicate (as in the case of the OP on this thread) and
> they won't see the error, *or* the the transaction which created
> the duplicate key will have committed before the start of the retry
> and you will get the duplicate key error.

I'm not buying that; that argument assumes duplicate key errors are
always 'upsert' driven.  Although OP's code may have checked for
duplicates it's perfectly reasonable (and in many cases preferable) to
force the transaction to fail and report the error directly back to
the application.  The application will then switch on the error code
and decide what to do: retry for deadlock/serialization or abort for
data integrity error.  IOW, the error handling semantics are
fundamentally different and should not be mixed.

merlin

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

29 December 2014, 14:03:40

<div style="color:#000; background-color:#fff; font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida
Grande,sans-serif;font-size:16px"><div id="yiv4165093533"><div id="yui_3_16_0_1_1419370979543_2610776"><div
id="yui_3_16_0_1_1419370979543_2610775"style="color:#000;background-color:#fff;font-family:HelveticaNeue, Helvetica
Neue,Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;">Merlin Moncure <mmoncure@gmail.com> wrote:<br
class="yiv4165093533"clear="none" style="" />> On Fri, Dec 26, 2014 at 12:38 PM, Kevin Grittner
<kgrittn@ymail.com>wrote:<br class="yiv4165093533" clear="none" style="" />>> Tom Lane
<tgl@sss.pgh.pa.us>wrote:<br class="yiv4165093533" clear="none" style="" />>><br class="yiv4165093533"
clear="none"style="" />>>> Just for starters, a 40XXX error report will fail to provide the<br
class="yiv4165093533"clear="none" style="" />>>> duplicated key's value.  This will be a functional
regression,<brclass="yiv4165093533" clear="none" style="" />>><br class="yiv4165093533" clear="none" style=""
/>>>Not if, as is normally the case, the transaction is retried from<br class="yiv4165093533" clear="none"
style=""/>>> the beginning on a serialization failure.  Either the code will<br class="yiv4165093533"
clear="none"style="" />>> check for a duplicate (as in the case of the OP on this thread) and<br
class="yiv4165093533"clear="none" style="" />>> they won't see the error, *or* the the transaction which
created<brclass="yiv4165093533" clear="none" style="" />>> the duplicate key will have committed before the start
ofthe retry<br class="yiv4165093533" clear="none" style="" />>> and you will get the duplicate key error.<br
class="yiv4165093533"clear="none" style="" />><br class="yiv4165093533" clear="none" style="" />> I'm not buying
that;that argument assumes duplicate key errors are<br class="yiv4165093533" clear="none" style="" />> always
'upsert'driven.  Although OP's code may have checked for<br class="yiv4165093533" clear="none" style="" />>
duplicatesit's perfectly reasonable (and in many cases preferable) to<br class="yiv4165093533" clear="none" style=""
/>>force the transaction to fail and report the error directly back to<br class="yiv4165093533" clear="none"
style=""/>> the application.  The application will then switch on the error code<br class="yiv4165093533"
clear="none"style="" />> and decide what to do: retry for deadlock/serialization or abort for<br
class="yiv4165093533"clear="none" style="" />> data integrity error.  IOW, the error handling semantics are<br
class="yiv4165093533"clear="none" style="" />> fundamentally different and should not be mixed.<br
class="yiv4165093533"clear="none" style="" /><br class="yiv4165093533" clear="none" style="" />I think you might be
agreeingwith me without realizing it.  Right <br class="yiv4165093533" clear="none" style="" />now you get "duplicate
keyerror" even if the duplication is caused <br class="yiv4165093533" clear="none" style="" />by a concurrent
transaction-- it is not possible to check the <br class="yiv4165093533" clear="none" style="" />error code (well,
SQLSTATE,technically) to determine whether this <br class="yiv4165093533" clear="none" style="" />is fundamentally a
serializationproblem.  What we're talking about <br class="yiv4165093533" clear="none" style="" />is returning the
serializationfailure return code for the cases <br class="yiv4165093533" clear="none" style="" />where it is a
concurrenttransaction causing the failure and <br class="yiv4165093533" clear="none" style="" />continuing to return
theduplicate key error for all other cases.<br class="yiv4165093533" clear="none" style="" /><br class="yiv4165093533"
clear="none"style="" />Either I'm not understanding what you wrote above, or you seem to<br class="yiv4165093533"
clear="none"style="" />be arguing for being able to distinguish between errors caused by<br class="yiv4165093533"
clear="none"style="" />concurrent transactions and those which aren't.<div class="qtdSeparateBR"><br /><br /></div><div
class="yiv4165093533yqt9921025714"id="yiv4165093533yqtfd55019"><br class="yiv4165093533" clear="none" style="" />--<br
class="yiv4165093533"clear="none" style="" />Kevin Grittner</div>EDB: http://www.enterprisedb.com<br
class="yiv4165093533"clear="none" style="" />The Enterprise PostgreSQL Company<div class="yiv4165093533yqt9921025714"
id="yiv4165093533yqtfd07562"><brclass="yiv4165093533" clear="none" style="" /><br class="yiv4165093533" clear="none"
style=""/></div></div></div></div></div>

Re: BUG #12330: ACID is broken for unique constraints

From

Merlin Moncure

Date:

29 December 2014, 14:47:42

On Mon, Dec 29, 2014 at 8:03 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Fri, Dec 26, 2014 at 12:38 PM, Kevin Grittner <kgrittn@ymail.com>
>> wrote:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>>> Just for starters, a 40XXX error report will fail to provide the
>>>> duplicated key's value.  This will be a functional regression,
>>>
>>> Not if, as is normally the case, the transaction is retried from
>>> the beginning on a serialization failure.  Either the code will
>>> check for a duplicate (as in the case of the OP on this thread) and
>>> they won't see the error, *or* the the transaction which created
>>> the duplicate key will have committed before the start of the retry
>>> and you will get the duplicate key error.
>>
>> I'm not buying that; that argument assumes duplicate key errors are
>> always 'upsert' driven.  Although OP's code may have checked for
>> duplicates it's perfectly reasonable (and in many cases preferable) to
>> force the transaction to fail and report the error directly back to
>> the application.  The application will then switch on the error code
>> and decide what to do: retry for deadlock/serialization or abort for
>> data integrity error.  IOW, the error handling semantics are
>> fundamentally different and should not be mixed.
>
> I think you might be agreeing with me without realizing it.  Right
> now you get "duplicate key error" even if the duplication is caused
> by a concurrent transaction -- it is not possible to check the
> error code (well, SQLSTATE, technically) to determine whether this
> is fundamentally a serialization problem.  What we're talking about
> is returning the serialization failure return code for the cases
> where it is a concurrent transaction causing the failure and
> continuing to return the duplicate key error for all other cases.
>
> Either I'm not understanding what you wrote above, or you seem to
> be arguing for being able to distinguish between errors caused by
> concurrent transactions and those which aren't.

Well, I'm arguing that duplicate key errors are not serialization
failures unless it's likely the insertion would succeed upon a retry;
a proper insert, not an upsert.  If that's the case with what you're
proposing, then it makes sense to me.  But that's not what it sounds
like...your language suggests AIUI that having the error simply be
caused by another transaction being concurrent would be sufficient to
switch to a serialization error (feel free to correct me if I'm
wrong!).

In other words, the current behavior is:
txn A,B begin
txn A inserts
txn B inserts over A, locks, waits
txn A commits.  B aborts with duplicate key error

Assuming that case is untouched, then we're good!  My long winded
point above is that case must fail with duplicate key error; a
serialization error is suggesting the transaction should be retried
and it shouldn't be...it would simply fail a second time.

merlin

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

29 December 2014, 15:09:18

Merlin Moncure <mmoncure@gmail.com> wrote:

> Well, I'm arguing that duplicate key errors are not serialization
> failures unless it's likely the insertion would succeed upon a retry;
> a proper insert, not an upsert.  If that's the case with what you're
> proposing, then it makes sense to me.  But that's not what it sounds
> like...your language suggests AIUI that having the error simply be
> caused by another transaction being concurrent would be sufficient to
> switch to a serialization error (feel free to correct me if I'm
> wrong!).
>
> In other words, the current behavior is:
> txn A,B begin
> txn A inserts
> txn B inserts over A, locks, waits
> txn A commits.  B aborts with duplicate key error
>
> Assuming that case is untouched, then we're good!  My long winded
> point above is that case must fail with duplicate key error; a
> serialization error is suggesting the transaction should be retried
> and it shouldn't be...it would simply fail a second time.

What I'm proposing is that for serializable transactions B would
get a serialization failure; otherwise B would get a duplicate key
error.  If the retry of B looks at something in the database to
determine what it's primary key should be it will get a new value
on the retry, since it will be starting after the commit of A.  If
it is using a literal key, not based on something changed by A, it
will get a duplicate key error on the retry, since it will be
starting after the commit of A.

It will either succeed on retry or get an error for a different
reason.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #12330: ACID is broken for unique constraints

From

Merlin Moncure

Date:

29 December 2014, 15:31:53

On Mon, Dec 29, 2014 at 9:09 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Merlin Moncure <mmoncure@gmail.com> wrote:
>> In other words, the current behavior is:
>> txn A,B begin
>> txn A inserts
>> txn B inserts over A, locks, waits
>> txn A commits.  B aborts with duplicate key error
>
> What I'm proposing is that for serializable transactions B would
> get a serialization failure; otherwise B would get a duplicate key
> error.  If the retry of B looks at something in the database to
> determine what it's primary key should be it will get a new value
> on the retry, since it will be starting after the commit of A.  If
> it is using a literal key, not based on something changed by A, it
> will get a duplicate key error on the retry, since it will be
> starting after the commit of A.
>
> It will either succeed on retry or get an error for a different
> reason.

In that case: we don't agree.  How come duplicate key errors would be
reported as serialization failures but not RI errors (for example,
inserting a record pointing to another record which a concurrent
transaction deleted)?

IMO, serialization errors are an implementation artifact and should
not mask well defined errors in SQL under any circumstances (or if
they must, those cases should absolutely be as narrow as possible).

merlin

Re: BUG #12330: ACID is broken for unique constraints

From

Greg Stark

Date:

29 December 2014, 15:45:42

On Mon, Dec 29, 2014 at 3:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> In that case: we don't agree.  How come duplicate key errors would be
> reported as serialization failures but not RI errors (for example,
> inserting a record pointing to another record which a concurrent
> transaction deleted)?

The key question is not the type of error but whether the transaction
saw another state previously. That is, if you select to check for
duplicate keys, don't see any, and then try to insert and get a
duplicate key error that would be easy to argue is a serialization
error. The same could be true for an RI check -- if you select some
data and then insert a record that refers to the data you already
verified existed and get an RI failure then you could consider that a
serialization failure.

-- 
greg

Re: BUG #12330: ACID is broken for unique constraints

From

Merlin Moncure

Date:

29 December 2014, 16:17:39

On Mon, Dec 29, 2014 at 9:44 AM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Dec 29, 2014 at 3:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> In that case: we don't agree.  How come duplicate key errors would be
>> reported as serialization failures but not RI errors (for example,
>> inserting a record pointing to another record which a concurrent
>> transaction deleted)?
>
> The key question is not the type of error but whether the transaction
> saw another state previously.

[combining replies -- nikita, better not to top-post (FYI)]

How is that relevant?  Serialization errors only exist as a concession
to concurrency and performance.  Again, they should be returned as
sparsely as possible because they provide absolutely (as Tom pointed
out) zero detail to the application.   The precise definition of the
error is up to us, but I'd rather keep it to it's current rather
specific semantics.

On Mon, Dec 29, 2014 at 9:48 AM, Nikita Volkov <nikita.y.volkov@mail.ru> wrote:
> I believe, the objections expressed in this thread miss a very important
> point of all this: the isolation property (the "I" in ACID) is violated.
>
> Here’s a quote from the Wikipedia article on ACID:
>
> The isolation property ensures that the concurrent execution of transactions
> results in a system state that would be obtained if transactions were
> executed serially, i.e., one after the other.
>
> The original example proves that it's violated. Such behaviour can neither
> be expected by a user, nor is even mentioned anywhere. Instead in the first
> paragraph of the “About” section of the Postgres site it states:
>
> It is fully ACID compliant
>
> Which is basically just a lie, until issues like this one get dealt with.

That's simply untrue: inconvenience != acid violation

Transaction levels provide certain guarantees regarding the state of
the data in the presence of concurrent overlapping operations.   They
do not define the mechanism of failure or how/when those failures
should occur.  To prove your statement, you need to demonstrate how a
transaction left the database in a bad state given concurrent activity
without counting failures.  Postgres can, and does, for example,
return concurrency type errors more aggressively than it needs to for
the 'repeatable read', level.  Point being, this is completely ok as
database implementation is free to understand that, just as it's free
to define precisely how and when it fails given concurrency as long as
those guarantees are provided.

merlin

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

29 December 2014, 16:54:04

Merlin Moncure <mmoncure@gmail.com> wrote:

> Serialization errors only exist as a concession to concurrency
> and performance.  Again, they should be returned as sparsely as
> possible because they provide absolutely (as Tom pointed
> out) zero detail to the application.

That is false.  They provide an *extremely* valuable piece of
information which is not currently available when you get a
duplicate key error -- whether the error occurred because of a race
condition and will not fail for the same cause if retried.

> The precise definition of the error is up to us, but I'd rather
> keep it to it's current rather specific semantics.

The semantics are so imprecise that Tom argued that we should
document that transactions should be retried from the start when
you get the duplicate key error, since it *might* have been caused
by a race condition.

I'm curious how heavily you use serializable transactions, because
I have trouble believing that those who rely on them as their 
primary (or only) strategy for dealing with race conditions under 
high concurrency would take that position.

As for the fact that RI violations also don't return a
serialization failure when caused by a race with concurrent
transactions, I view that as another weakness in PostgreSQL.  I
don't think there is a problem curing one without curing the other
at the same time.  I have known of people writing their own
triggers to enforce RI rather than defining FKs precisely so that
they can get a serialization failure return code and do automatic
retry if it is caused by a race condition.  That's less practical
to compensate for when it comes to unique indexes or constraints.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #12330: ACID is broken for unique constraints

From

Nikita Volkov

Date:

29 December 2014, 16:54:49

I believe, the objections expressed in this thread miss a very important point of all this: the isolation property (the "I" in ACID) is violated.

Here’s a quote from the Wikipedia article on ACID:

The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e., one after the other.

The original example proves that it's violated. Such behaviour can neither be expected by a user, nor is even mentioned anywhere. Instead in the first paragraph of the “About” section of the Postgres site it states:

It is fully ACID compliant

Which is basically just a lie, until issues like this one get dealt with.

2014-12-29 18:31 GMT+03:00 Merlin Moncure <mmoncure@gmail.com>:

On Mon, Dec 29, 2014 at 9:09 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Merlin Moncure <mmoncure@gmail.com> wrote:
>> In other words, the current behavior is:
>> txn A,B begin
>> txn A inserts
>> txn B inserts over A, locks, waits
>> txn A commits. B aborts with duplicate key error
>
> What I'm proposing is that for serializable transactions B would
> get a serialization failure; otherwise B would get a duplicate key
> error. If the retry of B looks at something in the database to
> determine what it's primary key should be it will get a new value
> on the retry, since it will be starting after the commit of A. If
> it is using a literal key, not based on something changed by A, it
> will get a duplicate key error on the retry, since it will be
> starting after the commit of A.
>
> It will either succeed on retry or get an error for a different
> reason.

In that case: we don't agree. How come duplicate key errors would be
reported as serialization failures but not RI errors (for example,
inserting a record pointing to another record which a concurrent
transaction deleted)?

IMO, serialization errors are an implementation artifact and should
not mask well defined errors in SQL under any circumstances (or if
they must, those cases should absolutely be as narrow as possible).

merlin

Re: BUG #12330: ACID is broken for unique constraints

From

Nikita Volkov

Date:

29 December 2014, 16:56:09

> [combining replies -- nikita, better not to top-post (FYI)]

I'm sorry. I don't know what you mean. I just replied to an email.

> To prove your statement, you need to demonstrate how a transaction left the database in a bad state given concurrent activity without counting failures.

1. Transaction A looks up a row by ID 1 and gets an empty result.

2. Concurrent transaction B inserts a row with ID 1.

3. Transaction A goes on with the presumption that a row with ID 1 does not exist, because a transaction is supposed to be isolated and because it has made sure that the row does not exist. With this presumption it confidently inserts a row with ID 1 only to get Postgres report a duplicate key. Wat?

2014-12-29 19:17 GMT+03:00 Merlin Moncure <mmoncure@gmail.com>:

On Mon, Dec 29, 2014 at 9:44 AM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Dec 29, 2014 at 3:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> In that case: we don't agree. How come duplicate key errors would be
>> reported as serialization failures but not RI errors (for example,
>> inserting a record pointing to another record which a concurrent
>> transaction deleted)?
>
> The key question is not the type of error but whether the transaction
> saw another state previously.

[combining replies -- nikita, better not to top-post (FYI)]

How is that relevant? Serialization errors only exist as a concession
to concurrency and performance. Again, they should be returned as
sparsely as possible because they provide absolutely (as Tom pointed
out) zero detail to the application. The precise definition of the
error is up to us, but I'd rather keep it to it's current rather
specific semantics.

On Mon, Dec 29, 2014 at 9:48 AM, Nikita Volkov <nikita.y.volkov@mail.ru> wrote:
> I believe, the objections expressed in this thread miss a very important
> point of all this: the isolation property (the "I" in ACID) is violated.
>
> Here’s a quote from the Wikipedia article on ACID:
>
> The isolation property ensures that the concurrent execution of transactions
> results in a system state that would be obtained if transactions were
> executed serially, i.e., one after the other.
>
> The original example proves that it's violated. Such behaviour can neither
> be expected by a user, nor is even mentioned anywhere. Instead in the first
> paragraph of the “About” section of the Postgres site it states:
>
> It is fully ACID compliant
>
> Which is basically just a lie, until issues like this one get dealt with.

That's simply untrue: inconvenience != acid violation

Transaction levels provide certain guarantees regarding the state of
the data in the presence of concurrent overlapping operations. They
do not define the mechanism of failure or how/when those failures
should occur. To prove your statement, you need to demonstrate how a
transaction left the database in a bad state given concurrent activity
without counting failures. Postgres can, and does, for example,
return concurrency type errors more aggressively than it needs to for
the 'repeatable read', level. Point being, this is completely ok as
database implementation is free to understand that, just as it's free
to define precisely how and when it fails given concurrency as long as
those guarantees are provided.

merlin

Re: BUG #12330: ACID is broken for unique constraints

From

Merlin Moncure

Date:

29 December 2014, 20:07:51

On Mon, Dec 29, 2014 at 10:47 AM, Nikita Volkov <nikita.y.volkov@mail.ru> wrote:
>> [combining replies -- nikita, better not to top-post (FYI)]

[combining replied again]

> I'm sorry. I don't know what you mean. I just replied to an email.

http://www.idallen.com/topposting.html

>> To prove your statement, you need to demonstrate how a transaction left
>> the database in a bad state given concurrent activity without counting
>> failures.
>
> 1. Transaction A looks up a row by ID 1 and gets an empty result.
> 2. Concurrent transaction B inserts a row with ID 1.
> 3. Transaction A goes on with the presumption that a row with ID 1 does not
> exist, because a transaction is supposed to be isolated and because it has
> made sure that the row does not exist. With this presumption it confidently
> inserts a row with ID 1 only to get Postgres report a duplicate key. Wat?

Your understanding of isolation is incorrect.   Transaction A does not
go on with anything -- it's guaranteed to fail in this case.  The only
debatable point here is how exactly it fails.  Again, isolation's job
is to protect the data.

On Mon, Dec 29, 2014 at 10:53 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> The semantics are so imprecise that Tom argued that we should
> document that transactions should be retried from the start when
> you get the duplicate key error, since it *might* have been caused
> by a race condition.

That sounds off to me also.  In terms of a classic uniqueness
constraint (say, a identifying user name), every violation is
technically a race condition -- whether or not the transactions
overlap on time is completely irrelevant.  If the transactions
touching off the error happen to overlap or not is an accident of
timing and irrelevant; a serialization error suggests that the
transaction should be retried when in fact it shouldn't be,
particularly just to get the *actual* error.  What if the transaction
is non-trivial?  Why do we want to bother our users about those
details at all?

Consider the 'idiomatic upsert' as it exists in the documentation (!):
   LOOP       -- first try to update the key       UPDATE db SET b = data WHERE a = key;       IF found THEN
RETURN;      END IF;       --   XXX merlin's note: if any dependent table throws a UV,       --   say, via a trigger,
thiscode will loop endlessly       -- not there, so try to insert the key       -- if someone else inserts the same key
concurrently,      -- we could get a unique-key failure       BEGIN           INSERT INTO db(a,b) VALUES (key, data);
       RETURN;       EXCEPTION WHEN unique_violation THEN           -- do nothing, and loop to try the UPDATE again
 END;   END LOOP;

By changing the error code, for decades worth of dealing with this
problem, you've just converted a server side loop to a full round
trip, and, if the user does not automatically retry serialization
failures, broken his/her code.  It's impossible to fix the round trip
issue, at least provably, because there is no way to know for sure
that the serialization failure is coming from this exact insertion, or
say, a dependent trigger (aside: the idiomatic example aught to be
checking the table name!) such that your loop (either here or from
application) would execute a bazillion times until some other
transaction clears.  OK, this is a mostly academic detail, but the
picture is not so clear as you're saying, I think; you're travelling
at high speed in uncertain waters.

The key point here is that OP issued a SELECT first, and he's chaining
DML decisions to the output of that select. He's expecting that SELECT
to be protected via ACID, but it isn't and can't be unless you're
prepared to predicate lock every row selected.  What he wants is for
the database to bounce his transaction because the select lied to him,
but that can't be done obviously.

> I'm curious how heavily you use serializable transactions, because
> I have trouble believing that those who rely on them as their
> primary (or only) strategy for dealing with race conditions under
> high concurrency would take that position.

I don't use them much, admittedly.  That said, I don't use them as
race condition guards.  I use locks or other techniques to manage the
problem.   I tend to build out applications on top of functions and
the inability to set isolation mode inside a function confounds me
from using anything but 'read committed'.

merlin

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

29 December 2014, 21:53:36

Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Dec 29, 2014 at 10:53 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> The semantics are so imprecise that Tom argued that we should
>> document that transactions should be retried from the start when
>> you get the duplicate key error, since it *might* have been caused
>> by a race condition.

> That sounds off to me also. In terms of a classic uniqueness
> constraint (say, a identifying user name), every violation is
> technically a race condition -- whether or not the transactions
> overlap on time is completely irrelevant.

That is completely bogus. The point is that if you return a
serialization failure the transaction should be immediately retried
from the beginning (including, in many cases, acquiring key
values). If the error was caused by concurrent insertion of a
duplicate key where the transaction does *not* acquire the value
within the transaction, *then* you get the duplicate key error on
the retry.

> If the transactions
> touching off the error happen to overlap or not is an accident of
> timing and irrelevant; a serialization error suggests that the
> transaction should be retried when in fact it shouldn't be,
> particularly just to get the *actual* error. What if the transaction
> is non-trivial? Why do we want to bother our users about those
> details at all?

Where serializable transactions are used to manage race conditions
the users typically do not see them. The application code does not
see them. There is normally some sort of framework (possibly using
dependency injection, although not necessarily) which catches these
and retries the transaction from the start without notifying the
user or letting the application software know that it happened.
There is normally some server-side logging so that high volumes of
rollbacks can be identified and fixed. In a real-world situation
where this was used for 100 production databases running millions
of transactions per day, I saw about 10 or 20 serialization
failures per day across the whole set of database servers.

While I have certainly heard of workloads where it didn't work out
that well, Dan Ports found that many common benchmarks perform
about that well. Quoting from the peer-reviewed paper presented in
Istanbul[1]:

| SSI outperforms S2PL for all transaction mixes, and does so by a
| significant margin when the fraction of read-only transactions is
| high. On these workloads, there are more rw-conflicts between
| concurrent transactions, so locking imposes a larger performance
| penalty. (The 100%-read-only workload is a special case; there are
| no lock conflicts under S2PL, and SSI has no overhead because all
| snapshots are safe.) The 150-warehouse configuration (Figure 5b )
| behaves similarly, but the differences are less pronounced: on this
| disk-bound benchmark, CPU overhead is not a factor, and improved
| concurrency has a limited benefit. Here, the performance of SSI
| is indistinguishable from that of SI. Transactions rarely need to
| be retried; in all cases, the serialization failure rate was under
| 0.25%.

> Consider the 'idiomatic upsert' as it exists in the documentation (!):

That documentation should probably be updated to indicate which
isolation levels need that code. If you are relying on
serializable transactions that dance is unnecessary and pointless.
The rule when relying on serializable transactions is that you
write the code to behave correctly if the transaction executing it
is running by itself. Period. No special handling for race
conditions. Detecting race conditions is the job of the database
engine and retrying affected transactions is the job of the
framework. Absolutely nobody who understands serializable
transactions would use that idiom inside of serializable
transactions.

> By changing the error code, for decades worth of dealing with this
> problem, you've just converted a server side loop to a full round
> trip, and, if the user does not automatically retry serialization
> failures, broken his/her code.

If they are not automatically retrying on serialization failures,
they should probably not be using serializable transactions. That
is rather the point. No need for table locks. No need for SELECT
FOR UPDATE. No need to special-case for concurrent transactions.
"Just do it." Let the framework retry as needed.

I have no problem with there being people who choose not to use
this approach. What I'm asking is that they not insist on
PostgreSQL being needlessly crippled for those who *do* use it this
way.

> It's impossible to fix the round trip
> issue, at least provably, because there is no way to know for sure
> that the serialization failure is coming from this exact insertion, or
> say, a dependent trigger (aside: the idiomatic example aught to be
> checking the table name!) such that your loop (either here or from
> application) would execute a bazillion times until some other
> transaction clears.

Until the inserting transaction commits, it would block the other
insertion attempt, so the above is simply wrong. We do our best in
our serializable implementation to avoid a situation where a
transaction retry can hit the same set of conflicting transactions.
Doing so is *extremely* rare except in the case of a prepared
transaction which is left hanging for a long time. There's no
getting around the possibility of some serializable transaction
head-banging in that case, I'm afraid; but otherwise, the above is
not generally an issue. Now, Andres showed that using serializable
transactions to manage access to a table that is being used as a
queue is a pessimal case; I can see using some form of explicit
locking and a lower isolation level for accessing a queue.

> OK, this is a mostly academic detail, but the
> picture is not so clear as you're saying, I think; you're travelling
> at high speed in uncertain waters.

Hardly. Alan Fekete has published a series of papers on the
subject, over more than a decade (starting, as far as I'm aware,
with one funded by the National Science Foundation). This work was
building on work of Atul Adya of MIT. One of the findings I
remember from a paper where Alan Fekete was an author showed that
adding sufficient explicit locks to software using snapshot
isolation to make is safe *in the general case* dragged performance
and concurrency down to the levels of S2PL. (I'm sorry I don't
have the citation for that -- there were so many papers he authored
or co-authored on this topic it would probably take hours to find
that one.) Alan Fekete was one of the authors on the paper which
got me started looking at this technique[1]. (That paper won "best
paper" at ACM SIGMOD 2008, BTW.) We incorporated enhancements to
the paper which were in Michael Cahill's doctoral work[2], as well
as further enhancements that Dan Ports of MIT and I came up with
during development (which were subsequently reviewed by Michael
Cahill and Alan Fekete).  Practical aspects of the implementation
incorporate suggestions (and sometimes code) from others in the
PostgreSQL community -- see Section 10 of [3].

> The key point here is that OP issued a SELECT first, and he's chaining
> DML decisions to the output of that select. He's expecting that SELECT
> to be protected via ACID, but it isn't and can't be unless you're
> prepared to predicate lock every row selected.

Yes, that is what happens when you run at the serializable
transaction isolation level.

> What he wants is for the database to bounce his transaction
> because the select lied to him, but that can't be done obviously.

It can and is done. Please review the cited papers and/or look at
the Wiki page of practical examples I have created:

https://wiki.postgresql.org/wiki/SSI

>> I'm curious how heavily you use serializable transactions, because
>> I have trouble believing that those who rely on them as their
>> primary (or only) strategy for dealing with race conditions under
>> high concurrency would take that position.
>
> I don't use them much, admittedly. That said, I don't use them as
> race condition guards.

Well, that is their entire and only purpose, and they are of no use
at all unless used consistently for all transactions (or all but
carefully considered exceptional circumstances, like queues).

> I use locks or other techniques to manage the problem.

I think most people on this list do. In my experience, it is only
the large shops with tens of thousands of transaction types and
dozens of programmers modifying and adding new transaction types
while other run ad hoc queries that the serializable transactions
become important. My experiences on this list have show me that
unless you have worked in a shop where the above are true and they
have therefore moved to the "write each transaction so that it does
the right thing when run by itself and our framework will cover all
concurrency issues with serializable transactions and retries on
serialization failures" you will not really understand why
consistent return of a serialization failure indication for
problems caused by concurrent transactions is so important.

> I tend to build out applications on top of functions and the
> inability to set isolation mode inside a function confounds me
> from using anything but 'read committed'.

Hey, no problem -- just set default_transaction_isolation =
'serializable' in your postgresql.conf file and never override
isolation level and you'll never have to use an explicit table lock
or SELECT FOR UPDATE again. While you're at it, set
default_transaction_read_only = on and only override it for
transactions that (might) need to update and you'll have dodged the
worst of the performance problems from serializable transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008.
* Serializable isolation for snapshot databases.
* In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD
* international conference on Management of data,
* pages 729-738, New York, NY, USA. ACM.
* http://doi.acm.org/10.1145/1376616.1376690

[2] Michael James Cahill. 2009.
* Serializable Isolation for Snapshot Databases.
* Sydney Digital Theses.
* University of Sydney, School of Information Technologies.
* http://hdl.handle.net/2123/5353

[3] Dan R. K. Ports and Kevin Grittner. 2012.
* Serializable Snapshot Isolation in PostgreSQL.
* Proceedings of the VLDB Endowment, Vol. 5, No. 12.
* The 38th International Conference on Very Large Data Bases,
* August 27th - 31st 2012, Istanbul, Turkey.
* http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

29 December 2014, 21:59:54

[errata]


Kevin Grittner <kgrittn@ymail.com> wrote:

> Quoting from the peer-reviewed paper presented in Istanbul[1]:

That should have been [3], not [1].

Re: BUG #12330: ACID is broken for unique constraints

From

Jim Nasby

Date:

30 December 2014, 00:28:55

On 12/29/14, 10:53 AM, Kevin Grittner wrote:
> Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> Serialization errors only exist as a concession to concurrency
>> and performance.  Again, they should be returned as sparsely as
>> possible because they provide absolutely (as Tom pointed
>> out) zero detail to the application.
>
> That is false.  They provide an *extremely* valuable piece of
> information which is not currently available when you get a
> duplicate key error -- whether the error occurred because of a race
> condition and will not fail for the same cause if retried.

As for missing details like the duplicated key value, is there a reasonable way to add that as an errdetail() to a
serializationfailure error?

We do still have to be careful here though; you could still have code using our documented upsert methodology inside a
serializedtransaction. For example, via a 3rd party extension that can't assume you're using serialized transactions.
Wouldit still be OK to treat that as a serialization failure and bubble that all the way back to the application?

As part of this, we should probably modify our upsert example so it takes transaction_isolation into account...

> As for the fact that RI violations also don't return a
> serialization failure when caused by a race with concurrent
> transactions, I view that as another weakness in PostgreSQL.  I
> don't think there is a problem curing one without curing the other
> at the same time.  I have known of people writing their own
> triggers to enforce RI rather than defining FKs precisely so that
> they can get a serialization failure return code and do automatic
> retry if it is caused by a race condition.  That's less practical
> to compensate for when it comes to unique indexes or constraints.

Wow, that's horrible. :(
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: BUG #12330: ACID is broken for unique constraints

From

Kevin Grittner

Date:

30 December 2014, 03:01:03

Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
> On 12/29/14, 10:53 AM, Kevin Grittner wrote:
>> Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>>> Serialization errors only exist as a concession to concurrency
>>> and performance.  Again, they should be returned as sparsely as
>>> possible because they provide absolutely (as Tom pointed
>>> out) zero detail to the application.
>>
>> That is false.  They provide an *extremely* valuable piece of
>> information which is not currently available when you get a
>> duplicate key error -- whether the error occurred because of a
>> race condition and will not fail for the same cause if retried.
>
> As for missing details like the duplicated key value, is there a
> reasonable way to add that as an errdetail() to a serialization
> failure error?

Sure.  What we've been discussing is the case where a unique index
is checked for a duplicate with visibility which would cause it to
fail; whether, at transaction isolation level of serializable, to
check to see if the committed transaction which created the
existing entry overlaps the entry checking for duplicates, and use
a different SQLSTATE.  Anything available for display in the detail
currently would still be available.  We give different detail
messages for different types of serialization failures; it would
seem a little silly not to mention the table and the duplicate
values for this type of serialization failure.  My guess is that in
production this would be completely ignored by the software, but
logging it might be helpful in tuning.

> We do still have to be careful here though; you could still have
> code using our documented upsert methodology inside a serialized
> transaction. For example, via a 3rd party extension that can't
> assume you're using serialized transactions. Would it still be OK
> to treat that as a serialization failure and bubble that all the
> way back to the application?

If the current upsert function example were used, unmodified, by
serializable transactions with the suggested changes, it would not
loop back for another try within the function, but generate a
serialization failure for the transaction as a whole.  If there was
a framework watching for that and retrying transactions when it is
seen, the transaction would start over and see the new state and
behave correctly.  Without the change I'm proposing, I bet it would
loop endlessly, until interrupted.  [tries it]  Yep, there it is.
All the hand-wringing about how we might break existing
applications is entirely backwards.  Under either repeatable read 
or serializable that technique goes to 100% of one CPU until you 
cancel it -- completely broken.  If it generated a serialization 
failure the code in our docs would throw the serialization failure 
out to the client, which (in most environments using serializable 
transactions, or even repeatable read) would retry the transaction 
from the beginning, with much less pain.  It makes me wonder 
whether it would be better to generate the serialization failure 
for both repeatable read and serializable, since those are the 
cases where the sample code in our docs generates the endless loop.

> As part of this, we should probably modify our upsert example so
> it takes transaction_isolation into account...

Probably.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #12330: ACID is broken for unique constraints

From

Merlin Moncure

Date:

30 December 2014, 16:15:22

On Mon, Dec 29, 2014 at 3:53 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> I tend to build out applications on top of functions and the
>> inability to set isolation mode inside a function confounds me
>> from using anything but 'read committed'.
>
> Hey, no problem -- just set default_transaction_isolation =
> 'serializable' in your postgresql.conf file and never override
> isolation level and you'll never have to use an explicit table lock
> or SELECT FOR UPDATE again. While you're at it, set
> default_transaction_read_only = on and only override it for
> transactions that (might) need to update and you'll have dodged the
> worst of the performance problems from serializable transactions.

ok...wow.  That's pretty convincing, honestly.  Your objective is
sensible and clear.

So basically there are/were two concrete concerns:  compatibility
break and loss of reported detail.  You've convinced me that
principled serialization code shouldn't be impacted negatively by
changing the returned sqlstate (although, perhaps, we're being a bit
too cavalier in terms of the amount of unprincipled code out there --
particularly looping pl/pgsql exception handlers).

Ideally, you'd sneak more detail about the specifics of the error into
errdetail or someplace as Jim is suggesting.  That'd address the other
point.  So, given those things, I'm mollified, FWIW.

As an aside, the main reason I don't run with serializable isn't
performance paranoia or OCD management of locking semantics in the
90%+ of code that doesn't need that treatment; it's because I build
code (well, chaining DML and such) in the server, always, and have no
way of managing isolation level inside of functions because by the
time the function body is invoked the snapshot is already generated.
I can make read committed transactions work with appropriate locking
but there is no way to downgrade serializable transactions in flight.
IOW, lack of proper stored procedures is what's holding me back.

merlin

Re: BUG #12330: ACID is broken for unique constraints

From

Greg Stark

Date:

30 December 2014, 17:25:06

On Mon, Dec 29, 2014 at 4:17 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Serialization errors only exist as a concession
> to concurrency and performance.  Again, they should be returned as
> sparsely as possible

I think this is fuzzy thinking. Serialization *errors* themselves are
a concession to concurrency and performance but the actual signaling
of the errors is just a consequence that the serialization failure
occurred. If you can find a way to avoid the serialization failure
then yay but you can't just not report it and call that a win.

The point of returning a serialization failure is to report when a
transaction sees something that can't be possible in a serialized
execution. It's not that "retrying might make the error go away" it's
that the error shouldn't have been possible in a serialized execution
so the code shouldn't have to be prepared for it.

So if you get a unique violation or an RI violation and the
transaction didn't previously verify that the constraint wouldn't be
violated then it's perfectly ok to return a constraint violation
error. It doesn't matter whether retrying might avoid the error since
there was some serialized order in which the constraint was violated.

The case at issue is things like upsert where the code already
verified that no violation should occur. Upsert is a typical case but
it would be equally valid if it's an RI constraint and the transaction
verified that the RI constraint is satisified before inserting or
updating. If a concurrent transaction breaks the constraint for this
insert and the insert gets a constraint violation then it ought to get
a serialization failure since the constraint failure can't occur in a
serialized execution order.

But shouldn't the Read-Write dependency graph already be picking this
up? IF you've previously read a row and someone updates it and then
you try to lock it for updates it should be considering this a
serialization failure.

-- 
greg