Thread: data modifying WITH seems to drop rows in cascading updates -- bug?
Hackers, Trying to figure out if this is undefined behavior of a bug. It's confusing, and I'm aware of certain oddities in the fringes of the data modifying with queries where the query dependencies are not really clear. Why does the query only return one row? postgres=# create table foo(id int); CREATE TABLE postgres=# insert into foo values(1); INSERT 0 1 postgres=# with a as (update foo set id = id + 1 returning *), b as(update foo set id = id + 1 returning * ) select * from a union all select id from b; id ──── 2 (1 row) postgres=# select version(); version ───────────────────────────────────────────────────────────────────────────────────────────────────────── PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit (1 row) merlin
Merlin Moncure <mmoncure@gmail.com> writes: > Trying to figure out if this is undefined behavior of a bug. It's > confusing, and I'm aware of certain oddities in the fringes of the > data modifying with queries where the query dependencies are not > really clear. Why does the query only return one row? > postgres=# create table foo(id int); > CREATE TABLE > postgres=# insert into foo values(1); > INSERT 0 1 > postgres=# with a as (update foo set id = id + 1 returning *), b > as(update foo set id = id + 1 returning * ) select * from a union all > select id from b; > id > ──── > 2 > (1 row) FWIW, I think it's intentional. The two UPDATEs execute against the same snapshot, so only one of them can update the row --- the other one is going to see it as already-updated-by-self. It's undefined only to the extent that it's not completely clear which one gets there first. In this formulation of the outer query, I think it's pretty safe to assume that "a" will get there first, but if you'd joined "a" and "b" in some other fashion, conceivably "b" would. Note that the fine manual (sec. 7.8.2) says Trying to update the same row twice in a single statement is not supported. Only one of the modifications takes place, but it is not easy (and sometimes not possible) to reliably predict which one. This also applies to deleting a row that was already updated in the same statement: only the update is performed. Therefore you should generally avoid trying to modify a single row twice in a single statement. In particular avoid writing WITH sub-statements that could affect the same rows changed by the main statement or a sibling sub-statement. The effects of such a statement will not be predictable. regards, tom lane
On Friday, August 23, 2019, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
> Trying to figure out if this is undefined behavior of a bug. It's
> confusing, and I'm aware of certain oddities in the fringes of the
> data modifying with queries where the query dependencies are not
> really clear. Why does the query only return one row?
> postgres=# create table foo(id int);
> CREATE TABLE
> postgres=# insert into foo values(1);
> INSERT 0 1
> postgres=# with a as (update foo set id = id + 1 returning *), b
> as(update foo set id = id + 1 returning * ) select * from a union all
> select id from b;
> id
> ────
> 2
> (1 row)
FWIW, I think it's intentional. The two UPDATEs execute against the
same snapshot, so only one of them can update the row --- the other
one is going to see it as already-updated-by-self. It's undefined
only to the extent that it's not completely clear which one gets
there first. In this formulation of the outer query, I think it's
pretty safe to assume that "a" will get there first, but if you'd
joined "a" and "b" in some other fashion, conceivably "b" would.
Note that the fine manual (sec. 7.8.2) says
Trying to update the same row twice in a single statement is not
supported. Only one of the modifications takes place, but it is not
easy (and sometimes not possible) to reliably predict which one. This
also applies to deleting a row that was already updated in the same
statement: only the update is performed. Therefore you should
generally avoid trying to modify a single row twice in a single
statement. In particular avoid writing WITH sub-statements that could
affect the same rows changed by the main statement or a sibling
sub-statement. The effects of such a statement will not be
predictable.
Right. Shame on me for not checking the docs before posting. Simply stated, this is undefined behavior.
merlin
On Sat, Aug 24, 2019 at 6:33 AM Merlin Moncure <mmoncure@gmail.com> wrote:
Right. Shame on me for not checking the docs before posting. Simply stated, this is undefined behavior.
Nah; it's defined, yet unpredictable behavior. Still worth half the points, I guess.
-m
On Saturday, August 24, 2019, Marko Tiikkaja <marko@joh.to> wrote:
On Sat, Aug 24, 2019 at 6:33 AM Merlin Moncure <mmoncure@gmail.com> wrote:Right. Shame on me for not checking the docs before posting. Simply stated, this is undefined behavior.Nah; it's defined, yet unpredictable behavior. Still worth half the points, I guess.
Lack of prediction on outputs is a class of undefined behavior. If a program is left in unknown state (including results returned to caller) this meets the classic definition of undefined. I realize this is purely semantic pedantry but in language design these distinctions are important and fun to discuss.
Undefined behavior is good. It allows for future refinements of the model and frees the caller from expectations against the result; if some fast, clever way of managing these cases is somehow innovated the behavior can be defined later without breaking compatibility.
FYI: this thread exists as it was turned up during the process of teaching yet another convert the cooler stuff with postgres and he's now getting into wCTE. I visibly saw the light bulb go off...he's ours now. I'm a fan of your work
merlin