Thread: data modifying WITH seems to drop rows in cascading updates -- bug?

data modifying WITH seems to drop rows in cascading updates -- bug?

From

Merlin Moncure

Date:

23 August 2019, 15:36:01

Hackers,

Trying to figure out if this is undefined behavior of a bug. It's
confusing, and I'm aware of certain oddities in the fringes of the
data modifying with queries where the query dependencies are not
really clear.  Why does the query only return one row?

postgres=# create table foo(id int);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# with a as (update foo set id = id + 1 returning *), b
as(update foo set id = id + 1 returning * ) select * from a union all
select id from b;
 id
────
  2
(1 row)

postgres=# select version();
                                                 version
─────────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

merlin

Re: data modifying WITH seems to drop rows in cascading updates -- bug?

From

Tom Lane

Date:

23 August 2019, 17:54:18

Merlin Moncure <mmoncure@gmail.com> writes:
> Trying to figure out if this is undefined behavior of a bug. It's
> confusing, and I'm aware of certain oddities in the fringes of the
> data modifying with queries where the query dependencies are not
> really clear.  Why does the query only return one row?

> postgres=# create table foo(id int);
> CREATE TABLE
> postgres=# insert into foo values(1);
> INSERT 0 1
> postgres=# with a as (update foo set id = id + 1 returning *), b
> as(update foo set id = id + 1 returning * ) select * from a union all
> select id from b;
>  id
> ────
>   2
> (1 row)

FWIW, I think it's intentional.  The two UPDATEs execute against the
same snapshot, so only one of them can update the row --- the other
one is going to see it as already-updated-by-self.  It's undefined
only to the extent that it's not completely clear which one gets
there first.  In this formulation of the outer query, I think it's
pretty safe to assume that "a" will get there first, but if you'd
joined "a" and "b" in some other fashion, conceivably "b" would.

Note that the fine manual (sec. 7.8.2) says

    Trying to update the same row twice in a single statement is not
    supported. Only one of the modifications takes place, but it is not
    easy (and sometimes not possible) to reliably predict which one. This
    also applies to deleting a row that was already updated in the same
    statement: only the update is performed. Therefore you should
    generally avoid trying to modify a single row twice in a single
    statement. In particular avoid writing WITH sub-statements that could
    affect the same rows changed by the main statement or a sibling
    sub-statement. The effects of such a statement will not be
    predictable.

            regards, tom lane

Re: data modifying WITH seems to drop rows in cascading updates -- bug?

From

Merlin Moncure

Date:

24 August 2019, 03:02:03

On Friday, August 23, 2019, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:
> Trying to figure out if this is undefined behavior of a bug. It's
> confusing, and I'm aware of certain oddities in the fringes of the
> data modifying with queries where the query dependencies are not
> really clear. Why does the query only return one row?

> postgres=# create table foo(id int);
> CREATE TABLE
> postgres=# insert into foo values(1);
> INSERT 0 1
> postgres=# with a as (update foo set id = id + 1 returning *), b
> as(update foo set id = id + 1 returning * ) select * from a union all
> select id from b;
> id
> ────
> 2
> (1 row)

FWIW, I think it's intentional. The two UPDATEs execute against the
same snapshot, so only one of them can update the row --- the other
one is going to see it as already-updated-by-self. It's undefined
only to the extent that it's not completely clear which one gets
there first. In this formulation of the outer query, I think it's
pretty safe to assume that "a" will get there first, but if you'd
joined "a" and "b" in some other fashion, conceivably "b" would.

Note that the fine manual (sec. 7.8.2) says

Trying to update the same row twice in a single statement is not
supported. Only one of the modifications takes place, but it is not
easy (and sometimes not possible) to reliably predict which one. This
also applies to deleting a row that was already updated in the same
statement: only the update is performed. Therefore you should
generally avoid trying to modify a single row twice in a single
statement. In particular avoid writing WITH sub-statements that could
affect the same rows changed by the main statement or a sibling
sub-statement. The effects of such a statement will not be
predictable.

Right. Shame on me for not checking the docs before posting. Simply stated, this is undefined behavior.

merlin

Re: data modifying WITH seems to drop rows in cascading updates -- bug?

From

Marko Tiikkaja

Date:

24 August 2019, 07:08:42

On Sat, Aug 24, 2019 at 6:33 AM Merlin Moncure <mmoncure@gmail.com> wrote:

Right. Shame on me for not checking the docs before posting. Simply stated, this is undefined behavior.

Nah; it's defined, yet unpredictable behavior. Still worth half the points, I guess.

-m

data modifying WITH seems to drop rows in cascading updates -- bug?

From

Merlin Moncure

Date:

24 August 2019, 08:26:36

On Saturday, August 24, 2019, Marko Tiikkaja <marko@joh.to> wrote:

On Sat, Aug 24, 2019 at 6:33 AM Merlin Moncure <mmoncure@gmail.com> wrote:
Right. Shame on me for not checking the docs before posting. Simply stated, this is undefined behavior.

Nah; it's defined, yet unpredictable behavior. Still worth half the points, I guess.

Lack of prediction on outputs is a class of undefined behavior. If a program is left in unknown state (including results returned to caller) this meets the classic definition of undefined. I realize this is purely semantic pedantry but in language design these distinctions are important and fun to discuss.

Undefined behavior is good. It allows for future refinements of the model and frees the caller from expectations against the result; if some fast, clever way of managing these cases is somehow innovated the behavior can be defined later without breaking compatibility.

FYI: this thread exists as it was turned up during the process of teaching yet another convert the cooler stuff with postgres and he's now getting into wCTE. I visibly saw the light bulb go off...he's ours now. I'm a fan of your work

merlin