Thread: more select-for-update questions

more select-for-update questions

From

"Ed L."

Date:

06 August 2007, 16:09:58

I ran a simple select-for-update test on 8.1.2 and was curious as
to why the semantics are what they are.  Specifically, when you
have multiple select-for-update-limit queries on the same rows,
why are rows selected by the blocked query before knowing if
some of those rows will be removed/eliminated by a preceding
select-for-update-limit?

Here's an example.  I created this table:

create table foo(
    id serial,
    done boolean not null default false,
    msg varchar);

Then I inserted some data:

select * from foo
 id | done |             msg
----+------+------------------------------
  1 | f    | Mon Aug  6 12:09:11 MDT 2007
  2 | f    | Mon Aug  6 12:09:12 MDT 2007
  3 | f    | Mon Aug  6 12:09:13 MDT 2007
  4 | f    | Mon Aug  6 12:09:14 MDT 2007
  5 | f    | Mon Aug  6 12:09:15 MDT 2007
(5 rows)

Then in transaction A,

begin;

select * from foo where not done for update limit 3;
 id | done |             msg
----+------+------------------------------
  1 | f    | Mon Aug  6 12:09:11 MDT 2007
  2 | f    | Mon Aug  6 12:09:12 MDT 2007
  3 | f    | Mon Aug  6 12:09:13 MDT 2007
(3 rows)

update foo set done = 't' where id < 4;
UPDATE 3

select * from foo;
 id | done |             msg
----+------+------------------------------
  4 | f    | Mon Aug  6 12:09:14 MDT 2007
  5 | f    | Mon Aug  6 12:09:15 MDT 2007
  1 | t    | Mon Aug  6 12:09:11 MDT 2007
  2 | t    | Mon Aug  6 12:09:12 MDT 2007
  3 | t    | Mon Aug  6 12:09:13 MDT 2007
(5 rows)

Then in transaction B, before committing transaction A,

begin;

select * from foo where not done for update limit 3;
(this blocks transaction B awaiting transaction A commit)

Then, just after commit in transaction A, I see the
previously-blocked query in transaction B returns:

select * from foo where not done for update limit 3;
 id | done | msg
----+------+-----
(0 rows)

It returns zero rows when I expected it to return two (id 4 and
5).  If I immediately run the same query again in transaction B,
I see what I expected to see in the preceding query:

 select * from foo where not done for update limit 3;
 id | done |             msg
----+------+------------------------------
  4 | f    | Mon Aug  6 12:09:14 MDT 2007
  5 | f    | Mon Aug  6 12:09:15 MDT 2007
(2 rows)

So, B is selecting rows for update and applying the limit prior
to knowing which rows will be excluded by A's updates.  I know
that is well-documented behavior.  It just seems pretty
unintuitive.  I'm just wondering if there is some good reason
for it.


TIA.
Ed

Re: more select-for-update questions

From

Alvaro Herrera

Date:

14 August 2007, 21:52:11

Ed L. wrote:
> I ran a simple select-for-update test on 8.1.2 and was curious as
> to why the semantics are what they are.  Specifically, when you
> have multiple select-for-update-limit queries on the same rows,
> why are rows selected by the blocked query before knowing if
> some of those rows will be removed/eliminated by a preceding
> select-for-update-limit?

This is how it is just because of "historical reasons", i.e. it's a
known misfeature that no one has bothered to fix.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: more select-for-update questions

From

Tom Lane

Date:

14 August 2007, 22:58:09

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Ed L. wrote:
>> I ran a simple select-for-update test on 8.1.2 and was curious as
>> to why the semantics are what they are.  Specifically, when you
>> have multiple select-for-update-limit queries on the same rows,
>> why are rows selected by the blocked query before knowing if
>> some of those rows will be removed/eliminated by a preceding
>> select-for-update-limit?

> This is how it is just because of "historical reasons", i.e. it's a
> known misfeature that no one has bothered to fix.

Not so much not bothered, as that the backwards-compatibility issues
seem a bit scary.  If we change this behavior we could easily break
more apps than we help.

The implementation reason why it's like that is that FOR UPDATE
filtering is handled in the top-level executor code (execMain.c)
while LIMIT is a plan node type.  To change it we'd need to make the
FOR UPDATE filter into a plan node type that we could put underneath
LIMIT instead of atop it.  I occasionally think about doing that as a
means for supporting FOR UPDATE in sub-SELECTs, but the real issue
with that whole idea is that we don't promise a darn thing about how
many times a join input relation will be read or how far it will be
read or in what order.  So the semantic effect of FOR UPDATE in a
sub-SELECT, in terms of exactly which rows will get locked, seems
impossible to state precisely.  Or to put it more plainly: right
now, we lock only rows that we are about to return to the client.
So "which rows get locked" is exactly as well-defined as the query
as a whole is.  As soon as we push the locking further down into
the plan, there's a bunch of unspecified implementation behaviors
that will affect which rows get locked, and it's very likely that
some will get locked that have nothing to do with any row that's
returned to the client.

            regards, tom lane