Thread: BUG #4925: "select ... for update" doesn't affect rows from sub-query
The following bug has been logged online: Bug reference: 4925 Logged by: Steve Caligo Email address: steve.caligo@ctie.etat.lu PostgreSQL version: 8.3.7 and 8.4.0 Operating system: Archlinux and Gentoo 8.3.7, Gentoo 8.4.0 Description: "select ... for update" doesn't affect rows from sub-query Details: While trying to guarantee data consistency when doing concurrent processing, I stumbled upon your cautions mentionned in your documentation (especially "limit" ... "for update"): http://www.postgresql.org/docs/8.4/static/sql-select.html I tried working around this limitation and the statement on the same page seemed promising to me: "If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects all tables used in the view or sub-query." But unfortunately the latter statement doesn't seem to be true and the subquery isn't protected by row locks, as the following examples show. 1) INITIAL SITUATION Create a simple table with some data. No constraints, no indexes, just the bare minimum: create table test ( id integer, name varchar(10), c integer ); insert into test values (1, 'test1', 0), (2, 'test2', 0), (3, 'test3', 0), (4, 'test4', 0), (5, 'test5', 0) ; 2) FIRST TRY, USING "UPDATE WHERE ID = ( SELECT )" The goal is to have two clients set their unique ID to a single/different row from the table. First, using "limit" in a slightly different way: 1=> begin transaction; 2=> begin transaction; 1=> update test set c = 1 where id = ( select id from test where c = 0 order by id limit 1 ) ; -- updates row id=1 2=> update test set c = 2 where id = ( select id from test where c = 0 order by id limit 1 ) ; -- forced to wait on lock 1=> commit; -- client #2 continues 1=> select * from test where id = 1; id | name | c ----+-------+--- 1 | test1 | 1 (1 row) 2=> commit; 2=> select * from test where id = 1; id | name | c ----+-------+--- 1 | test1 | 2 (1 row) 1=> select * from test where id = 1; id | name | c ----+-------+--- 1 | test1 | 2 (1 row) Conclusion: didn't work. Probably because the subquery is executed before the update and not affected by row locking. Expected behaviour: one client update one row to c=1 and the other client updates a different row to c=2. 3) SECOND TRY, PROTECTING THE UPDATE BY AN ADDITIONAL "SELECT ... FOR UPDATE" AND AVOIDING THE "LIMIT" 1=> begin transaction; 2=> begin transaction; 1=> select id from test where id = ( select min(id) from test where c = 0 ) for update; id ---- 2 (1 row) 2=> select id from test where id = ( select min(id) from test where c = 0 ) for update; -- forced to wait on lock 1=> update test set c = 1 where id = 2; 1=> commit; -- client #2 continues: 2=> -- client #2 outputs: id ---- 2 (1 row) 2=> select * from test where id = 2; id | name | c ----+-------+--- 2 | test2 | 1 (1 row) 2=> -- now this isn't what we initially asked for, let's just repeat the query once more: 2=> select id from test where id = ( select min(id) from test where c = 0 ) for update; id ---- 3 (1 row) Conclusion: didn't work. The situation one ends up in is one that contradicts your above statement, but also seemingly violates the "I" in ACID. In 3), client #2 is clearly affected by the actions of client #1. While serialized transactions or full table locks would avoid this race condition, it either requires large changes in the application or impacts performance during contention. Adding an additional "c = 0" to the main query of 3) of course suppresses the row from the concurrent update, but it shouldn't have been returned with a value of "c = 1" because the transaction #2 started prior to the update statement of #1.
On Thu, Jul 16, 2009 at 12:34 PM, Steve Caligo<steve.caligo@ctie.etat.lu> w= rote: > > The following bug has been logged online: > > Bug reference: =A0 =A0 =A04925 > Logged by: =A0 =A0 =A0 =A0 =A0Steve Caligo > Email address: =A0 =A0 =A0steve.caligo@ctie.etat.lu > PostgreSQL version: 8.3.7 and 8.4.0 > Operating system: =A0 Archlinux and Gentoo 8.3.7, Gentoo 8.4.0 > Description: =A0 =A0 =A0 =A0"select ... for update" doesn't affect rows f= rom > sub-query > Details: > > While trying to guarantee data consistency when doing concurrent processi= ng, > I stumbled upon your cautions mentionned in your documentation (especially > "limit" ... "for update"): > http://www.postgresql.org/docs/8.4/static/sql-select.html > > I tried working around this limitation and the statement on the same page > seemed promising to me: > "If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects > all tables used in the view or sub-query." > > But unfortunately the latter statement doesn't seem to be true and the > subquery isn't protected by row locks, as the following examples show. > > 1) INITIAL SITUATION > > Create a simple table with some data. No constraints, no indexes, just the > bare minimum: > > create table test ( > =A0 =A0 =A0 =A0id integer, > =A0 =A0 =A0 =A0name varchar(10), > =A0 =A0 =A0 =A0c integer > ); > > insert into test values > =A0 =A0 =A0 =A0(1, 'test1', 0), > =A0 =A0 =A0 =A0(2, 'test2', 0), > =A0 =A0 =A0 =A0(3, 'test3', 0), > =A0 =A0 =A0 =A0(4, 'test4', 0), > =A0 =A0 =A0 =A0(5, 'test5', 0) > ; > > > 2) FIRST TRY, USING "UPDATE WHERE ID =3D ( SELECT )" > The goal is to have two clients set their unique ID to a single/different > row from the table. First, using "limit" in a slightly different way: > > 1=3D> begin transaction; > 2=3D> begin transaction; > > 1=3D> update test set c =3D 1 where id =3D ( > =A0 =A0 =A0 =A0select id from test where c =3D 0 order by id limit 1 > ) > ; -- updates row id=3D1 > 2=3D> update test set c =3D 2 where id =3D ( > =A0 =A0 =A0 =A0select id from test where c =3D 0 order by id limit 1 > ) > ; -- forced to wait on lock > > 1=3D> commit; -- client #2 continues > 1=3D> select * from test where id =3D 1; > =A0id | name =A0| c > ----+-------+--- > =A01 | test1 | 1 > (1 row) > > 2=3D> commit; > 2=3D> select * from test where id =3D 1; > =A0id | name =A0| c > ----+-------+--- > =A01 | test1 | 2 > (1 row) > > 1=3D> select * from test where id =3D 1; > =A0id | name =A0| c > ----+-------+--- > =A01 | test1 | 2 > (1 row) > > > Conclusion: didn't work. > Probably because the subquery is executed before the update and not affec= ted > by row locking. > > Expected behaviour: one client update one row to c=3D1 and the other clie= nt > updates a different row to c=3D2. This is pretty clearly NOT the situation described in the documentation. There is no FOR UPDATE or FOR SHARE anywhere in this query. You could argue that we should treat an UPDATE statement as applying an implicit FOR UPDATE to any subqueries found therein, but that has nothing to do with whether the current behavior matches the documentation; it's a discussion of whether the current behavior is good or bad. > 3) SECOND TRY, PROTECTING THE UPDATE BY AN ADDITIONAL "SELECT ... FOR > UPDATE" AND AVOIDING THE "LIMIT" > > 1=3D> begin transaction; > 2=3D> begin transaction; > 1=3D> select id from test where id =3D ( > =A0 =A0 =A0 =A0select min(id) from test where c =3D 0 > ) for update; > =A0id > ---- > =A02 > (1 row) > > 2=3D> select id from test where id =3D ( > =A0 =A0 =A0 =A0select min(id) from test where c =3D 0 > ) for update; -- forced to wait on lock > > 1=3D> update test set c =3D 1 where id =3D 2; > 1=3D> commit; -- client #2 continues: > 2=3D> -- client #2 outputs: > =A0id > ---- > =A02 > (1 row) > > 2=3D> select * from test where id =3D 2; > =A0id | name =A0| c > ----+-------+--- > =A02 | test2 | 1 > (1 row) > 2=3D> -- now this isn't what we initially asked for, let's just repeat the > query once more: > > 2=3D> select id from test where id =3D ( select min(id) from test where c= =3D 0 ) > for update; > =A0id > ---- > =A03 > (1 row) > > > Conclusion: didn't work. > The situation one ends up in is one that contradicts your above statement, > but also seemingly violates the "I" in ACID. In 3), client #2 is clearly > affected by the actions of client #1. While serialized transactions or fu= ll > table locks would avoid this race condition, it either requires large > changes in the application or impacts performance during contention. This is pretty weird behavior, and I am among those who think it sucks. But it is documented. http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALI= ZABILITY An interesting fact is that if you stick another "for update" into the subquery here, the command will fail utterly, with the following error message: ERROR: SELECT FOR UPDATE/SHARE is not allowed with aggregate functions > Adding an additional "c =3D 0" to the main query of 3) of course suppress= es > the row from the concurrent update, but it shouldn't have been returned w= ith > a value of "c =3D 1" because the transaction #2 started prior to the upda= te > statement of #1. ...Robert
On Thu, Jul 30, 2009 at 1:40 AM, Steve Caligo<Steve.Caligo@ctie.etat.lu> wr= ote: > Allowing FOR UPDATE in sub-queries has been rejected in this same thread, > you've also mentioned it your previous reply. > http://archives.postgresql.org/pgsql-bugs/2004-10/msg00150.php Um, I didn't write that email. That was Tom Lane. >> This is pretty weird behavior, and I am among those who think it >> sucks. =A0But it is documented. >> >> http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERI= ALIZABILITY > > This behavior may be consistent with the transaction isolation levels > PostgreSQL provides (read committed/serializable, while one would need > true repeatable read in this case), it is a huge drawback when working > with the database, as one has to think about potential collateral damage > with every single SELECT...FOR UPDATE / UPDATE query one writes, > especially the complex ones. > > >> An interesting fact is that if you stick another "for update" into the >> subquery here, the command will fail utterly, with the following error >> message: >> >> ERROR: =A0SELECT FOR UPDATE/SHARE is not allowed with aggregate functions > > See my second link above, it works as designed. > > The document states that the FOR UPDATE is applied to the view or > sub-query, which I assume means: "applied to a SELECT containing a > sub-query or view". As such, I'm expecting a single statement to be an > atomic operation, i.e. always works on the same data snapshot, independent > of the transaction isolation level. > > This atomicity has to be enforced through a full table lock (which is > often the easiest to implement, but also the most expensive > efficiency-wise) or serialization. > > Side note: a cross-database test showed that Oracle, DB2 nor MySQL (with > InnoDB storage engine) run the query as true repeatable read, whether one > makes use of the "skip locked rows" (when available) or not. One always > ends up with two different rows being updated. Sure. I mean, I understand your frustration here, but this started out as a documentation complaint. I think the current behavior is documented reasonably OK; the problem is that the behavior is pretty weird. I'd be all in favor of fixing it, but I'm not sure what would be involved in that or what the trade-offs would be. I suspect if it were easy it would have been done long ago; you're not the first person to complain about it. ...Robert
> On Thu, Jul 16, 2009 at 12:34 PM, Steve Caligo<steve.caligo@ctie.etat.lu> > wrote: >> 2) FIRST TRY, USING "UPDATE WHERE ID = ( SELECT )" >> The goal is to have two clients set their unique ID to a >> single/different >> row from the table. First, using "limit" in a slightly different way: > This is pretty clearly NOT the situation described in the > documentation. There is no FOR UPDATE or FOR SHARE anywhere in this > query. You could argue that we should treat an UPDATE statement as > applying an implicit FOR UPDATE to any subqueries found therein, but > that has nothing to do with whether the current behavior matches the > documentation; it's a discussion of whether the current behavior is > good or bad. Should there really be a difference between doing an UPDATE or just requesting rows to be locked for a later modification (update or deletion, whatever) through SELECT ... FOR UPDATE? I'm aware that there's no FOR UPDATE in that query, as its direct use is discouraged in the documentation (SELECT ... LIMIT ... FOR UPDATE) and reported as "odd behavior" previously as well: http://archives.postgresql.org/pgsql-bugs/2004-10/msg00138.php Yes, I'm considering this behavior bad, as I'd expect the UPDATE statement to lock either the whole table (if there's no other way) or only the rows returned by the sub-SELECT, but as far as I can see, such row-level locking can't be achieved in PostgreSQL but through the use of serialization. Allowing FOR UPDATE in sub-queries has been rejected in this same thread, you've also mentioned it your previous reply. http://archives.postgresql.org/pgsql-bugs/2004-10/msg00150.php > This is pretty weird behavior, and I am among those who think it > sucks. But it is documented. > > http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY This behavior may be consistent with the transaction isolation levels PostgreSQL provides (read committed/serializable, while one would need true repeatable read in this case), it is a huge drawback when working with the database, as one has to think about potential collateral damage with every single SELECT...FOR UPDATE / UPDATE query one writes, especially the complex ones. > An interesting fact is that if you stick another "for update" into the > subquery here, the command will fail utterly, with the following error > message: > > ERROR: SELECT FOR UPDATE/SHARE is not allowed with aggregate functions See my second link above, it works as designed. The document states that the FOR UPDATE is applied to the view or sub-query, which I assume means: "applied to a SELECT containing a sub-query or view". As such, I'm expecting a single statement to be an atomic operation, i.e. always works on the same data snapshot, independent of the transaction isolation level. This atomicity has to be enforced through a full table lock (which is often the easiest to implement, but also the most expensive efficiency-wise) or serialization. Side note: a cross-database test showed that Oracle, DB2 nor MySQL (with InnoDB storage engine) run the query as true repeatable read, whether one makes use of the "skip locked rows" (when available) or not. One always ends up with two different rows being updated. Best regards, Steve Caligo