Thread: Order of update
I've just read Laurenz' blog post about the differences between Oracle and PostgreSQL[1]. One of the differences is that something like UPDATE tab SET id = id + 1; tends to fail on PostgreSQL because the the primary key constraint is checked for every row, so it will stumble over the temporary conflicts. The solution is to define the constraint as deferrable. But that got me to thinking about different ways ... There won't be a conflict if the ids are updated in descending order. Is there a way to force PostgreSQL to update the rows in a specific order? I came up with with a as (select id from t where id > 50 order by id desc) update t set id = a.id+1 from a where t.id = a.id; which works in my simple test case, but it doesn't look like it's guaranteed to work. The implicit join in «update t ... from a» could produce rows in any order, especially for large tables. So, is there a better way? hjp [1] https://www.cybertec-postgresql.com/en/comparison-of-the-transaction-systems-of-oracle-and-postgresql/ -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
Very interesting. But is the sort overhead worth it? Why not make the constraint deferrable before the update and switchback afterwards?
On 2025-04-20 11:34:56 +0200, Thiemo Kellner wrote: > Very interesting. But is the sort overhead worth it? Why not make the > constraint deferrable before the update and switch back afterwards? Mostly idle curiosity whether that's possible at all. But there might be other reasons why you want to do updates in a predictable order. For example to prevent deadlocks. hjp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
Hm, deadlocks preventing order by. Never had that problem. Then again, I mostly have Oracle experience and no need for complicatedupdates. If I had, I'd rather think of chunking updates and orchestrate those before ordering within updates.
On Sun, Apr 20, 2025 at 5:35 AM Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote:
Very interesting. But is the sort overhead worth it? Why not make the constraint deferrable before the update and switch back afterwards?
The role which runs the UPDATE might not have the priv to ALTER TABLE ... ALTER CONSTRAINT.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
Might that be a feature of or a flaw in the application design? I opt for the latter. Any application that needs updates, be it only in emergency cases, should take that into account.
On 4/20/25 02:10, Peter J. Holzer wrote: > I've just read Laurenz' blog post about the differences between Oracle > and PostgreSQL[1]. > > One of the differences is that something like > > UPDATE tab SET id = id + 1; > > tends to fail on PostgreSQL because the the primary key constraint is > checked for every row, so it will stumble over the temporary conflicts. > > The solution is to define the constraint as deferrable. > > But that got me to thinking about different ways ... > > There won't be a conflict if the ids are updated in descending order. > Is there a way to force PostgreSQL to update the rows in a specific > order? > > I came up with > > with a as (select id from t where id > 50 order by id desc) > update t set id = a.id+1 from a where t.id = a.id; > > which works in my simple test case, but it doesn't look like it's > guaranteed to work. The implicit join in «update t ... from a» could > produce rows in any order, especially for large tables. My read of this is that for the duration of the query a temporary table a is create that is ordered on `id desc` and that '... from a where t.id = a.id' will apply that order to the selection of t.id. As example: create table id_update(id integer primary key); insert into id_update select a from generate_series(1, 100000) as t(a); INSERT 0 100000 -- id(s) are temporarily in order. update id_update set id = id where id between 50000 and 60000; UPDATE 10001 -- The above move the 10001 values to 'end' of id_update with a as (select id from id_update where id > 100 order by id desc) update id_update as t set id = a.id + 1 from a where t.id = a.id; UPDATE 99900 -- The UPDATE works even though the t.id(s) in id_update are not ordered -- by id > > So, is there a better way? > > hjp > > > [1] https://www.cybertec-postgresql.com/en/comparison-of-the-transaction-systems-of-oracle-and-postgresql/ > -- Adrian Klaver adrian.klaver@aklaver.com
On 2025-04-20 08:28:22 -0700, Adrian Klaver wrote: > On 4/20/25 02:10, Peter J. Holzer wrote: > > I've just read Laurenz' blog post about the differences between Oracle > > and PostgreSQL[1]. > > > > One of the differences is that something like > > > > UPDATE tab SET id = id + 1; > > > > tends to fail on PostgreSQL because the the primary key constraint is > > checked for every row, so it will stumble over the temporary conflicts. > > > > The solution is to define the constraint as deferrable. > > > > But that got me to thinking about different ways ... > > > > There won't be a conflict if the ids are updated in descending order. > > Is there a way to force PostgreSQL to update the rows in a specific > > order? > > > > I came up with > > > > with a as (select id from t where id > 50 order by id desc) > > update t set id = a.id+1 from a where t.id = a.id; > > > > which works in my simple test case, but it doesn't look like it's > > guaranteed to work. The implicit join in «update t ... from a» could > > produce rows in any order, especially for large tables. > > My read of this Your read of the query, the PostgreSQL source or the SQL standard? > is that for the duration of the query a temporary table a is create > that is ordered on `id desc` and that '... from a where t.id = a.id' > will apply that order to the selection of t.id. Yes, that's the intention. In as I wrote it did work in my simple tests. But is it guaranteed to work? Is there anything in the standard that says that the order has to be preserved? Or failing that, is that the way it's currently implemented and there are reasons to assume that it will never be changed? > As example: > > create table id_update(id integer primary key); > > insert into id_update select a from generate_series(1, 100000) as t(a); > INSERT 0 100000 > > -- id(s) are temporarily in order. > > update id_update set id = id where id between 50000 and 60000; > UPDATE 10001 > > -- The above move the 10001 values to 'end' of id_update > > with a as (select id from id_update where id > 100 order by id desc) update > id_update as t set id = a.id + 1 from a where t.id = a.id; > UPDATE 99900 I note that this produces a hash join: #v+ ╔═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗ ║ QUERY PLAN ║ ╟───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢ ║ Update on id_update t (cost=3179.42..8662.63 rows=0 width=0) ║ ║ -> Hash Join (cost=3179.42..8662.63 rows=99899 width=38) ║ ║ Hash Cond: (a.id = t.id) ║ ║ -> Subquery Scan on a (cost=0.42..4971.64 rows=99899 width=32) ║ ║ -> Index Only Scan Backward using id_update_pkey on id_update (cost=0.42..3972.65 rows=99899 width=4) ║ ║ Index Cond: (id > 100) ║ ║ -> Hash (cost=1929.00..1929.00 rows=100000 width=10) ║ ║ -> Seq Scan on id_update t (cost=0.00..1929.00 rows=100000 width=10) ║ ╚═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝ #v- If the hash was the other way around it wouldn't work. So let's try if we can get the optimizer to flip the plan by changing the number of updated rows. [a few minutes later] #v+ hjp=> explain with a as (select id from id_update where id > 90000 order by id desc) update id_update as t set id = a.id + 1 from a where a.id = t.id; ╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗ ║ QUERY PLAN ║ ╟────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢ ║ Update on id_update t (cost=732.53..2675.61 rows=0 width=0) ║ ║ -> Hash Join (cost=732.53..2675.61 rows=10006 width=38) ║ ║ Hash Cond: (t.id = a.id) ║ ║ -> Seq Scan on id_update t (cost=0.00..1443.00 rows=100000 width=10) ║ ║ -> Hash (cost=607.46..607.46 rows=10006 width=32) ║ ║ -> Subquery Scan on a (cost=0.29..607.46 rows=10006 width=32) ║ ║ -> Index Only Scan Backward using id_update_pkey on id_update (cost=0.29..507.40 rows=10006 width=4)║ ║ Index Cond: (id > 90000) ║ ╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝ #v- Looks like we got it. And indeed: #v+ hjp=> with a as (select id from id_update where id > 90000 order by id desc) update id_update as t set id = a.id + 1 from a where a.id = t.id; ERROR: duplicate key value violates unique constraint "id_update_pkey" DETAIL: Key (id)=(90002) already exists. #v- So, obviously that isn't guaranteed to work. hjp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 4/21/25 01:47, Peter J. Holzer wrote: > > If the hash was the other way around it wouldn't work. > > So let's try if we can get the optimizer to flip the plan by changing > the number of updated rows. > > [a few minutes later] > > #v+ > hjp=> explain > with a as (select id from id_update where id > 90000 order by id desc) > update id_update as t set id = a.id + 1 from a where a.id = t.id; > ╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗ > ║ QUERY PLAN ║ > ╟────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢ > ║ Update on id_update t (cost=732.53..2675.61 rows=0 width=0) ║ > ║ -> Hash Join (cost=732.53..2675.61 rows=10006 width=38) ║ > ║ Hash Cond: (t.id = a.id) ║ > ║ -> Seq Scan on id_update t (cost=0.00..1443.00 rows=100000 width=10) ║ > ║ -> Hash (cost=607.46..607.46 rows=10006 width=32) ║ > ║ -> Subquery Scan on a (cost=0.29..607.46 rows=10006 width=32) ║ > ║ -> Index Only Scan Backward using id_update_pkey on id_update (cost=0.29..507.40 rows=10006 width=4)║ > ║ Index Cond: (id > 90000) ║ > ╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝ > #v- > > Looks like we got it. > > And indeed: > > #v+ > hjp=> with a as (select id from id_update where id > 90000 order by id desc) > update id_update as t set id = a.id + 1 from a where a.id = t.id; > ERROR: duplicate key value violates unique constraint "id_update_pkey" > DETAIL: Key (id)=(90002) already exists. > #v- > > So, obviously that isn't guaranteed to work. I read from here: https://www.postgresql.org/docs/current/sql-update.html "Use of an ORDER BY clause allows the command to prioritize which rows will be updated; it can also prevent deadlock with other update operations if they use the same ordering." I went back to those docs and realized I had missed the FOR UPDATE in the example. explain with a as (select id from id_update where id > 90000 order by id desc for update) update id_update as t set id = a.id + 1 from a where a.id = t.id; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Update on id_update t (cost=3609.71..3856.94 rows=0 width=0) CTE a -> LockRows (cost=0.29..872.71 rows=9840 width=10) -> Index Scan Backward using id_update_pkey on id_update (cost=0.29..774.31 rows=9840 width=10) Index Cond: (id > 90000) -> Hash Join (cost=2737.00..2984.23 rows=9840 width=38) Hash Cond: (a.id = t.id) -> CTE Scan on a (cost=0.00..196.80 rows=9840 width=32) -> Hash (cost=1487.00..1487.00 rows=100000 width=10) -> Seq Scan on id_update t (cost=0.00..1487.00 rows=100000 width=10) (10 rows) and then: with a as (select id from id_update where id > 90000 order by id desc for update) update id_update as t set id = a.id + 1 from a where a.id = t.id; UPDATE 10000 Though at this point I would agree with you on the no guarantee point. > > hjp > -- Adrian Klaver adrian.klaver@aklaver.com
I wonder if that is a corner case. Updating a unique key sounds to me like a design flaw in the first place.
On 4/21/25 09:12, Thiemo Kellner wrote: > I wonder if that is a corner case. Updating a unique key sounds to me like a design flaw in the first place. > Check out this the thread below for discussion on that topic: https://www.postgresql.org/message-id/dkbnfi$7g5$1@sea.gmane.org -- Adrian Klaver adrian.klaver@aklaver.com
Thanks for the pointer. I feel my doubts reflected. For such reasons, I prefer the UUID as surrogate key. No point in tryingto establish an order or even id arithmetics. 21.04.2025 18:44:27 Adrian Klaver <adrian.klaver@aklaver.com>: > On 4/21/25 09:12, Thiemo Kellner wrote: >> I wonder if that is a corner case. Updating a unique key sounds to me like a design flaw in the first place. >> > > Check out this the thread below for discussion on that topic: > > https://www.postgresql.org/message-id/dkbnfi$7g5$1@sea.gmane.org > > -- > Adrian Klaver > adrian.klaver@aklaver.com
On 2025-04-21 18:12:13 +0200, Thiemo Kellner wrote: > I wonder if that is a corner case. Updating a unique key sounds to me like a design flaw in the first place. I agree that changing a surrogate key is almost always a mistake. But there might be situations where a column should be unique but isn't an id. For example, many years ago it was a popular[1] programming pattern to represent trees as nested ranges (i.e. if two children of a parent had the ranges (a, b) and (b+1, c) then the parent had (a-1, c+1). Insert-operations then need to update those columns. You want an index on those columns (since you search for them a lot), and you might want to make it a unique index, since that covers part of the invariant (although not the complete invariant). If you do that you run into the update problem. There are probably other use-cases. Anything where you need a unique order which can change, I guess? Anyway, I don't have a pressing need for this, as I said I was just curious. hjp [1] Mostly in MySQL I think, since it didn't have recursive queries of any kind. -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"