Thread: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
- Create tab1 table and stored procedure for testing with insert-tab1.sql
- Run several psql processes in parallel, calling the stored procedure: sh ./start2.sh (Linux/bash)
Attachment
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Sebastien Flaesch <sebastien.flaesch@4js.com> writes: > I try to update the underlying sequence of a SERIAL column, by using a RETURNING clause in my INSERT statement, which ischecking that the column value is greater than the last_value of my sequence, and reset the sequence with setval() if needed. It's not too surprising that that doesn't work, if you're coding it based on this assumption: > The whole INSERT statement (including the code in the RETURNING clause), should execute in a ATOMIC manner. Sequence-related actions are always carried out immediately, they do not participate in any atomicity guarantees about the calling transaction. Without this, any sequence update would have to block all concurrent uses of that sequence until they see whether the first update commits. If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table). The point of sequence objects is exactly to provide a feature with better concurrent performance, at the cost of no rollback guarantees. So, there's no bug here, and calling it one isn't going to change anybody's mind about that. regards, tom lane
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table).
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
On Tue, 19 Jul 2022 at 18:50, Sebastien Flaesch <sebastien.flaesch@4js.com> wrote: >> If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table). > Can you elaborate please? > Do you mean the code should use an UPDATE on a one-row table to acquire a lock? That's the usual SQL way. You can, using the appropiate isolation level, select the max() from the table using the IDs ( but this will probably lock it whole ), you can use a one-row table with a current int column as a sequence, and set current=current+1, but this will lock everyone inserting ( and need an extra table ). You can use a name-current unique sequences table and update that ( it saves tables but it can worsen locking, as many engines locks whole pages ). The consecutive gapless sequence problem surfaces a lot, but there is no good way to solve it without locking, as once you get the value everybody else needing it has to wait until you decide whether to commit on rollback. Sequences are for when you need autogenerated keys. They use consecutive integer blocks nearly everywhere because they are easy to code, play well with btree indexes, correlate well with insertion times, but they give up the "no hole" approach to improve performance. They even can return values unordered in different sessions ( sequences acquire, and burn, blocks in a locking way, when you need high concurrency you make them return big blocks, but them values from different sessions can be delivered out of order ). If you need correlative numbers, byte the bullet and code it, trying to do it with sequences will probably end with an approach which will fail in some corner cases and be slower than just coding it with a single row table ( just remember to check it, so you do not have the fun which ensues in oracle when someone puts two rows in dual in the old days ). All the queries you have posted are probably more complex than a couple procedures to update and read a table, even using eval to simulate named sequences if you want ( just remember to always pre-increment before reading, there is a reason everybody does it that way ). As a remark, in Spain bill numbers need to be gapless increasing. I have done it with a sequence ( with cache 1, so I doubt it is much more performant than a single row table, as it has a backing element which needs to be saved ), and just used another process which swept the values and filled the gap a posteriori ( there is a way to fill a gap, has it legal uses, you can use it for that as long as you do not have too many, basically insert cancelled bills in the gaps ). I probably would do it with the one-row table now for simplicity, I doubt I gained enough performance to justify the 20-something perl lines plus the crontab entry to use sequences. As beard turns grey I apreciate more the simpler ways to do things. Francisco Olarte.
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
On 20 Jul 2022, at 4:08, Francisco Olarte wrote:
As a remark, in Spain bill numbers need to be gapless increasing. I
have done it with a sequence ( with cache 1, so I doubt it is much
more performant than a single row table, as it has a backing element
which needs to be saved ), and just used another process which swept
the values and filled the gap a posteriori ( there is a way to fill a
gap, has it legal uses, you can use it for that as long as you do not
have too many, basically insert cancelled bills in the gaps ). I
probably would do it with the one-row table now for simplicity, I
doubt I gained enough performance to justify the 20-something perl
lines plus the crontab entry to use sequences. As beard turns grey I
apreciate more the simpler ways to do things.
One answer to this problem has been around for a while, and my version is shown below. No extra table is needed, just a windowed look at the table where the reference is used. My usage is at human speed so performance is not an issue but it should be pretty good if there are minimal holes. What is not addressed is that a reference can be reissued upto the time the calling process commits an entry in the table and takes that reference out of circulation. There are different approaches to handling/preventing such collisions.
CREATE OR REPLACE FUNCTION accounts.next_active_reference()
RETURNS integer
LANGUAGE 'sql'
VOLATILE LEAKPROOF STRICT
PARALLEL UNSAFE
COST 3000 -- pure guesstimate
AS $BODY$ SELECT L.transaction_ref + 1 AS start FROM accounts.transaction_refs AS L LEFT OUTER JOIN accounts.transaction_refs AS R ON L.transaction_ref + 1 = R.transaction_ref WHERE R.transaction_ref IS NULL AND L.transaction_ref >700 -- historic reasons only, added to existing system ORDER BY L.transaction_ref LIMIT 1;
$BODY$;
COMMENT ON FUNCTION accounts.next_active_reference() IS
$COMMENT$
Determines the next available reference number, making sure to fill any holes.
The holes in the sequence prior to 700 are ignored (pure history), and this code
will make sure any out of sequence data blocks will not cause conflicts.
Credits:
Ref: <http://www.xaprb.com/blog/2005/12/06/find-missing-numbers-in-a-sequence-with-sql/>
2022 update: this link is now dead, only reporting "There is nothing here".
$COMMENT$;
Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Example (assuming there is no unique constraint!):
- INSERT without value for serial column => sequence=0+1 => last_value = 1
- INSERT without value for serial column => sequence=1+1 => last_value = 2
- INSERT with value 2 for serial column => tab1.pkey(2) > last_value(2) ? false => no sequence reset
- INSERT without value for serial column => sequence=2+1 => last_value = 3
- INSERT with value 4 for serial column => tab1.pkey(4) > last_value ? true => setval(seqname,4,true)
- INSERT without value for serial column => sequence=4+1 => last_value = 5
Sent: Tuesday, July 19, 2022 5:41 PM
To: Sebastien Flaesch <sebastien.flaesch@4js.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Sebastien Flaesch <sebastien.flaesch@4js.com> writes:
> I try to update the underlying sequence of a SERIAL column, by using a RETURNING clause in my INSERT statement, which is checking that the column value is greater than the last_value of my sequence, and reset the sequence with setval() if needed.
It's not too surprising that that doesn't work, if you're coding it
based on this assumption:
> The whole INSERT statement (including the code in the RETURNING clause), should execute in a ATOMIC manner.
Sequence-related actions are always carried out immediately, they do
not participate in any atomicity guarantees about the calling transaction.
Without this, any sequence update would have to block all concurrent
uses of that sequence until they see whether the first update commits.
If that's the behavior you want, you can build it out of standard SQL
facilities (e.g. update a one-row table). The point of sequence objects
is exactly to provide a feature with better concurrent performance,
at the cost of no rollback guarantees.
So, there's no bug here, and calling it one isn't going to change
anybody's mind about that.
regards, tom lane
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Sebastien Flaesch schrieb am 19.07.2022 um 18:50: > Tom, > > /If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table). > / > > Can you elaborate please? > > Do you mean the code should use an UPDATE on a one-row table to acquire a lock? I assume something like this: https://blog.sql-workbench.eu/post/gapless-sequence/
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Hi Gavan. On Wed, 20 Jul 2022 at 00:10, Gavan Schneider <list.pg.gavan@pendari.org> wrote: > On 20 Jul 2022, at 4:08, Francisco Olarte wrote: > As a remark, in Spain bill numbers need to be gapless increasing. I > have done it with .... > > One answer to this problem has been around for a while, and my version is shown below. If I read it correctly, leaving the 700 stuff aside, this function gives you the first free transaction_ref. This is not valid for my problem, I need them increasing, 1-3-2 is not a valid sequence. The trick I use is that I can have "dummy" records, which do not have real data ( or dates, which is what, among other things, makes me need them increasing ), so I generate 1-3-4 and then insert 2 in batch which values adequate for legal (i.e., I can just use 1.date for 2.date ). I think what you do is generate account numbers which should be gapless in the medium term and should be increasing as needed, wherte the gapless is more important to you than the increasing. > No extra table is needed, just a windowed look at the table where the reference is used. It is normally never needed, thay are used solely for performance and to avoid locking. A current_transaction_seq > My usage is at human speed so performance is not an issue but it should be pretty good if there are minimal holes. Whatis not addressed is that a reference can be reissued upto the time the calling process commits an entry in the tableand takes that reference out of circulation. There are different approaches to handling/preventing such collisions. Your construct seems pretty expensive, and I'm not sure how much locking it does at serializable. Also, given it needs recovery ( the single row table plus back-fill does not, if you consider the back-filling "bussines as usual" ). Also the reissuing of a number is a no-go in my automated systems, it would need extensive recovery, in that case I can use a gap-less approach by simpy selecting max(id) ( and given it is indexed, using a select for update on the max record by ordering desc and limit 1, if I'm not too confused this would give no gaps, increasing at the expense of reduced concurrency ). ... > $COMMENT$ > Determines the next available reference number, making sure to fill any holes. No going back allowed in my case. Simplifying it, a bill may reference an older one in its data, and its number MUST be less. The gap problem is there MUST be NO gaps WHEN I "print" them. And I can use dummies, but I SHOULD not use them. In practice, you rarely use more than a couple of dummies a year. I could strictly print dummies when I need them, but they MUST be always the same, and the easier way is to insert them. Francisco Olarte.
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Sent: Wednesday, July 20, 2022 8:56 AM
To: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Sebastien Flaesch schrieb am 19.07.2022 um 18:50:
> Tom,
>
> /If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table).
> /
>
> Can you elaborate please?
>
> Do you mean the code should use an UPDATE on a one-row table to acquire a lock?
I assume something like this:
https://urldefense.com/v3/__https://blog.sql-workbench.eu/post/gapless-sequence/__;!!I_DbfM1H!F7_2cNahve0cmwPMP6QBBwwpyP6UAum4ukFj71_21ebcxTKXZFtU0_3O6l1lfG5jYiKjO7wEzRt_E1GbJ9Q$
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Am Wed, Jul 20, 2022 at 09:15:29AM +0000 schrieb Sebastien Flaesch: > Thomas, we already have a similar solution. > The idea is to use the native PostgreSQL SERIAL type. Which does not guarantuee gaplessness. Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B