Thread: Generating Lots of PKs with nextval(): A Feature Proposal
Recently, in preparation for migrating an application to Postgres, I got to this part of the manual (which is *excellent* so far, by the way):

http://www.postgresql.org/docs/8.4/interactive/functions-sequence.html

A quick check with the folks on #postgresql confirmed my understanding, which was that the locking semantics of setval() and nextval() make this unsafe:

SELECT setval('my_seq', nextval('my_seq') + 500);

Now, I was reminded that I could simply do this:

SELECT nextval('my_seq') FROM generate_series(1, 500);

But of course then I would have no guarantee of getting a contiguous block of ids, which means that if I'm using this to do a mass insert of records which refer to each other's ids (for example, storing a directed, linear graph), I either have to do a correlated update on the client side after transferring the keys (consider the cost of doing this for a few million records: 4 MB in keys per million records, for, in extreme cases, 12 MB of data to be inserted -- 33% overhead in the worst case, presuming symmetric bandwidth), or I have to insert into a temporary table, have the backend do the update, and then insert from there into the real table. Both are imperfect options in terms of performance and complexity.

Thus, before I start work on it, I propose an extension to the current nextval():

SELECT nextval('my_seq', 500);

This would increment the my_seq sequence by its interval * 500 and return the first valid key.

This both makes client code that needs a bunch of PKs simpler to implement and saves on performance, since the client can just replace all its PKs (presuming they're currently a contiguous block from 1 to n) with my_starting_pk + current_pk. So this:

 pk | next_node
----+-----------
  0 |         1
  1 |         2
  2 |         0

can easily be updated like this:

SELECT nextval('my_seq', (SELECT count(*) FROM my_table));
UPDATE my_table
   SET pk = currval('my_seq') + pk,
       next_node = currval('my_seq') + next_node;

to something like this:

     pk | next_node
--------+-----------
 521650 |    521651
 521651 |    521652
 521652 |    521650

This is a net gain in performance and ease of implementation in many cases where a large number of ids from a sequence are needed -- with the small added benefit of the keys being guaranteed to be contiguous.

I don't see any technical problems with this; Postgres can already "pre-allocate" more than one key, but the number is semi-static (the CACHE parameter to CREATE SEQUENCE). This might break existing user code if someone has defined their own nextval(regclass, integer), but I don't see any way to avoid that.

Finally, I've checked sequence.c -- this looks pretty straightforward to implement, but I figured checking with this list was wise before starting work. Apologies if I've been overly wordy.

Peter
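The remapping described above can be sketched in plain Python. This is a toy model of the *proposed* nextval('my_seq', n) semantics (reserve n consecutive values, return the first), not PostgreSQL code; the `Seq` class and its methods are hypothetical stand-ins for server-side sequence state.

```python
# Hypothetical model of the proposed nextval('my_seq', n): reserve n
# consecutive values in one step and return the first one.
class Seq:
    def __init__(self, start=1, increment=1):
        self.last = start - increment
        self.increment = increment

    def nextval(self, n=1):
        """Advance by n increments; return the first reserved value."""
        first = self.last + self.increment
        self.last += n * self.increment
        return first

# Client-side remapping, as in the email: local pks 0..n-1 become
# base + pk, preserving internal references.
graph = {0: 1, 1: 2, 2: 0}          # pk -> next_node
seq = Seq(start=521650)
base = seq.nextval(len(graph))      # reserve one contiguous block
remapped = {base + pk: base + nxt for pk, nxt in graph.items()}
print(remapped)  # {521650: 521651, 521651: 521652, 521652: 521650}
```

Because the block is contiguous, the remapping is a single addition per key, with no per-row round trip to the server.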
Hi Peter,

All you need to do is define your own sequence with an increment of 500. Look at:

http://www.postgresql.org/docs/8.4/static/sql-createsequence.html

Regards,
Ken

On Fri, May 14, 2010 at 02:56:18PM -0400, Peter Crabtree wrote:
> Thus, before I start work on it, I propose an extension to the current
> nextval():
>
> SELECT nextval('my_seq', 500);
>
> This would increment the my_seq sequence by its interval * 500 and
> return the first valid key.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, May 14, 2010 at 02:07:27PM -0500, Kenneth Marshall wrote:
> All you need to do is define your own sequence with an
> increment of 500.

This is often not enough. For example: I want the standard increment of 1, but right now I'm importing 10000 objects, and it would be simpler for me to get 10000 ids, preferably in one block.

This is not achievable now. I know I can ALTER SEQUENCE ... SET INCREMENT BY, but that will also affect concurrent sessions -- which might not be a problem, but it's a side effect that I don't want.

+1 for the original proposition; would love to get it.

depesz
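The side effect depesz mentions can be made concrete with a toy model: the increment is a property of the shared sequence, so changing it affects every session, not just the importer. This is a plain-Python sketch of that failure mode; real sequence state lives in the database, and the class and variable names here are invented for illustration.

```python
# Toy model of why ALTER SEQUENCE ... SET INCREMENT is an awkward
# workaround: the increment is shared state visible to all sessions.
class SharedSeq:
    def __init__(self):
        self.value = 0
        self.increment = 1      # shared by all sessions

    def nextval(self):
        self.value += self.increment
        return self.value

seq = SharedSeq()

# Importer session: bump the increment to grab a block of 10000.
seq.increment = 10000
block_start = seq.nextval() - seq.increment + 1   # ids 1..10000 reserved

# A concurrent session calls nextval() before the importer resets the
# increment: it burns another 10000 ids for a single row.
other = seq.nextval()

seq.increment = 1           # importer resets -- too late for `other`
print(block_start, other)   # 1 20000
```

The concurrent session's single nextval() consumed 10000 ids, exactly the kind of side effect a per-call argument would avoid.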
Peter Crabtree <peter.crabtree@gmail.com> writes:
> Now, I was reminded that I could simply do this:
> SELECT nextval('my_seq') FROM generate_series(1, 500);
> But of course then I would have no guarantee that I would get a
> contiguous block of ids,

The existing "cache" behavior will already handle that for you, I believe. I don't really see a need for new features here.

regards, tom lane
On Fri, May 14, 2010 at 5:04 PM, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> This is often not enough. For example - I want standard increment of 1,
> but right now I'm importing 10000 objects, and it would be simpler for
> me to get 10000 ids. Preferably in one block.
> [...]
> +1 for original proposition, would love to get it.

If we do this, I'm inclined to think that the extra argument to nextval() should be treated as overriding the base increment rather than specifying a multiplier for it. Other than that nitpick, it sounds like a reasonable thing to allow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Fri, May 14, 2010 at 5:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The existing "cache" behavior will already handle that for you,
> I believe. I don't really see a need for new features here.

I don't see how that works for this case, because the CACHE setting is static, and also shared between sessions. So if I have 10 records one time, 100 records the next, and 587 the third time, what should CACHE be set to for that sequence?

And if I do ALTER SEQUENCE ... SET CACHE each time, I have either killed concurrency (because I'm locking other sessions out of using that sequence until I'm finished with it) or created a race condition (if someone else issues an ALTER SEQUENCE before I call nextval()). The same problem exists with ALTER SEQUENCE ... SET INCREMENT BY.

Peter
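Peter's objection can be demonstrated with a small simulation: with CACHE, each session preallocates a fixed-size block, so a request larger than the cache straddles blocks, and a concurrent session may take the block in between. This is a toy model only (in reality the caching happens inside each backend), and all names are invented for illustration.

```python
# Sketch: CACHE gives each session fixed-size blocks, so a large
# request can be interleaved with another session's allocation.
CACHE = 10

class Sequence:
    def __init__(self):
        self.next_block = 1
    def grab_block(self):
        start = self.next_block
        self.next_block += CACHE
        return list(range(start, start + CACHE))

class Session:
    def __init__(self, seq):
        self.seq, self.cache = seq, []
    def nextval(self):
        if not self.cache:
            self.cache = self.seq.grab_block()
        return self.cache.pop(0)

seq = Sequence()
a, b = Session(seq), Session(seq)

ids = [a.nextval() for _ in range(10)]  # session A drains one block
b.nextval()                             # session B grabs the next block
ids += [a.nextval() for _ in range(5)]  # A continues in a third block

print(ids[9], ids[10])  # 10 21 -- a gap: A's 15 ids are not contiguous
```

Session A wanted 15 ids but got 1..10 and then 21..25; no fixed CACHE value can guarantee contiguity for request sizes that vary per call.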
On Fri, May 14, 2010 at 5:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If we do this, I'm inclined to think that the extra argument to
> nextval() should be treated as overriding the base increment rather
> than specifying a multiplier for it. Other than that nitpick, it
> sounds like a reasonable thing to allow.

After giving it some thought, that sounds better. You gain some functionality that way (temporarily overriding the interval) and lose none.

Peter
Peter Crabtree <peter.crabtree@gmail.com> writes:
> After giving it some thought, that sounds better. You gain some
> functionality that way (temporarily overriding the interval) and lose
> none.

Well, what you lose is the previous assurance that values of nextval() were always multiples of the increment. I could see that breaking applications that are using non-unity increments.

regards, tom lane
On Fri, May 14, 2010 at 6:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, what you lose is the previous assurance that values of nextval()
> were always multiples of the increment. I could see that breaking
> applications that are using non-unity increments.

Err, right. But those applications presumably will also not be using this new behavior. There are no versions of PG that have an extra argument to nextval() but still guarantee that the values of nextval() are multiples of the increment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
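Tom's invariant, and how a per-call override would break it, fits in a few lines. This models the *override* reading of the proposal (the extra argument replaces the base increment for that call); it is a sketch of the semantics under discussion, not actual PostgreSQL behavior.

```python
# With INCREMENT BY 10, every nextval() result is a multiple of 10 --
# until a per-call override (modeled as an optional argument) slips in
# a non-multiple.
class Seq:
    def __init__(self, increment):
        self.value = 0
        self.increment = increment
    def nextval(self, override=None):
        self.value += override if override is not None else self.increment
        return self.value

seq = Seq(increment=10)
vals = [seq.nextval() for _ in range(3)]   # 10, 20, 30: all multiples
vals.append(seq.nextval(7))                # 37: invariant broken
print(vals)  # [10, 20, 30, 37]
```

As Robert notes, only applications that actually pass the new argument would ever see a non-multiple, so existing code keeps the old guarantee.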