Thread: Generating Lots of PKs with nextval(): A Feature Proposal
Recently, in preparation for migrating an application to Postgres, I got to this part of the manual (which is *excellent* so far, by the way):

http://www.postgresql.org/docs/8.4/interactive/functions-sequence.html

A quick check with the folks on #postgresql confirmed my understanding, which was that the locking semantics of setval() and nextval() make this unsafe:

SELECT setval('my_seq', nextval('my_seq') + 500);

Now, I was reminded that I could simply do this:

SELECT nextval('my_seq') FROM generate_series(1, 500);

But of course then I would have no guarantee of getting a contiguous block of ids, which means that if I'm using this to do a mass insert of records which refer to each other's ids (for example, storing a directed, linear graph), I either have to do a correlated update on the client side after transferring the keys (consider the cost of doing this for a few million records: 4 MB in keys per million records, for, in extreme cases, 12 MB of data to be inserted -- 33% overhead in the worst case, presuming symmetric bandwidth), or I have to insert into a temporary table, have the backend do the update, and then insert from there into the real table. Both are imperfect options in terms of performance and complexity.

Thus, before I start work on it, I propose an extension to the current nextval():

SELECT nextval('my_seq', 500);

This would increment the my_seq sequence by its interval * 500 and return the first valid key.

This both makes client code that needs a bunch of PKs simpler to implement and saves on performance, since the client can just replace all its PKs (presuming they're currently a contiguous block from 1 to n) with my_starting_pk + current_pk. So this:

 pk | next_node
----+-----------
  0 |         1
  1 |         2
  2 |         0

can easily be updated like this:

SELECT nextval('my_seq', (SELECT count(*) FROM my_table));
UPDATE my_table
   SET pk = currval('my_seq') + pk,
       next_node = currval('my_seq') + next_node;

to something like this:

     pk | next_node
--------+-----------
 521650 |    521651
 521651 |    521652
 521652 |    521650

This is a net gain in performance and ease of implementation in many cases where a large number of ids from a sequence are needed -- with the small added benefit of the keys being guaranteed to be contiguous.

I don't see any technical problems with this; Postgres can already "pre-allocate" more than one key, but the number is semi-static (the CACHE parameter to CREATE SEQUENCE). This might break existing user code if someone has defined their own nextval(regclass, integer), but I don't see any way to avoid that.

Finally, I've checked sequence.c -- this looks pretty straightforward to implement, but I figured checking with this list was wise before starting work. Apologies if I've been overly wordy.

Peter
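The remapping described above can be sketched in plain Python. This is a toy model of the *proposed* nextval('my_seq', n) semantics (reserve n consecutive values, return the first), not PostgreSQL code; the `Seq` class and its methods are hypothetical stand-ins for server-side sequence state.

```python
# Hypothetical model of the proposed nextval('my_seq', n): reserve n
# consecutive values in one step and return the first one.
class Seq:
    def __init__(self, start=1, increment=1):
        self.last = start - increment
        self.increment = increment

    def nextval(self, n=1):
        """Advance by n increments; return the first reserved value."""
        first = self.last + self.increment
        self.last += n * self.increment
        return first

# Client-side remapping, as in the email: local pks 0..n-1 become
# base + pk, preserving internal references.
graph = {0: 1, 1: 2, 2: 0}          # pk -> next_node
seq = Seq(start=521650)
base = seq.nextval(len(graph))      # reserve one contiguous block
remapped = {base + pk: base + nxt for pk, nxt in graph.items()}
print(remapped)  # {521650: 521651, 521651: 521652, 521652: 521650}
```

Because the block is contiguous, the remapping is a single addition per key, with no per-row round trip to the server.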
Hi Peter,

All you need to do is define your own sequence with an increment of 500. Look at:

http://www.postgresql.org/docs/8.4/static/sql-createsequence.html

Regards,
Ken

On Fri, May 14, 2010 at 02:56:18PM -0400, Peter Crabtree wrote:
> Thus, before I start work on it, I propose an extension to the current
> nextval():
>
> SELECT nextval('my_seq', 500);
>
> This would increment the my_seq sequence by its interval * 500 and
> return the first valid key.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, May 14, 2010 at 02:07:27PM -0500, Kenneth Marshall wrote:
> All you need to do is define your own sequence with an
> increment of 500.

This is often not enough. For example: I want the standard increment of 1, but right now I'm importing 10000 objects, and it would be simpler for me to get 10000 ids, preferably in one block.

This is not achievable now. I know I can ALTER SEQUENCE ... SET INCREMENT BY, but that will also affect concurrent sessions -- which might not be a problem, but it's a side effect that I don't want.

+1 for the original proposition; would love to get it.

depesz
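The side effect depesz mentions can be made concrete with a toy model: the increment is a property of the shared sequence, so changing it affects every session, not just the importer. This is a plain-Python sketch of that failure mode; real sequence state lives in the database, and the class and variable names here are invented for illustration.

```python
# Toy model of why ALTER SEQUENCE ... SET INCREMENT is an awkward
# workaround: the increment is shared state visible to all sessions.
class SharedSeq:
    def __init__(self):
        self.value = 0
        self.increment = 1      # shared by all sessions

    def nextval(self):
        self.value += self.increment
        return self.value

seq = SharedSeq()

# Importer session: bump the increment to grab a block of 10000.
seq.increment = 10000
block_start = seq.nextval() - seq.increment + 1   # ids 1..10000 reserved

# A concurrent session calls nextval() before the importer resets the
# increment: it burns another 10000 ids for a single row.
other = seq.nextval()

seq.increment = 1           # importer resets -- too late for `other`
print(block_start, other)   # 1 20000
```

The concurrent session's single nextval() consumed 10000 ids, exactly the kind of side effect a per-call argument would avoid.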
Peter Crabtree <peter.crabtree@gmail.com> writes:
> Now, I was reminded that I could simply do this:
> SELECT nextval('my_seq') FROM generate_series(1, 500);
> But of course then I would have no guarantee that I would get a
> contiguous block of ids,

The existing "cache" behavior will already handle that for you, I believe. I don't really see a need for new features here.

regards, tom lane
On Fri, May 14, 2010 at 5:04 PM, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> This is often not enough. For example - I want standard increment of 1,
> but right now I'm importing 10000 objects, and it would be simpler for
> me to get 10000 ids. Preferably in one block.
> [...]
> +1 for original proposition, would love to get it.

If we do this, I'm inclined to think that the extra argument to nextval() should be treated as overriding the base increment rather than specifying a multiplier for it. Other than that nitpick, it sounds like a reasonable thing to allow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Fri, May 14, 2010 at 5:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The existing "cache" behavior will already handle that for you,
> I believe. I don't really see a need for new features here.

I don't see how that works for this case, because the CACHE setting is static, and also shared between sessions. So if I have 10 records one time, 100 records the next, and 587 the third time, what should CACHE be set to for that sequence?

And if I do ALTER SEQUENCE ... SET CACHE each time, I have either killed concurrency (because I'm locking other sessions out of using that sequence until I'm finished with it) or created a race condition (if someone else issues an ALTER SEQUENCE before I call nextval()). The same problem exists with ALTER SEQUENCE ... SET INCREMENT BY.

Peter
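Peter's objection can be demonstrated with a small simulation: with CACHE, each session preallocates a fixed-size block, so a request larger than the cache straddles blocks, and a concurrent session may take the block in between. This is a toy model only (in reality the caching happens inside each backend), and all names are invented for illustration.

```python
# Sketch: CACHE gives each session fixed-size blocks, so a large
# request can be interleaved with another session's allocation.
CACHE = 10

class Sequence:
    def __init__(self):
        self.next_block = 1
    def grab_block(self):
        start = self.next_block
        self.next_block += CACHE
        return list(range(start, start + CACHE))

class Session:
    def __init__(self, seq):
        self.seq, self.cache = seq, []
    def nextval(self):
        if not self.cache:
            self.cache = self.seq.grab_block()
        return self.cache.pop(0)

seq = Sequence()
a, b = Session(seq), Session(seq)

ids = [a.nextval() for _ in range(10)]  # session A drains one block
b.nextval()                             # session B grabs the next block
ids += [a.nextval() for _ in range(5)]  # A continues in a third block

print(ids[9], ids[10])  # 10 21 -- a gap: A's 15 ids are not contiguous
```

Session A wanted 15 ids but got 1..10 and then 21..25; no fixed CACHE value can guarantee contiguity for request sizes that vary per call.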
On Fri, May 14, 2010 at 5:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If we do this, I'm inclined to think that the extra argument to
> nextval() should be treated as overriding the base increment rather
> than specifying a multiplier for it. Other than that nitpick, it
> sounds like a reasonable thing to allow.

After giving it some thought, that sounds better. You gain some functionality that way (temporarily overriding the interval) and lose none.

Peter
Peter Crabtree <peter.crabtree@gmail.com> writes:
> After giving it some thought, that sounds better. You gain some
> functionality that way (temporarily overriding the interval) and lose
> none.

Well, what you lose is the previous assurance that values of nextval() were always multiples of the increment. I could see that breaking applications that are using non-unity increments.

regards, tom lane
On Fri, May 14, 2010 at 6:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, what you lose is the previous assurance that values of nextval()
> were always multiples of the increment. I could see that breaking
> applications that are using non-unity increments.

Err, right. But those applications presumably will also not be using this new behavior. There are no versions of PG that have an extra argument to nextval() but still guarantee that the values of nextval() are multiples of the increment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
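Tom's invariant, and how a per-call override would break it, fits in a few lines. This models the *override* reading of the proposal (the extra argument replaces the base increment for that call); it is a sketch of the semantics under discussion, not actual PostgreSQL behavior.

```python
# With INCREMENT BY 10, every nextval() result is a multiple of 10 --
# until a per-call override (modeled as an optional argument) slips in
# a non-multiple.
class Seq:
    def __init__(self, increment):
        self.value = 0
        self.increment = increment
    def nextval(self, override=None):
        self.value += override if override is not None else self.increment
        return self.value

seq = Seq(increment=10)
vals = [seq.nextval() for _ in range(3)]   # 10, 20, 30: all multiples
vals.append(seq.nextval(7))                # 37: invariant broken
print(vals)  # [10, 20, 30, 37]
```

As Robert notes, only applications that actually pass the new argument would ever see a non-multiple, so existing code keeps the old guarantee.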