Thread: sequences and pg_upgrade
I was toying with a couple of ideas that would involve changing the storage of sequences. (Say, for the sake of discussion, removing the problematic/useless sequence_name field.) This would cause problems for pg_upgrade, because pg_upgrade copies the "heap" storage of sequences like it does for normal tables, and we have no facilities for effecting any changes during that. There was a previous discussion in the early days of pg_migrator, which resulted in the current behavior: https://www.postgresql.org/message-id/flat/20090713220112.GF7933%40klana.box This also alluded to what I think was the last change in the sequence storage format (10a3471bed7b57fb986a5be8afdee5f0dda419de) between versions 8.3 and 8.4. How did pg_upgrade handle that? I think the other solution mentioned in that thread would also work: Have pg_upgrade treat sequences more like system catalogs, whose format changes between major releases, and transferred them via the dump/restore route. So instead of copying the disk files, issue a setval call, and the sequence should be all set up. Am I missing anything? Attached is a rough patch set that would implement that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > I was toying with a couple of ideas that would involve changing the > storage of sequences. (Say, for the sake of discussion, removing the > problematic/useless sequence_name field.) This would cause problems for > pg_upgrade, because pg_upgrade copies the "heap" storage of sequences > like it does for normal tables, and we have no facilities for effecting > any changes during that. > There was a previous discussion in the early days of pg_migrator, which > resulted in the current behavior: > https://www.postgresql.org/message-id/flat/20090713220112.GF7933%40klana.box > This also alluded to what I think was the last change in the sequence > storage format (10a3471bed7b57fb986a5be8afdee5f0dda419de) between > versions 8.3 and 8.4. How did pg_upgrade handle that? I think it probably never did handle that. pg_upgrade doesn't currently claim to support migrating from 8.3, and the thread you mention shows that the original attempt at 8.3->8.4 migration crashed-and-burned for numerous unrelated reasons. We may not have ever got to the point of noticing that 10a3471be also created a problem. > I think the other solution mentioned in that thread would also work: > Have pg_upgrade treat sequences more like system catalogs, whose format > changes between major releases, and transferred them via the > dump/restore route. So instead of copying the disk files, issue a > setval call, and the sequence should be all set up. Seems reasonable. If you're proposing to expose --sequence-data as a user-visible option, the patch set lacks documentation. But I wonder whether it shouldn't simply be a side-effect of --binary-upgrade. It seems a tad non-orthogonal for a user switch. regards, tom lane
On Tue, Aug 30, 2016 at 08:46:48AM -0400, Peter Eisentraut wrote: > I think the other solution mentioned in that thread would also work: > Have pg_upgrade treat sequences more like system catalogs, whose format > changes between major releases, and transferred them via the > dump/restore route. So instead of copying the disk files, issue a > setval call, and the sequence should be all set up. > > Am I missing anything? Looks straight-forward to me. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 2016-08-30 08:46:48 -0400, Peter Eisentraut wrote: > I was toying with a couple of ideas that would involve changing the > storage of sequences. (Say, for the sake of discussion, removing the > problematic/useless sequence_name field.) I'd be quite interested to know what changes that are... > I think the other solution mentioned in that thread would also work: > Have pg_upgrade treat sequences more like system catalogs, whose format > changes between major releases, and transferred them via the > dump/restore route. So instead of copying the disk files, issue a > setval call, and the sequence should be all set up. +1.
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, failed Thank you for the patch. As I see there are no objections in the discussion, all the patches look clear. Could you clarify, please, why do we dump sequence in schemaOnly mode? + if (dopt.schemaOnly && dopt.sequence_data) + getSequenceData(&dopt, tblinfo, numTables, dopt.oids); Example: postgres=# create table t(i serial, data text); postgres=# insert into t(data) values ('aaa'); pg_dump -d postgres --sequence-data --schema-only > ../reviews/dump_pg Then restore it into newdb and add new value. newdb=# insert into t(data) values ('aaa'); INSERT 0 1 newdb=# select * from t;i | data ---+------2 | aaa I'm not an experienced user, but I thought that while doing dump/restore of schema of database we reset all the data. Why should the table in newly created (via pg_restore) database have non-default sequence value? I also did some other tests and all of them were passed. One more thing to do is a documentation for the new option. You should update help() function in pg_dump.c and also add some notes to pg_dump.sgml and probably to pgupgrade.sgml. The new status of this patch is: Waiting on Author
On 9/14/16 8:52 AM, Anastasia Lubennikova wrote: > Could you clarify, please, why do we dump sequence in schemaOnly mode? > + if (dopt.schemaOnly && dopt.sequence_data) > + getSequenceData(&dopt, tblinfo, numTables, dopt.oids); The point of this patch is that with the new option, you can dump sequence data (but not table data) alongside with the schema. This would be used by pg_upgrade for the reasons described at the beginning of the thread. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
15.09.2016 15:29, Peter Eisentraut: > On 9/14/16 8:52 AM, Anastasia Lubennikova wrote: >> Could you clarify, please, why do we dump sequence in schemaOnly mode? >> + if (dopt.schemaOnly && dopt.sequence_data) >> + getSequenceData(&dopt, tblinfo, numTables, dopt.oids); > The point of this patch is that with the new option, you can dump > sequence data (but not table data) alongside with the schema. This > would be used by pg_upgrade for the reasons described at the beginning > of the thread. > Oh, thank you. Now I see. Somewhy I thought that it *always* dumps sequence data in schemaOnly mode. Which is definitely not true. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Here is an updated patch set. Compared to the initial set, I have changed pg_dump's sorting priorities so that sequence data is always after table data. This would otherwise have introduced a problem because sortDataAndIndexObjectsBySize() only considers consecutive DO_TABLE_DATA entries. Also, I have removed the separate --sequence-data switch from pg_dump and made it implicit in --binary-upgrade. (So the previous patches 0002 and 0003 have been combined, because it's no longer a separate feature.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
23.09.2016 21:06, Peter Eisentraut: > Here is an updated patch set. Compared to the initial set, I have > changed pg_dump's sorting priorities so that sequence data is always > after table data. This would otherwise have introduced a problem > because sortDataAndIndexObjectsBySize() only considers consecutive > DO_TABLE_DATA entries. Also, I have removed the separate > --sequence-data switch from pg_dump and made it implicit in > --binary-upgrade. (So the previous patches 0002 and 0003 have been > combined, because it's no longer a separate feature.) > The patches are good, no complaints. But again, I have the same question. I was confused, why do we always dump sequence data, because I'd overlooked the --sequence-data key. I'd rather leave this option, because it's quite non intuitive behaviour... /* dump sequence data even in schema-only mode */ -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sat, Oct 1, 2016 at 1:50 AM, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote: > 23.09.2016 21:06, Peter Eisentraut: >> >> Here is an updated patch set. Compared to the initial set, I have >> changed pg_dump's sorting priorities so that sequence data is always >> after table data. This would otherwise have introduced a problem >> because sortDataAndIndexObjectsBySize() only considers consecutive >> DO_TABLE_DATA entries. Also, I have removed the separate >> --sequence-data switch from pg_dump and made it implicit in >> --binary-upgrade. (So the previous patches 0002 and 0003 have been >> combined, because it's no longer a separate feature.) >> > > The patches are good, no complaints. > But again, I have the same question. > I was confused, why do we always dump sequence data, > because I'd overlooked the --sequence-data key. I'd rather leave this > option, > because it's quite non intuitive behaviour... > /* dump sequence data even in schema-only mode */ Moved to next CF. This is fresh. -- Michael
On 9/30/16 12:50 PM, Anastasia Lubennikova wrote: > The patches are good, no complaints. > But again, I have the same question. > I was confused, why do we always dump sequence data, > because I'd overlooked the --sequence-data key. I'd rather leave this > option, > because it's quite non intuitive behaviour... > /* dump sequence data even in schema-only mode */ Here are rebased patches. Regarding your question: The initial patch had a separate option for this behavior, which was then used by pg_upgrade. It was commented that this option is not useful outside of pg_upgrade, so it doesn't need to be exposed as a user-facing option. I agreed with that and removed the option. We can always add the option back easily if someone really wants it, but so far no use case has been presented. So I suggest we proceed with this proposal ignoring whether this option is exposed or not. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, Oct 31, 2016 at 9:53 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 9/30/16 12:50 PM, Anastasia Lubennikova wrote: >> The patches are good, no complaints. >> But again, I have the same question. >> I was confused, why do we always dump sequence data, >> because I'd overlooked the --sequence-data key. I'd rather leave this >> option, >> because it's quite non intuitive behaviour... >> /* dump sequence data even in schema-only mode */ > > Here are rebased patches. > > Regarding your question: The initial patch had a separate option for > this behavior, which was then used by pg_upgrade. It was commented that > this option is not useful outside of pg_upgrade, so it doesn't need to > be exposed as a user-facing option. I agreed with that and removed the > option. We can always add the option back easily if someone really > wants it, but so far no use case has been presented. So I suggest we > proceed with this proposal ignoring whether this option is exposed or not. I had a look at those fresh patches, and 0001 looks like a good thing. This makes the separation between sequences and table data dump cleaner. I ran some tests with pg_upgrade and 0002, and things are clear. And +1 for the way done in the patch, aka no options of pg_dump exposed to user, still keep the option tracking as a separate value. One small thing here:static void -getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids) +getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids, char relkind){ int i; for (i = 0; i < numTables; i++) { - if (tblinfo[i].dobj.dump & DUMP_COMPONENT_DATA) + if (tblinfo[i].dobj.dump & DUMP_COMPONENT_DATA && + (!relkind || tblinfo[i].relkind == relkind)) makeTableDataInfo(dopt, &(tblinfo[i]), oids) One idea here would be to have an extra routine, getSequenceData and not extend getTableData() with relkind as extra argument. I am fine with the way patch does things, so I just switched the patch as ready for committer. -- Michael
On 11/2/16 2:34 AM, Michael Paquier wrote: > I had a look at those fresh patches, and 0001 looks like a good thing. > This makes the separation between sequences and table data dump > cleaner. I ran some tests with pg_upgrade and 0002, and things are > clear. And +1 for the way done in the patch, aka no options of pg_dump > exposed to user, still keep the option tracking as a separate value. Committed, thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services