Thread: sequences and pg_upgrade

sequences and pg_upgrade

From

Peter Eisentraut

Date:

30 August 2016, 12:47:01

I was toying with a couple of ideas that would involve changing the
storage of sequences.  (Say, for the sake of discussion, removing the
problematic/useless sequence_name field.)  This would cause problems for
pg_upgrade, because pg_upgrade copies the "heap" storage of sequences
like it does for normal tables, and we have no facilities for effecting
any changes during that.

There was a previous discussion in the early days of pg_migrator, which
resulted in the current behavior:
https://www.postgresql.org/message-id/flat/20090713220112.GF7933%40klana.box

This also alluded to what I think was the last change in the sequence
storage format (10a3471bed7b57fb986a5be8afdee5f0dda419de) between
versions 8.3 and 8.4.  How did pg_upgrade handle that?

I think the other solution mentioned in that thread would also work:
Have pg_upgrade treat sequences more like system catalogs, whose format
changes between major releases, and transferred them via the
dump/restore route.  So instead of copying the disk files, issue a
setval call, and the sequence should be all set up.

Am I missing anything?

Attached is a rough patch set that would implement that.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: sequences and pg_upgrade

From

Tom Lane

Date:

30 August 2016, 13:03:16

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I was toying with a couple of ideas that would involve changing the
> storage of sequences.  (Say, for the sake of discussion, removing the
> problematic/useless sequence_name field.)  This would cause problems for
> pg_upgrade, because pg_upgrade copies the "heap" storage of sequences
> like it does for normal tables, and we have no facilities for effecting
> any changes during that.

> There was a previous discussion in the early days of pg_migrator, which
> resulted in the current behavior:
> https://www.postgresql.org/message-id/flat/20090713220112.GF7933%40klana.box

> This also alluded to what I think was the last change in the sequence
> storage format (10a3471bed7b57fb986a5be8afdee5f0dda419de) between
> versions 8.3 and 8.4.  How did pg_upgrade handle that?

I think it probably never did handle that.  pg_upgrade doesn't currently
claim to support migrating from 8.3, and the thread you mention shows that
the original attempt at 8.3->8.4 migration crashed-and-burned for numerous
unrelated reasons.  We may not have ever got to the point of noticing that
10a3471be also created a problem.

> I think the other solution mentioned in that thread would also work:
> Have pg_upgrade treat sequences more like system catalogs, whose format
> changes between major releases, and transferred them via the
> dump/restore route.  So instead of copying the disk files, issue a
> setval call, and the sequence should be all set up.

Seems reasonable.

If you're proposing to expose --sequence-data as a user-visible option,
the patch set lacks documentation.  But I wonder whether it shouldn't
simply be a side-effect of --binary-upgrade.  It seems a tad
non-orthogonal for a user switch.
        regards, tom lane

Re: sequences and pg_upgrade

From

Bruce Momjian

Date:

30 August 2016, 15:27:11

On Tue, Aug 30, 2016 at 08:46:48AM -0400, Peter Eisentraut wrote:
> I think the other solution mentioned in that thread would also work:
> Have pg_upgrade treat sequences more like system catalogs, whose format
> changes between major releases, and transferred them via the
> dump/restore route.  So instead of copying the disk files, issue a
> setval call, and the sequence should be all set up.
> 
> Am I missing anything?

Looks straight-forward to me.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Re: sequences and pg_upgrade

From

Andres Freund

Date:

30 August 2016, 18:15:26

On 2016-08-30 08:46:48 -0400, Peter Eisentraut wrote:
> I was toying with a couple of ideas that would involve changing the
> storage of sequences.  (Say, for the sake of discussion, removing the
> problematic/useless sequence_name field.)

I'd be quite interested to know what changes that are...


> I think the other solution mentioned in that thread would also work:
> Have pg_upgrade treat sequences more like system catalogs, whose format
> changes between major releases, and transferred them via the
> dump/restore route.  So instead of copying the disk files, issue a
> setval call, and the sequence should be all set up.

+1.

Re: sequences and pg_upgrade

From

Anastasia Lubennikova

Date:

14 September 2016, 13:53:10

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, failed

Thank you for the patch.
As I see there are no objections in the discussion, all the patches look clear.

Could you clarify, please, why do we dump sequence in schemaOnly mode?
+    if (dopt.schemaOnly && dopt.sequence_data)
+        getSequenceData(&dopt, tblinfo, numTables, dopt.oids);

Example: 
postgres=# create table t(i serial, data text);
postgres=# insert into t(data) values ('aaa');
pg_dump -d postgres --sequence-data --schema-only > ../reviews/dump_pg

Then restore it into newdb and add new value.
newdb=# insert into t(data) values ('aaa');
INSERT 0 1
newdb=# select * from t;i | data 
---+------2 | aaa

I'm not an experienced user, but I thought that while doing dump/restore
of schema of database we reset all the data. Why should the table in newly
created (via pg_restore) database have non-default sequence value?

I also did some other tests and all of them were passed.

One more thing to do is a documentation for the new option.
You should update help() function in pg_dump.c and also add some
notes to pg_dump.sgml and probably to pgupgrade.sgml.

The new status of this patch is: Waiting on Author

Re: sequences and pg_upgrade

From

Peter Eisentraut

Date:

15 September 2016, 12:29:55

On 9/14/16 8:52 AM, Anastasia Lubennikova wrote:
> Could you clarify, please, why do we dump sequence in schemaOnly mode?
> +    if (dopt.schemaOnly && dopt.sequence_data)
> +        getSequenceData(&dopt, tblinfo, numTables, dopt.oids);

The point of this patch is that with the new option, you can dump
sequence data (but not table data) alongside with the schema.  This
would be used by pg_upgrade for the reasons described at the beginning
of the thread.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: sequences and pg_upgrade

From

Anastasia Lubennikova

Date:

15 September 2016, 13:00:00

15.09.2016 15:29, Peter Eisentraut:
> On 9/14/16 8:52 AM, Anastasia Lubennikova wrote:
>> Could you clarify, please, why do we dump sequence in schemaOnly mode?
>> +    if (dopt.schemaOnly && dopt.sequence_data)
>> +        getSequenceData(&dopt, tblinfo, numTables, dopt.oids);
> The point of this patch is that with the new option, you can dump
> sequence data (but not table data) alongside with the schema.  This
> would be used by pg_upgrade for the reasons described at the beginning
> of the thread.
>

Oh, thank you. Now I see.
Somewhy I thought that it *always* dumps sequence data in schemaOnly mode.
Which is definitely not true.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: sequences and pg_upgrade

From

Peter Eisentraut

Date:

23 September 2016, 18:07:00

Here is an updated patch set.  Compared to the initial set, I have
changed pg_dump's sorting priorities so that sequence data is always
after table data.  This would otherwise have introduced a problem
because sortDataAndIndexObjectsBySize() only considers consecutive
DO_TABLE_DATA entries.  Also, I have removed the separate
--sequence-data switch from pg_dump and made it implicit in
--binary-upgrade.  (So the previous patches 0002 and 0003 have been
combined, because it's no longer a separate feature.)

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: sequences and pg_upgrade

From

Anastasia Lubennikova

Date:

30 September 2016, 16:50:26

23.09.2016 21:06, Peter Eisentraut:
> Here is an updated patch set.  Compared to the initial set, I have
> changed pg_dump's sorting priorities so that sequence data is always
> after table data.  This would otherwise have introduced a problem
> because sortDataAndIndexObjectsBySize() only considers consecutive
> DO_TABLE_DATA entries.  Also, I have removed the separate
> --sequence-data switch from pg_dump and made it implicit in
> --binary-upgrade.  (So the previous patches 0002 and 0003 have been
> combined, because it's no longer a separate feature.)
>

The patches are good, no complaints.
But again, I have the same question.
I was confused, why do we always dump sequence data,
because I'd overlooked the --sequence-data key. I'd rather leave this 
option,
because it's quite non intuitive behaviour... /* dump sequence data even in schema-only mode */

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: sequences and pg_upgrade

From

Michael Paquier

Date:

03 October 2016, 04:15:46

On Sat, Oct 1, 2016 at 1:50 AM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> 23.09.2016 21:06, Peter Eisentraut:
>>
>> Here is an updated patch set.  Compared to the initial set, I have
>> changed pg_dump's sorting priorities so that sequence data is always
>> after table data.  This would otherwise have introduced a problem
>> because sortDataAndIndexObjectsBySize() only considers consecutive
>> DO_TABLE_DATA entries.  Also, I have removed the separate
>> --sequence-data switch from pg_dump and made it implicit in
>> --binary-upgrade.  (So the previous patches 0002 and 0003 have been
>> combined, because it's no longer a separate feature.)
>>
>
> The patches are good, no complaints.
> But again, I have the same question.
> I was confused, why do we always dump sequence data,
> because I'd overlooked the --sequence-data key. I'd rather leave this
> option,
> because it's quite non intuitive behaviour...
>  /* dump sequence data even in schema-only mode */

Moved to next CF. This is fresh.
-- 
Michael

Re: sequences and pg_upgrade

From

Peter Eisentraut

Date:

31 October 2016, 12:54:00

On 9/30/16 12:50 PM, Anastasia Lubennikova wrote:
> The patches are good, no complaints.
> But again, I have the same question.
> I was confused, why do we always dump sequence data,
> because I'd overlooked the --sequence-data key. I'd rather leave this
> option,
> because it's quite non intuitive behaviour...
>   /* dump sequence data even in schema-only mode */

Here are rebased patches.

Regarding your question:  The initial patch had a separate option for
this behavior, which was then used by pg_upgrade.  It was commented that
this option is not useful outside of pg_upgrade, so it doesn't need to
be exposed as a user-facing option.  I agreed with that and removed the
option.  We can always add the option back easily if someone really
wants it, but so far no use case has been presented.  So I suggest we
proceed with this proposal ignoring whether this option is exposed or not.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: sequences and pg_upgrade

From

Michael Paquier

Date:

02 November 2016, 06:34:28

On Mon, Oct 31, 2016 at 9:53 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 9/30/16 12:50 PM, Anastasia Lubennikova wrote:
>> The patches are good, no complaints.
>> But again, I have the same question.
>> I was confused, why do we always dump sequence data,
>> because I'd overlooked the --sequence-data key. I'd rather leave this
>> option,
>> because it's quite non intuitive behaviour...
>>   /* dump sequence data even in schema-only mode */
>
> Here are rebased patches.
>
> Regarding your question:  The initial patch had a separate option for
> this behavior, which was then used by pg_upgrade.  It was commented that
> this option is not useful outside of pg_upgrade, so it doesn't need to
> be exposed as a user-facing option.  I agreed with that and removed the
> option.  We can always add the option back easily if someone really
> wants it, but so far no use case has been presented.  So I suggest we
> proceed with this proposal ignoring whether this option is exposed or not.

I had a look at those fresh patches, and 0001 looks like a good thing.
This makes the separation between sequences and table data dump
cleaner. I ran some tests with pg_upgrade and 0002, and things are
clear. And +1 for the way done in the patch, aka no options of pg_dump
exposed to user, still keep the option tracking as a separate value.

One small thing here:static void
-getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids)
+getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables,
bool oids, char relkind){   int         i;
   for (i = 0; i < numTables; i++)   {
-       if (tblinfo[i].dobj.dump & DUMP_COMPONENT_DATA)
+       if (tblinfo[i].dobj.dump & DUMP_COMPONENT_DATA &&
+           (!relkind || tblinfo[i].relkind == relkind))           makeTableDataInfo(dopt, &(tblinfo[i]), oids)

One idea here would be to have an extra routine, getSequenceData and
not extend getTableData() with relkind as extra argument. I am fine
with the way patch does things, so I just switched the patch as ready
for committer.
-- 
Michael

Re: sequences and pg_upgrade

From

Peter Eisentraut

Date:

14 November 2016, 02:51:39

On 11/2/16 2:34 AM, Michael Paquier wrote:
> I had a look at those fresh patches, and 0001 looks like a good thing.
> This makes the separation between sequences and table data dump
> cleaner. I ran some tests with pg_upgrade and 0002, and things are
> clear. And +1 for the way done in the patch, aka no options of pg_dump
> exposed to user, still keep the option tracking as a separate value.

Committed, thanks.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services