Thread: Preserve versions of initdb-created collations in pg_upgrade

Preserve versions of initdb-created collations in pg_upgrade

From

Peter Eisentraut

Date:

28 October 2019, 12:52:02

As mentioned in [0], pg_upgrade currently does not preserve the version 
of collation objects created by initdb.  Here is an attempt to fix that.

The way I deal with this here is by having the binary-upgrade mode in 
pg_dump delete all the collations created by initdb and then dump out 
CREATE COLLATION commands with version information normally.

I had originally imagined doing some kind of ALTER COLLATION (or perhaps 
a direct UPDATE pg_collation) to update the version information, but 
that doesn't really work because we don't know whether the collation 
object with a given name in the new cluster is the same as the one in 
the old cluster.  So it seems more robust to just delete all existing 
collations and create them from scratch.

Thoughts?


[0]: 
https://www.postgresql.org/message-id/CA+hUKGKDe98DFWKJoS7e4Z+Oamzc-1sZfpL3V3PPgi1uNvQ1tw@mail.gmail.com
-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

v1-0001-Preserve-versions-of-initdb-created-collations-in.patch

Re: Preserve versions of initdb-created collations in pg_upgrade

From

Thomas Munro

Date:

29 October 2019, 02:33:12

On Tue, Oct 29, 2019 at 1:52 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> As mentioned in [0], pg_upgrade currently does not preserve the version
> of collation objects created by initdb.  Here is an attempt to fix that.
>
> The way I deal with this here is by having the binary-upgrade mode in
> pg_dump delete all the collations created by initdb and then dump out
> CREATE COLLATION commands with version information normally.

This seems to be basically OK.

It does mean that the target database has collation OIDs >=
FirstNormalObjectId.  That is, they don't look like initdb-created
objects, which is OK because they aren't, I'm just highlighting this
to see if anyone else sees a problem with it.  Suppose you pg_upgrade
again: now you'll dump these collations just as you did the first time
around, because they look exactly like user-defined collations.  It
also means that if you pg_upgrade to a target cluster created by a
build without ICU we'll try to create ICU collations and that'll fail
("ICU is not supported in this build"), whereas before if had ICU
collations and didn't ever make use of them, you'd be able to do such
an upgrade; again this doesn't seem like a major problem, it's just an
observation about an edge case.  One more thing to note is if you
upgrade from 12 to 13 on a glibc system, I think we'll automatically
pick up the *current* version when creating the collations in the target
DB, which seems to be OK but it is a choice to default to assuming
that the database's indexes are not corrupted.  Another observation is
that you finish up with different OIDs in each database you upgrade,
which again doesn't seem like a problem in itself.  It is slightly odd that
template1 finishes up with the old initdb's template1 collatoins, rather
than the new initdb's opinion of the current set of collations, but I am
not sure if it's a problem.  I think it has to be like that, because you
might have created other stuff that depends on those collations in your
source template1 database, and so you have to preserve the versions.

> I had originally imagined doing some kind of ALTER COLLATION (or perhaps
> a direct UPDATE pg_collation) to update the version information, but
> that doesn't really work because we don't know whether the collation
> object with a given name in the new cluster is the same as the one in
> the old cluster.  So it seems more robust to just delete all existing
> collations and create them from scratch.
>
> Thoughts?

Seems to work as described with -E UTF-8, but it fails with clusters
using -E SQL_ASCII.  That causes the pg_upgrade check to fail on
machines where that is the default encoding chosen by initdb (where
unpatched master succeeds):

pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro
pg_restore: error: could not execute query: ERROR:  collation
"pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist
Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro;

Re: Preserve versions of initdb-created collations in pg_upgrade

From

Peter Eisentraut

Date:

21 December 2019, 06:38:16

On 2019-10-29 03:33, Thomas Munro wrote:
> Seems to work as described with -E UTF-8, but it fails with clusters
> using -E SQL_ASCII.  That causes the pg_upgrade check to fail on
> machines where that is the default encoding chosen by initdb (where
> unpatched master succeeds):
> 
> pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu"
> pg_restore: while PROCESSING TOC:
> pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro
> pg_restore: error: could not execute query: ERROR:  collation
> "pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist
> Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro;

This could be addressed by using is_encoding_supported_by_icu() in 
pg_dump to filter out collations with unsupported encodings.

However, the more I look at this whole problem, I'm wondering whether it 
wouldn't be preferable to avoid this whole mess by just not creating any 
collations in initdb.  What do you think?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Preserve versions of initdb-created collations in pg_upgrade

From

Thomas Munro

Date:

21 December 2019, 08:01:58

On Sat, Dec 21, 2019 at 7:38 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-10-29 03:33, Thomas Munro wrote:
> > Seems to work as described with -E UTF-8, but it fails with clusters
> > using -E SQL_ASCII.  That causes the pg_upgrade check to fail on
> > machines where that is the default encoding chosen by initdb (where
> > unpatched master succeeds):
> >
> > pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu"
> > pg_restore: while PROCESSING TOC:
> > pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro
> > pg_restore: error: could not execute query: ERROR:  collation
> > "pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist
> > Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro;
>
> This could be addressed by using is_encoding_supported_by_icu() in
> pg_dump to filter out collations with unsupported encodings.
>
> However, the more I look at this whole problem, I'm wondering whether it
> wouldn't be preferable to avoid this whole mess by just not creating any
> collations in initdb.  What do you think?

I think this problem goes away if we commit the per-object collation
version patch set[1].  It drops the collversion column, and Julien's
recent versions handle pg_upgrade quite well, as long as a collation
by the same name exists in the target cluster.  In that universe, if
initdb didn't create them, we'd have to tell people to create all
necessary collations manually before doing a pg_upgrade into it, and
that doesn't seem great.  Admittedly there might be some weird cases
where a collation is somehow completely different but has the same
name.

[1]
https://www.postgresql.org/message-id/flat/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com

Re: Preserve versions of initdb-created collations in pg_upgrade

From

Peter Eisentraut

Date:

24 January 2020, 10:04:25

On 2019-12-21 09:01, Thomas Munro wrote:
> I think this problem goes away if we commit the per-object collation
> version patch set[1].  It drops the collversion column, and Julien's
> recent versions handle pg_upgrade quite well, as long as a collation
> by the same name exists in the target cluster.  In that universe, if
> initdb didn't create them, we'd have to tell people to create all
> necessary collations manually before doing a pg_upgrade into it, and
> that doesn't seem great.  Admittedly there might be some weird cases
> where a collation is somehow completely different but has the same
> name.

Setting this patch to Returned with Feedback.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services