Thread: Preserve versions of initdb-created collations in pg_upgrade
As mentioned in [0], pg_upgrade currently does not preserve the version of collation objects created by initdb. Here is an attempt to fix that. The way I deal with this here is by having the binary-upgrade mode in pg_dump delete all the collations created by initdb and then dump out CREATE COLLATION commands with version information normally. I had originally imagined doing some kind of ALTER COLLATION (or perhaps a direct UPDATE pg_collation) to update the version information, but that doesn't really work because we don't know whether the collation object with a given name in the new cluster is the same as the one in the old cluster. So it seems more robust to just delete all existing collations and create them from scratch. Thoughts? [0]: https://www.postgresql.org/message-id/CA+hUKGKDe98DFWKJoS7e4Z+Oamzc-1sZfpL3V3PPgi1uNvQ1tw@mail.gmail.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Oct 29, 2019 at 1:52 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > As mentioned in [0], pg_upgrade currently does not preserve the version > of collation objects created by initdb. Here is an attempt to fix that. > > The way I deal with this here is by having the binary-upgrade mode in > pg_dump delete all the collations created by initdb and then dump out > CREATE COLLATION commands with version information normally. This seems to be basically OK. It does mean that the target database has collation OIDs >= FirstNormalObjectId. That is, they don't look like initdb-created objects, which is OK because they aren't, I'm just highlighting this to see if anyone else sees a problem with it. Suppose you pg_upgrade again: now you'll dump these collations just as you did the first time around, because they look exactly like user-defined collations. It also means that if you pg_upgrade to a target cluster created by a build without ICU we'll try to create ICU collations and that'll fail ("ICU is not supported in this build"), whereas before if had ICU collations and didn't ever make use of them, you'd be able to do such an upgrade; again this doesn't seem like a major problem, it's just an observation about an edge case. One more thing to note is if you upgrade from 12 to 13 on a glibc system, I think we'll automatically pick up the *current* version when creating the collations in the target DB, which seems to be OK but it is a choice to default to assuming that the database's indexes are not corrupted. Another observation is that you finish up with different OIDs in each database you upgrade, which again doesn't seem like a problem in itself. It is slightly odd that template1 finishes up with the old initdb's template1 collatoins, rather than the new initdb's opinion of the current set of collations, but I am not sure if it's a problem. I think it has to be like that, because you might have created other stuff that depends on those collations in your source template1 database, and so you have to preserve the versions. > I had originally imagined doing some kind of ALTER COLLATION (or perhaps > a direct UPDATE pg_collation) to update the version information, but > that doesn't really work because we don't know whether the collation > object with a given name in the new cluster is the same as the one in > the old cluster. So it seems more robust to just delete all existing > collations and create them from scratch. > > Thoughts? Seems to work as described with -E UTF-8, but it fails with clusters using -E SQL_ASCII. That causes the pg_upgrade check to fail on machines where that is the default encoding chosen by initdb (where unpatched master succeeds): pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu" pg_restore: while PROCESSING TOC: pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro pg_restore: error: could not execute query: ERROR: collation "pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro;
On 2019-10-29 03:33, Thomas Munro wrote: > Seems to work as described with -E UTF-8, but it fails with clusters > using -E SQL_ASCII. That causes the pg_upgrade check to fail on > machines where that is the default encoding chosen by initdb (where > unpatched master succeeds): > > pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu" > pg_restore: while PROCESSING TOC: > pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro > pg_restore: error: could not execute query: ERROR: collation > "pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist > Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro; This could be addressed by using is_encoding_supported_by_icu() in pg_dump to filter out collations with unsupported encodings. However, the more I look at this whole problem, I'm wondering whether it wouldn't be preferable to avoid this whole mess by just not creating any collations in initdb. What do you think? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Dec 21, 2019 at 7:38 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2019-10-29 03:33, Thomas Munro wrote: > > Seems to work as described with -E UTF-8, but it fails with clusters > > using -E SQL_ASCII. That causes the pg_upgrade check to fail on > > machines where that is the default encoding chosen by initdb (where > > unpatched master succeeds): > > > > pg_restore: creating COLLATION "pg_catalog.af-NA-x-icu" > > pg_restore: while PROCESSING TOC: > > pg_restore: from TOC entry 1700; 3456 12683 COLLATION af-NA-x-icu tmunro > > pg_restore: error: could not execute query: ERROR: collation > > "pg_catalog.af-NA-x-icu" for encoding "SQL_ASCII" does not exist > > Command was: ALTER COLLATION pg_catalog."af-NA-x-icu" OWNER TO tmunro; > > This could be addressed by using is_encoding_supported_by_icu() in > pg_dump to filter out collations with unsupported encodings. > > However, the more I look at this whole problem, I'm wondering whether it > wouldn't be preferable to avoid this whole mess by just not creating any > collations in initdb. What do you think? I think this problem goes away if we commit the per-object collation version patch set[1]. It drops the collversion column, and Julien's recent versions handle pg_upgrade quite well, as long as a collation by the same name exists in the target cluster. In that universe, if initdb didn't create them, we'd have to tell people to create all necessary collations manually before doing a pg_upgrade into it, and that doesn't seem great. Admittedly there might be some weird cases where a collation is somehow completely different but has the same name. [1] https://www.postgresql.org/message-id/flat/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
On 2019-12-21 09:01, Thomas Munro wrote: > I think this problem goes away if we commit the per-object collation > version patch set[1]. It drops the collversion column, and Julien's > recent versions handle pg_upgrade quite well, as long as a collation > by the same name exists in the target cluster. In that universe, if > initdb didn't create them, we'd have to tell people to create all > necessary collations manually before doing a pg_upgrade into it, and > that doesn't seem great. Admittedly there might be some weird cases > where a collation is somehow completely different but has the same > name. Setting this patch to Returned with Feedback. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services