Re: Collation version tracking for macOS - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Collation version tracking for macOS |
Date | |
Msg-id | CA+hUKGKq=iLH3bY+nK7v8b2zBCuKOk-fe0cP0it2RxNaWFVxYA@mail.gmail.com |
In response to | Re: Collation version tracking for macOS (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Collation version tracking for macOS
Re: Collation version tracking for macOS Re: Collation version tracking for macOS |
List | pgsql-hackers |
On Sat, Oct 22, 2022 at 10:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> ... But it
> doesn't provide a way for me to create a new database that uses 63 on
> purpose when I know what I'm doing. There are various reasons I might
> want to do that.

Thinking some more about this, I guess that could be addressed by having an explicit way to request either the library version or collversion-style version when creating a database or collation, but not actually storing it in daticulocale/colliculocale. That could be done either as part of the string that is trimmed off before storing it (so it's only used briefly during creation to find a non-default library)... Perhaps that'd look like initdb --icu-locale "67:en" (ICU library version) or "154.14:en" (individual collation version) or some new syntax in a few places. Thereafter, it would always be looked up by searching for the right library by [dat]collversion as Peter E suggested.

Let me try harder to vocalise some more thoughts that have stopped me from trying to code the search-by-collversion design so far:

Suppose your pgdata encounters a PostgreSQL linked against a later ICU library, most likely after an OS upgrade or migration, a pg_upgrade, or via streaming replication. You might get a new error "can't find ICU collation 'en' with version '153.14'; HINT: install missing ICU library version", and somehow you'll have to work out which one might contain 'en' v153.14 and install it with apt-get etc. Then it'll magically work: your postgres linked against (say) 71 will happily work with the dlopen'd 67. This is enough if you want to stay on 67 until the heat death of the universe. So far so good.

Problem 1: Suppose you're ready to start using (say) v72.
I guess you'd use the REFRESH command, which would open the main linked ICU's collversion and stamp that into the catalogue, at which point new sessions would start using that, and then you'd have to rebuild all your indexes (with no help from PG to tell you how to find everything that needs to be rebuilt, as belaboured in previous reverted work). Aside from the possibility of getting the rebuilding job wrong (as belaboured elsewhere), it's not great, because there is still a transitional period where you can be using the wrong version for your data. So this requires some careful planning and understanding from the administrator.

I admit that the upgrade story is a tiny bit better than the v5 DB2-style patch, which starts using the new version immediately if you didn't use a prefix (and logs the usual warnings about collversion mismatch) instead of waiting for you to run REFRESH. But both of them have a phase where they might use the wrong library to access an index. That's dissatisfying, and leads me to prefer the simple DB2-style solution that at least admits up front that it's not very clever.

The DB2-style patch could be improved a bit here with the addition of one more GUC: default_icu_library, so the administrator, rather than the packager, remains in control of which version we use for non-prefixed iculocale values (likely to be what almost everyone is interested in), defaulting to what the packager linked against. I've added that to the patch for illustration (though obviously the error messages produced by collversion mismatch could use some adjustment, i.e. to clarify that the warning might be cleared by installing and selecting a different library version).

Problem 2: If ICU 67 ever decides to report a different version for a given collation (would it ever do that? I don't expect so, but ...), we'd be unable to open the collation with the search-by-collversion design, and potentially the database. What is a user supposed to do then?
Presumably our error/hint for that would be "please insert the correct ICU library into drive A", but now there is no correct library; if you can even diagnose what's happened, I guess you might downgrade the ICU library using package tools or whatever if possible, but otherwise you'd be stuck, if you just can't get the right library.

Is this a problem? Would you want to be able to say "I don't care, computer, please just press on"? So I think we need a way to turn off the search-by-collversion thing. How should it look?

I'd love to hear others' thoughts on how we can turn this into a workable solution. Hopefully while staying simple...
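
To make the prefix idea from earlier in this message a bit more concrete, here is a rough sketch of how a string such as "67:en" or "154.14:en" might be split so that only the locale part is stored. This is not taken from the attached patch. The type and function names (IcuLocaleSpec, parse_icu_locale_spec) are hypothetical, the colon syntax is only the one floated above, and telling a library version from a collation version by the presence of a dot is purely an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: what kind of prefix, if any, preceded the locale name. */
typedef enum
{
    ICU_PREFIX_NONE,            /* "en": no prefix */
    ICU_PREFIX_LIBRARY,         /* "67:en": ICU library major version */
    ICU_PREFIX_COLLVERSION      /* "154.14:en": individual collation version */
} IcuPrefixKind;

typedef struct
{
    IcuPrefixKind kind;
    char        version[32];    /* prefix text, empty when kind is NONE */
    const char *locale;         /* points into the input, after the prefix */
} IcuLocaleSpec;

/*
 * Split "<version>:<locale>" into its parts.  Returns 0 on success, or -1
 * if the prefix is empty, too long, or contains anything other than digits
 * and dots.  Assumption: a dot in the prefix means "collation version",
 * no dot means "library version".
 */
int
parse_icu_locale_spec(const char *input, IcuLocaleSpec *spec)
{
    const char *colon;
    size_t      len;
    size_t      i;
    int         saw_dot = 0;

    assert(input != NULL && spec != NULL);

    /* Default result: no prefix, the whole input is the locale. */
    spec->kind = ICU_PREFIX_NONE;
    spec->version[0] = '\0';
    spec->locale = input;

    colon = strchr(input, ':');
    if (colon == NULL)
        return 0;               /* plain locale, nothing to trim */

    len = (size_t) (colon - input);
    if (len == 0 || len >= sizeof(spec->version))
        return -1;

    for (i = 0; i < len; i++)
    {
        if (input[i] == '.')
            saw_dot = 1;
        else if (input[i] < '0' || input[i] > '9')
            return -1;
    }

    memcpy(spec->version, input, len);
    spec->version[len] = '\0';
    spec->kind = saw_dot ? ICU_PREFIX_COLLVERSION : ICU_PREFIX_LIBRARY;
    spec->locale = colon + 1;   /* only this part would be stored */
    return 0;
}
```

With this, "67:en" would yield a library-version prefix of "67" and the locale "en", while a bare "en" would come back unchanged with no prefix. Whatever consumes the result would still have to find and dlopen a matching library, which is where the two problems above come in.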
Attachment