Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 - Mailing list pgsql-bugs
From | Thomas Munro |
---|---|
Subject | Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 |
Msg-id | CA+hUKGLfrK33XpFXsRcc97a1Qa5Vz1YFEn4GC1vie7yse=ffPA@mail.gmail.com |
In response to | Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 (Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>) |
Responses | Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 |
List | pgsql-bugs |
On Thu, Sep 5, 2024 at 11:46 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
> On Thu, Sep 5, 2024 at 5:46 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Really what I'm looking for is (1) feedback on the approach, code and
>> comments, and thoughts about more complex scenarios I may have failed
>> to think about, including say, pg_dump, pg_upgrade etc operational
>> issues, which probably involves lots of previous experience with
>> PostgreSQL, (2) opinions on whether we should add a test for these
>> cases and how to put the UTF-8 into a script (I'm confused about the
>> encoding of command line arguments), and (3) a nod from the EDB people
>> involved in distributing this software on Windows.

If I don't hear any objections to this plan soon, I'm going to commit this and back-patch it into PostgreSQL 16 and PostgreSQL 17 after the code freeze for the upcoming PostgreSQL 17 release ends. So it'll probably be in 16.5 and 17.1.

> We can help with producing the builds with the patches provided. You had also
> mentioned about the changes required in the installer script, will it still be required?

If you don't change the installer script, then it will still fail if someone selects "Türkiye" in your GUI, but now it will fail with an ERROR rejecting non-ASCII characters, instead of crashing. So people in Türkiye, Côte d'Ivoire, Curaçao etc. will still have no way to initialise a cluster with your GUI in PostgreSQL 16.5 and 17.1 unless they follow the instructions on the web to create an ASCII alias such as "Turkey" (or one for whichever other non-ASCII name they want). Of course they could always run initdb.exe directly from the command line with a BCP 47 name. Maybe that's OK, but I think you should consider changing the installer.
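To make the failure mode above concrete: after the fix, a non-ASCII locale name gets an ERROR instead of a crash. Below is a minimal, hypothetical sketch of a pre-flight check an installer could do itself before launching initdb. The function names, the error text, and the legacy name "Turkish_Türkiye.1254" are illustrative assumptions, not PostgreSQL or installer code; only the initdb options `--locale`, `--encoding` and `-D` are real.

```python
from __future__ import annotations


def is_ascii_locale_name(name: str) -> bool:
    """Return True if the locale name contains only 7-bit ASCII characters."""
    return name.isascii()


def build_initdb_args(locale: str, data_dir: str) -> list[str]:
    """Build a hypothetical initdb command line, refusing non-ASCII locale
    names up front instead of waiting for initdb to report an ERROR."""
    if not is_ascii_locale_name(locale):
        raise ValueError(f"locale name {locale!r} contains non-ASCII characters")
    # Passing a list avoids shell quoting; the installer uses UTF8 as the encoding.
    return ["initdb.exe", f"--locale={locale}", "--encoding=UTF8", "-D", data_dir]


if __name__ == "__main__":
    print(build_initdb_args("tr-TR", r"C:\pgdata"))
    try:
        build_initdb_args("Turkish_Türkiye.1254", r"C:\pgdata")
    except ValueError as exc:
        print("rejected:", exc)
```

A BCP 47 tag such as "tr-TR" is pure ASCII even when the country's display name is not, which is why that route keeps working.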
A conservative way to do it would be to keep showing all the existing options (so that someone who is happy using the old-style names, when they don't contain non-ASCII characters, can keep doing so), but also add a second entry for each country that shows "Turkish, Türkiye (tr-TR)" and/or "Turkish, Türkiye (tr-TR.UTF-8)", or perhaps both, and pass just the part in parentheses to initdb, to give users all the options. Or perhaps you could have a "BCP 47 locales" checkbox that switches the list to show them. No one has reported any real-world experience choosing between the tr-TR and tr-TR.UTF-8 alternatives, so you might like to experiment with that. The second option makes Windows' system libraries use UTF-8 instead of the traditional encoding associated with the language. As far as I can tell, it doesn't make any difference at all to PostgreSQL yet, because your installer always uses --encoding="UTF8", and on Windows only, that makes PostgreSQL ignore the locale's encoding and do a whole lot of internal conversion to wchar_t, because PostgreSQL doesn't yet know that Windows 10+ can work with UTF-8 directly.

The reason I am interested in this .UTF-8-or-not question is that I'd like to consider *disallowing* non-matching encodings (see commitfest entry #3772, reviewers wanted!) and teaching PostgreSQL that Windows does in fact have UTF-8, so that we can delete a lot of slow special-case code, harmonise with Unix, and generally catch up with reality. If that eventually goes in, it would become illegal to use --locale="tr-TR" (no .UTF-8) with --encoding="UTF8". So I figure we might as well start encouraging the "xx-XX.UTF-8" names with --encoding="UTF8" now, if we can't find any downside, and stop creating new clusters the other way as soon as possible, so that users have a better time upgrading in the future.
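The installer idea above could be sketched roughly as follows. The catalogue, its legacy names, and the helper names are hypothetical illustrations, not the EDB installer's actual data or code. The sketch keeps each old-style name only when it is pure ASCII, adds "(tag)" and "(tag.UTF-8)" entries for every country, and extracts just the parenthesised part to hand to initdb.

```python
import re

# Hypothetical installer catalogue: (language, country, BCP 47 tag, legacy Windows name).
LOCALES = [
    ("English", "United States", "en-US", "English_United States.1252"),
    ("Turkish", "Türkiye", "tr-TR", "Turkish_Türkiye.1254"),
]

# Matches a trailing "(...)" group, e.g. the "(tr-TR.UTF-8)" in a menu label.
_TRAILING_PARENS = re.compile(r"\(([^()]*)\)\s*$")


def menu_entries(locales):
    """Old-style names where they are pure ASCII, plus BCP 47 entries for every country."""
    entries = []
    for language, country, tag, legacy in locales:
        if legacy.isascii():
            entries.append(legacy)  # shown and passed to initdb unchanged
        entries.append(f"{language}, {country} ({tag})")
        entries.append(f"{language}, {country} ({tag}.UTF-8)")
    return entries


def locale_for_initdb(entry):
    """Pass only the parenthesised part to initdb; legacy entries pass through as-is."""
    match = _TRAILING_PARENS.search(entry)
    return match.group(1) if match else entry


if __name__ == "__main__":
    for entry in menu_entries(LOCALES):
        print(f"{entry!r:45} -> --locale={locale_for_initdb(entry)}")
```

With this catalogue the Turkish legacy name is dropped (it contains "ü"), while both BCP 47 variants remain selectable. A "BCP 47 locales" checkbox would simply toggle which subset of these entries is displayed.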
For example, a pg_upgrade from a PostgreSQL 17 cluster initialised with --locale="tr-TR" --encoding="UTF8" to PostgreSQL 18 would probably require some extra step to rename "tr-TR" to "tr-TR.UTF-8" at some point (not sure exactly where), if PostgreSQL 18 starts rejecting the non-matching combination. I don't know where that'll go, though; it's not high-priority work, just incremental cleanup and modernisation that practically suggests itself whenever I look at rejigging the locale code for thread safety and read all those comments about wchar_t that are not true.
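The message leaves open where such a rename step would live, so the following is only a hypothetical sketch of the check involved. It assumes the proposed rule is "with a UTF8 database encoding, the locale name must carry a .UTF-8 (or .UTF8) suffix", and that stripping an existing legacy ".1254"-style code page before appending ".UTF-8" is acceptable. The function names are invented for illustration and are not pg_upgrade code.

```python
def is_utf8_encoding(encoding: str) -> bool:
    """Treat "UTF8", "UTF-8", "utf8" etc. as the same encoding name."""
    cleaned = "".join(ch for ch in encoding if ch.isalnum())
    return cleaned.upper() == "UTF8"


def has_utf8_codeset(locale: str) -> bool:
    """True if the locale name already ends in a .UTF-8 or .UTF8 suffix (any case)."""
    return locale.lower().endswith((".utf-8", ".utf8"))


def suggest_locale_rename(locale: str, encoding: str) -> str:
    """Hypothetical upgrade helper: return the locale name a cluster would need
    if non-matching locale/encoding combinations were rejected.
    Only the UTF8-encoding case is handled in this sketch."""
    if not is_utf8_encoding(encoding) or has_utf8_codeset(locale):
        return locale  # nothing to rename
    base = locale.split(".", 1)[0]  # drop a legacy code page suffix, if any
    return base + ".UTF-8"


if __name__ == "__main__":
    for loc, enc in [("tr-TR", "UTF8"), ("tr-TR.UTF-8", "UTF8"), ("tr-TR", "WIN1254")]:
        print(f"{loc} / {enc} -> {suggest_locale_rename(loc, enc)}")
```

Under this assumption, the "tr-TR" plus "UTF8" cluster from the example would map to "tr-TR.UTF-8", while clusters already created with the .UTF-8 name need no step at all. That is the argument for encouraging those names now.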