RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 - Mailing list pgsql-bugs

From Haifang Wang (Centific Technologies Inc)
Subject RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Date
Msg-id PH8PR21MB3902991F916D365183F8AE61E59C2@PH8PR21MB3902.namprd21.prod.outlook.com
Whole thread Raw
In response to RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607  ("Haifang Wang (Centific Technologies Inc)" <v-haiwang@microsoft.com>)
Responses Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
List pgsql-bugs
Hi Thomas,

How is it going after all the analysis below? 

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Thursday, August 29, 2024 8:22 PM
To: Andrew Dunstan <andrew@dunslane.net>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>;
ShawnSteele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>;
pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Fri, Aug 30, 2024 at 10:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Aug 30, 2024 at 3:16 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> > On 2024-08-28 We 6:16 AM, Thomas Munro wrote:
> > > initdb: error: locale name "liberté" contains non-ASCII characters
> >
> > +1 for doing this.
>
> Thanks for looking!  OK, let's go with this approach.  That two-line 
> demo patch doesn't work properly for the system default, and of course 
> it should be in sync with the backend function, and needs comments to 
> explain this mess, so I'll work on a serious patch.

Done.  To develop this patch, I set up a cursed locale name on my FreeBSD box like so:

$ sudo cp -r /usr/share/locale/fr_FR.UTF-8 /usr/share/locale/fraternité.UTF-8

Then I could exercise the setlocale("") path, approximating what I expect a Turkish Windows system to do without
explicit--locale:
 

$ LC_TIME="fraternité.UTF-8" initdb -D pdata ...
initdb: error: locale name "fraternité.UTF-8" contains non-ASCII characters

The explicit path is easier to test, like before:

$ initdb -D pgdata --locale "égalité.UTF-9"
...
initdb: error: locale name "égalité.UTF-9" contains non-ASCII characters

That's what the EDB installer (if not changed to provide ASCII-clean names somehow or other) will start failing with,
ifit tries to feed umlauts to initdb.
 

Working through that, I understood another problem: even if you provide an acceptable string like --locale "tr-TR", we
stillneed to be able to save-and-restore the value that initdb inherited from the environment.  Using my simulation,
whatI mean is if you do:
 

$ export LANG="liberté.UTF-8"

... then ...

$ initdb -D pgdata --locale "fr_FR.UTF-8"

... it still needs to be able to do a safe save-and-restore back to the environment's value without crashing, even
thoughit's not really using that value.  I wondered if initdb really needs a locale other than "C" for itself, but I'm
tryingto minimise the changes here so I
 
don't break something else...   That seems to mean that we have to use
_wsetlocale() for that particular save/restore.  This is essentially what Haifang's colleagues were suggesting
upthread. Done.
 

Once a server is up and running, we shouldn't be exposed to the server environment (right?), but there are several more
placesthat accept user-supplied strings, and only CREATE COLLATION fails to run
 
check_locale():

postgres=# create collation foo (locale = 'fraternité.UTF-8'); CREATE COLLATION

The other places I am aware of do, so putting the equivalent code into
check_locale() does the right thing:

postgres=# create database db locale = 'fraternité.UTF-8';
WARNING:  locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR:  invalid LC_COLLATE locale name: "fraternité.UTF-8"
HINT:  If the locale name is specific to ICU, use ICU_LOCALE.

postgres=# set lc_time = 'fraternité.UTF-8';
WARNING:  locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR:  invalid value for parameter "lc_time": "fraternité.UTF-8"

postgres=# set lc_messages = 'fraternité.UTF-8';
WARNING:  locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR:  invalid value for parameter "lc_messages": "fraternité.UTF-8"

postgres=# set lc_monetary = 'fraternité.UTF-8';
WARNING:  locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR:  invalid value for parameter "lc_monetary": "fraternité.UTF-8"

Those last SET lc_* changes aren't strictly necessary, because we never restore to those, we only switch to them and
thenrestore to the database default, but since they share code with CREATE DATABASE the new check kicks in anyway.
 

I tried adding check_locale() checks into CREATE COLLATION, but that made some tests fail (huh, I didn't know that the
setof strings acceptable to _newlocale() and setlocale() are different -- the former seems to accept "POSIX" on that
OS,which is exercised by some tests, or maybe it's some path of ours, I didn't check).  But, on reflection, I don't
believethat COLLATION objects actually have this problem anyway, because we never call setlocale() to them, we just use
themodern thread-safe POSIX-2008-oid APIs.  So I took that back out to minimise the patch.
 

Please see attached, including a restatement of the problem and solution in the commit message.

I suppose we could add some tests for this, but I was a bit unsure about the rules for putting non-ASCII into perl
scriptsand command lines, and would need to research that...
 

(Note: a lot of this old stuff is getting ripped out in master by the ongoing multi-threading effort...)

pgsql-bugs by date:

Previous
From: Thomas Munro
Date:
Subject: Re: BUG #18599: server closed the connection unexpectedly
Next
From: Thomas Munro
Date:
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607