
Re: Careful PL/Perl Release Not Required - Mailing list pgsql-hackers

From	Alex Hunsaker
Subject	Re: Careful PL/Perl Release Not Required
Date	February 10, 2011 21:28:54
Msg-id	AANLkTimSZFNcVKK0gthChUZ-WcB2s+mPjHzc=U4Vn8Li@mail.gmail.com
In response to	Careful PL/Perl Release Not Required ("David E. Wheeler" <david@kineticode.com>)
Responses	Re: Careful PL/Perl Release Not Required
List	pgsql-hackers


On Thu, Feb 10, 2011 at 16:28, David E. Wheeler <david@kineticode.com> wrote:
> Hackers,
>
> With regard to this (very welcome) commit:
>
>> commit 50d89d422f9c68a52a6964e5468e8eb4f90b1d95
>> Author: Andrew Dunstan <andrew@dunslane.net>
>> Date:   Sun Feb 6 17:29:26 2011 -0500
>>
>>     Force strings passed to and from plperl to be in UTF8 encoding.
>>
>>     Strings are converted to UTF8 on the way into perl and to the
>>     database encoding on the way back. This avoids a number of
>>     observed anomalies, and ensures Perl a consistent view of the
>>     world.
>>
>>     Some minor code cleanups are also accomplished.
>>
>>     Alex Hunsaker, reviewed by Andy Colson.
>
> I just want to emphasize that this needs to be highlighted as a compatibility change in the release notes. As an
> example, I currently have this code in PGXN to process a TEXT param to a function:
>
>    my $dist_meta = JSON::XS->new->utf8->decode(shift);
>
> After I upgrade to 9.1, I will have to change that to:
>
>    my $dist_meta = JSON::XS->new->utf8(0)->decode(shift);
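
For reference, the two calls above differ only in what decode() expects as input: with utf8 enabled it wants UTF-8 octets, with utf8(0) it wants a Perl character string. The following is a minimal sketch of that difference, not the PGXN code. It assumes the core JSON::PP module as a stand-in for JSON::XS (it offers the same utf8 and decode methods), and the sample document and variable names are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;   # assumption: core stand-in for JSON::XS, same utf8/decode methods

# A JSON document as a Perl character string: "\x{e9}" is one character (e-acute).
my $chars = qq({"name":"caf\x{e9}"});

# The same document as UTF-8 octets: e-acute becomes the two bytes 0xC3 0xA9.
my $bytes = $chars;
utf8::encode($bytes);

# utf8 enabled: decode() expects UTF-8 octets.
my $from_bytes = JSON::PP->new->utf8->decode($bytes);

# utf8 disabled: decode() expects a character string.
my $from_chars = JSON::PP->new->utf8(0)->decode($chars);

# Mismatch: octets handed to a decoder that expects characters. Each byte of
# the multi-byte sequence is kept as its own character.
my $mismatch = JSON::PP->new->utf8(0)->decode($bytes);

printf "from bytes: %d chars, from chars: %d chars, mismatch: %d chars\n",
    length($from_bytes->{name}), length($from_chars->{name}),
    length($mismatch->{name});
```

Which of the two forms a given PL/Perl function needs depends on whether its argument arrives as octets or as characters, which is the question the rest of this thread is about.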

Hrm? For UTF-8 databases, in practice, nothing should have changed--
we already passed strings in as utf8. What I fixed was some corner
cases where some strings did not always have character semantics. See
'The "Unicode Bug"' and "Forcing Unicode in Perl" in perldoc perlunicode
for the problem and more or less how I fixed it.
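
To make the corner case concrete, here is a minimal sketch of the "Unicode Bug" as perlunicode describes it, using only core Perl. A string whose characters all fit in one byte may be stored without Perl's internal UTF8 flag, and some operations then apply byte semantics to it. utf8::upgrade() is the approach perlunicode describes for forcing character semantics; whether the patch does exactly this is not stated here, and the variable names are illustrative.

```perl
use strict;
use warnings;

# U+00E9 (e-acute), written as a single byte. Perl stores this string
# without the internal UTF8 flag.
my $s = "\xE9";

# Without the flag (and without "use feature 'unicode_strings'"), \w uses
# byte semantics here and usually fails to match the accented letter.
my $matches_before = ($s =~ /\w/) ? 1 : 0;

# utf8::upgrade() changes only the internal representation, not the text.
my $copy = $s;
utf8::upgrade($copy);

# With the flag set, \w uses Unicode rules and matches.
my $matches_after = ($copy =~ /\w/) ? 1 : 0;

# The two strings still compare equal: same characters, different storage.
my $same_text = ($s eq $copy) ? 1 : 0;

print "before=$matches_before after=$matches_after same=$same_text\n";
```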

The other thing that changed is that non-UTF-8 databases now also get
character semantics. That is, we convert from the database encoding
into utf8 on input and vice versa on output. That probably should be
noted somewhere...

If you do have to change your semantics/functions, could you post an
example? I'd like to make sure it's because you were hitting one of
those nasty corner cases and not because something new is broken.

> This probably won't be that common, but Oleg, for example, will need to convert his fixed function from:
> ...

Well, assuming he fixed his bug by decoding uri_unescape's output, he
should not have to do anything.  IIRC the problem was basically
double-encoded utf8, not a postgres bug.

[ he had %3A%4A or something; uri_unescape() decodes that to _two_
characters because _it_ knows nothing about utf8. So you would need to
call utf8::decode() on the result to turn those two bytes into a
character ]
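
A minimal sketch of that sequence, using only core Perl. Two things here are assumptions: percent_decode() is a hypothetical stand-in for URI::Escape's uri_unescape(), so the example runs without the URI distribution, and the input %C3%A9 (the UTF-8 encoding of e-acute) is chosen for illustration and is not the value from the original report.

```perl
use strict;
use warnings;

# Hypothetical stand-in for uri_unescape(): turns each %XX into the single
# byte it names and knows nothing about UTF-8.
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Assumed sample input: the two UTF-8 bytes of U+00E9, percent-encoded.
my $raw = percent_decode('%C3%A9');
my $raw_length = length($raw);      # two bytes, seen by Perl as two characters

# utf8::decode() works in place: it reads the bytes as UTF-8 and turns them
# into characters, returning true if the bytes were valid UTF-8.
my $text = $raw;
my $ok = utf8::decode($text);
my $text_length = length($text);    # one character

printf "raw=%d decoded=%d codepoint=U+%04X\n",
    $raw_length, $text_length, ord($text);
```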

> So this needs to be highlighted in the release notes: If a PL/Perl function is currently relying on a parameter
> passed in bytes, it will need to be modified to deal with utf8 strings, instead.

FYI Andrew did add some docs.

Thanks for keeping a sharp eye out.

[ P.S. This stuff is confusing as hell, I'm just glad I got a sucker to
commit it *waves* at Andrew :-) ]
