Re: Careful PL/Perl Release Not Required - Mailing list pgsql-hackers
From | Alex Hunsaker |
---|---|
Subject | Re: Careful PL/Perl Release Not Required |
Msg-id | AANLkTimSZFNcVKK0gthChUZ-WcB2s+mPjHzc=U4Vn8Li@mail.gmail.com |
In response to | Careful PL/Perl Release Not Required ("David E. Wheeler" <david@kineticode.com>) |
Responses | Re: Careful PL/Perl Release Not Required |
List | pgsql-hackers |
On Thu, Feb 10, 2011 at 16:28, David E. Wheeler <david@kineticode.com> wrote:
> Hackers,
>
> With regard to this (very welcome) commit:
>
>> commit 50d89d422f9c68a52a6964e5468e8eb4f90b1d95
>> Author: Andrew Dunstan <andrew@dunslane.net>
>> Date:   Sun Feb 6 17:29:26 2011 -0500
>>
>>     Force strings passed to and from plperl to be in UTF8 encoding.
>>
>>     String are converted to UTF8 on the way into perl and to the
>>     database encoding on the way back. This avoids a number of
>>     observed anomalies, and ensures Perl a consistent view of the
>>     world.
>>
>>     Some minor code cleanups are also accomplished.
>>
>>     Alex Hunsaker, reviewed by Andy Colson.
>
> I just want to emphasize that this needs to be highlighted as a compatibility change in the release notes. As an example, I currently have this code in PGXN to process a TEXT param to a function:
>
>     my $dist_meta = JSON::XS->new->utf8->decode(shift);
>
> After I upgrade to 9.0, I will have to change that to:
>
>     my $dist_meta = JSON::XS->new->utf8(0)->decode(shift);

Hrm? For UTF-8 databases, in practice, nothing should have changed: we already passed strings in as UTF-8. What I fixed were some corner cases where some strings did not always have character semantics. See "The 'Unicode Bug'" and "Forcing Unicode in Perl" in perldoc perlunicode for the problem and, more or less, how I fixed it.

The other thing that changed is that non-UTF-8 databases now also get character semantics. That is, we convert from the database encoding into UTF-8 on input, and vice versa on output. That probably should be noted somewhere...

If you do have to change your semantics/functions, could you post an example? I'd like to make sure it's because you were hitting one of those nasty corner cases and not because something new is broken.

> This probably won't be that common, but Oleg, for example, will need to convert his fixed function from:
> ...

Well, assuming he fixed his bug by encoding uri_unescape's output, he should not have to do anything.
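A minimal core-Perl sketch of the "Unicode Bug" referred to above, assuming the script runs without `use feature 'unicode_strings'` (and without `use v5.12` or later, which enables it). Two strings hold the same characters and compare equal, yet `uc()` treats them differently depending only on how each is stored internally. Handing every string to the function with character semantics, as the commit describes, is meant to avoid this kind of anomaly.

```perl
use strict;
use warnings;

# Two strings holding the same four characters: "caf" followed by U+00E9.
my $native   = "caf\xe9";   # stored in Perl's native 8-bit form
my $upgraded = "caf\xe9";
utf8::upgrade($upgraded);   # same characters, stored internally as UTF-8

# eq compares characters, not storage, so these are equal.
print $native eq $upgraded ? "eq: same\n" : "eq: different\n";

# Without the unicode_strings feature, uc() uses byte (ASCII-only) rules on
# the native string but Unicode rules on the upgraded one, so the results differ.
printf "native:   %vX\n", uc $native;     # 43.41.46.E9 (e-acute left as is)
printf "upgraded: %vX\n", uc $upgraded;   # 43.41.46.C9 (E-acute)
```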
IIRC Oleg's problem was basically double-encoded UTF-8, not a Postgres bug. [He had %3A%4A or something; uri_unescape() decodes that to _two_ characters because _it_ knows nothing about UTF-8, so you would need to call utf8::decode() on the result to turn those two bytes into a character.]

> So this needs to be highlighted in the release notes: If a PL/Perl function is currently relying on a parameter passed in bytes, it will need to be modified to deal with utf8 strings, instead.

FYI, Andrew did add some docs.

Thanks for keeping a sharp eye out.

[P.S. This stuff is confusing as hell; I'm just glad I got a sucker to commit it *waves* at Andrew :-)]
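A minimal sketch of the double-encoding problem, using only core Perl. `percent_decode()` is a hypothetical helper assumed to behave like URI::Escape's `uri_unescape()` (each `%XX` becomes `chr(hex XX)`), and the input `caf%C3%A9` is an illustrative value, not Oleg's actual data. The percent-decoded result holds two separate characters for the UTF-8 bytes of "é" until `utf8::decode()` turns them into one.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Escape::uri_unescape(): each %XX escape
# becomes the single character chr(0xXX), with no knowledge of UTF-8.
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $s;
}

my $raw = percent_decode('caf%C3%A9');   # "e-acute" percent-encoded as UTF-8
print length($raw), "\n";                # 5: "c", "a", "f", "\xC3", "\xA9"

# Those two characters are really the UTF-8 bytes of one character;
# utf8::decode() converts them in place and returns false on invalid UTF-8.
my $str = $raw;
utf8::decode($str) or die "not valid UTF-8";
print length($str), "\n";                # 4: "c", "a", "f", "\x{e9}"
```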