Re: Careful PL/Perl Release Not Required - Mailing list pgsql-hackers
From | David E. Wheeler |
---|---|
Subject | Re: Careful PL/Perl Release Not Required |
Date | |
Msg-id | D08A9F21-8162-4891-975D-C8F51737181A@kineticode.com Whole thread Raw |
In response to | Re: Careful PL/Perl Release Not Required (Alex Hunsaker <badalex@gmail.com>) |
Responses |
Re: Careful PL/Perl Release Not Required
|
List | pgsql-hackers |
On Feb 10, 2011, at 5:28 PM, Alex Hunsaker wrote: > Hrm? For UTF-8 databases, in practice, nothing should have changed-- > we already passed strings in as utf8. What I fixed was some corner > cases where some strings did not always have character semantics. See > The "Unicode Bug" and "Forcing Unicode in Perl" in perldoc perlunicode > for the problem and more or less how I fixed it. Uh… try=# create function is_utf8(text) returns boolean language plperl AS 'utf8::is_utf8(shift)'; CREATE FUNCTION try=# select is_utf8('whatever');is_utf8 ─────────t (1 row) try=# select is_utf8(U&'\0441\043B\043E\043D');is_utf8 ─────────t (1 row) Damn, I guess you're right. How did I miss that? > The other thing that changed is non UTF-8 databases now also get > character semantics. That is we convert from the database encoding > into utf8 and visa versa on output. That probably should be noted > somewhere... Oh. I see. And Oleg's database wasn't utf-8 then, I guess. I'll have to re-read the JSON docs, I guess. Erm…feh. Okay. Ihave to pass the false value to utf8() *now*. Okay, at least that's more consistent. > If you do have to change your semantics/functions, could you post an > example? I'd like to make sure its because you were hitting one of > those nasty corner cases and not something new is broken. I think that people who have non-utf-8 databases might be surprised. >> This probably won't be that common, but Oleg, for example, will need to convert his fixed function from: >> ... > > Well assuming he fixed his bug by encoding uri_unescape's output he > should not have to do anything. IIRC the problem was basically double > encoded utf8, not a postgres bug. No, the problem was that the string was passed to his pl/perl function encoded in utf-8. He added a line to decode it toPerl's internal form. Once he goes to 9.1, unless the database is SQL_ASCII, he can dump the decode() line. I think. > [ he had %3A%4A or something, uri_decode() decodes that to _two_ > characters because _it_ knows nothing about utf8. so you would need to > call utf8::decode() on the result to turn those two bytes into a > character ] No, he had to add the decode line, IIRC: CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$use strict;use URI::Escape;utf8::decode($_[0]);returnuri_unescape($_[0]); $$ LANGUAGE plperlu; Because uri_unescape() needs its argument to be decoded to Perl's internal form. On 9.1, it will be, so he won't need tocall utf8::decode(). That is, in a latin-1 database: latin=# create or replace function is_utf8(text) returns boolean language plperl AS 'utf8::is_utf8(shift) ? 1 : 0'; CREATE FUNCTION Time: 1.934 ms latin=# select is_utf8('whatever'); is_utf8 ─────────f (1 row) That will change, if I understand correctly. >> So this needs to be highlighted in the release notes: If a PL/Perl function is currently relying on a parameter passedin bytes, it will >need to be modified to deal with utf8 strings, instead. > > FYI Andrew did add some docs. Yeah, I was thinking of the release notes. Those who have non-uft-8 databases might be surprised if their PL/Perl functionsexpect strings to be passed as bytes. > Thanks for keeping a sharp eye out. > > [ P.S. This stuff is confusing as hell, im just glad I got a sucker to > commit it *waves* at Andrew :-) ] Heh, well done. Frankly, though, this stuff isn't *that* hard. It's Perl's terminology that's really bad. Best, David
pgsql-hackers by date: