Re: Careful PL/Perl Release Not Required - Mailing list pgsql-hackers
| From | David E. Wheeler |
|---|---|
| Subject | Re: Careful PL/Perl Release Not Required |
| Date | |
| Msg-id | D08A9F21-8162-4891-975D-C8F51737181A@kineticode.com |
| In response to | Re: Careful PL/Perl Release Not Required (Alex Hunsaker <badalex@gmail.com>) |
| Responses | Re: Careful PL/Perl Release Not Required |
| List | pgsql-hackers |
On Feb 10, 2011, at 5:28 PM, Alex Hunsaker wrote:
> Hrm? For UTF-8 databases, in practice, nothing should have changed--
> we already passed strings in as utf8. What I fixed was some corner
> cases where some strings did not always have character semantics. See
> The "Unicode Bug" and "Forcing Unicode in Perl" in perldoc perlunicode
> for the problem and more or less how I fixed it.
Uh…
try=# create function is_utf8(text) returns boolean language plperl AS 'utf8::is_utf8(shift)';
CREATE FUNCTION
try=# select is_utf8('whatever');
 is_utf8
─────────
 t
(1 row)
try=# select is_utf8(U&'\0441\043B\043E\043D');
 is_utf8
─────────
 t
(1 row)
Damn, I guess you're right. How did I miss that?
> The other thing that changed is non UTF-8 databases now also get
> character semantics. That is we convert from the database encoding
> into utf8 and vice versa on output. That probably should be noted
> somewhere...
Oh. I see. And Oleg's database wasn't utf-8 then, I guess. I'll have to re-read the JSON docs, I guess. Erm…feh. Okay.
I have to pass the false value to utf8() *now*. Okay, at least that's more consistent.
> If you do have to change your semantics/functions, could you post an
> example? I'd like to make sure it's because you were hitting one of
> those nasty corner cases and not something new is broken.
I think that people who have non-utf-8 databases might be surprised.
>> This probably won't be that common, but Oleg, for example, will need to convert his fixed function from:
>> ...
>
> Well assuming he fixed his bug by encoding uri_unescape's output he
> should not have to do anything. IIRC the problem was basically double
> encoded utf8, not a postgres bug.
No, the problem was that the string was passed to his pl/perl function encoded in utf-8. He added a line to decode it
to Perl's internal form. Once he goes to 9.1, unless the database is SQL_ASCII, he can dump the decode() line. I think.
> [ he had %3A%4A or something, uri_decode() decodes that to _two_
> characters because _it_ knows nothing about utf8. so you would need to
> call utf8::decode() on the result to turn those two bytes into a
> character ]
No, he had to add the decode line, IIRC:
CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$
    use strict;
    use URI::Escape;
    utf8::decode($_[0]);
    return uri_unescape($_[0]);
$$ LANGUAGE plperlu;
Because uri_unescape() needs its argument to be decoded to Perl's internal form. On 9.1, it will be, so he won't need
to call utf8::decode(). That is, in a latin-1 database:
latin=# create or replace function is_utf8(text) returns boolean language plperl AS 'utf8::is_utf8(shift) ? 1 : 0';
CREATE FUNCTION
Time: 1.934 ms
latin=# select is_utf8('whatever');
 is_utf8
─────────
 f
(1 row)
That will change, if I understand correctly.
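For anyone who wants to see what that flag is reporting without a database handy, here is a minimal sketch in plain Perl (not PL/Perl). The byte string is just the UTF-8 encoding of the same four Cyrillic characters used in the query above, and describe() is a made-up helper for illustration, so treat this as a demonstration of utf8::decode() and utf8::is_utf8() rather than of what PL/Perl does internally:

```perl
use strict;
use warnings;

# The UTF-8 encoding of U+0441 U+043B U+043E U+043D, written as raw bytes.
my $bytes = "\xD1\x81\xD0\xBB\xD0\xBE\xD0\xBD";

# Copy, then decode the copy in place into Perl's internal character form.
my $chars = $bytes;
utf8::decode($chars) or die "input was not valid UTF-8";

# Hypothetical helper: returns [length, flag] for side-by-side comparison.
sub describe {
    my ($s) = @_;
    return [ length($s), utf8::is_utf8($s) ? 1 : 0 ];
}

printf "bytes: length=%d is_utf8=%d\n", @{ describe($bytes) };
printf "chars: length=%d is_utf8=%d\n", @{ describe($chars) };
```

The undecoded string counts eight bytes with the flag off; the decoded one counts four characters with the flag on.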
>> So this needs to be highlighted in the release notes: If a PL/Perl function is currently relying on a parameter
>> passed in bytes, it will need to be modified to deal with utf8 strings, instead.
>
> FYI Andrew did add some docs.
Yeah, I was thinking of the release notes. Those who have non-utf-8 databases might be surprised if their PL/Perl
functions expect strings to be passed as bytes.
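If a function really does need bytes, one way to cope might be to encode the argument explicitly on the way in. This is only a sketch under my own assumptions, not something anyone has proposed in this thread: as_bytes() is a hypothetical helper, the sample string is made up, and it uses the core Encode module in plain Perl rather than inside PL/Perl:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a character string back into bytes
# in whatever encoding the old code expected.
sub as_bytes {
    my ($chars, $encoding) = @_;
    return encode($encoding, $chars);
}

my $chars  = "caf\x{e9}";                      # four characters, last is U+00E9
my $latin1 = as_bytes($chars, 'ISO-8859-1');   # one byte per character here
my $utf8   = as_bytes($chars, 'UTF-8');        # U+00E9 takes two bytes

printf "latin-1 bytes: %d, utf-8 bytes: %d\n", length($latin1), length($utf8);
```

The point is only that the byte count depends on the encoding you ask for, so a function that used to get latin-1 bytes would have to name that encoding itself.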
> Thanks for keeping a sharp eye out.
>
> [ P.S. This stuff is confusing as hell, im just glad I got a sucker to
> commit it *waves* at Andrew :-) ]
Heh, well done. Frankly, though, this stuff isn't *that* hard. It's Perl's terminology that's really bad.
Best,
David