
Re: Careful PL/Perl Release Not Required - Mailing list pgsql-hackers

From	David E. Wheeler
Subject	Re: Careful PL/Perl Release Not Required
Date	February 11, 2011 00:53:44
Msg-id	D08A9F21-8162-4891-975D-C8F51737181A@kineticode.com
In response to	Re: Careful PL/Perl Release Not Required (Alex Hunsaker <badalex@gmail.com>)
Responses	Re: Careful PL/Perl Release Not Required
List	pgsql-hackers


On Feb 10, 2011, at 5:28 PM, Alex Hunsaker wrote:

> Hrm? For UTF-8 databases, in practice, nothing should have changed--
> we already passed strings in as utf8. What I fixed was some corner
> cases where some strings did not always have character semantics. See
> The "Unicode Bug" and "Forcing Unicode in Perl" in perldoc perlunicode
> for the problem and more or less how I fixed it.

Uh…

try=# create function is_utf8(text) returns boolean language plperl AS 'utf8::is_utf8(shift)';
CREATE FUNCTION

try=# select is_utf8('whatever');
 is_utf8
─────────
 t
(1 row)

try=# select is_utf8(U&'\0441\043B\043E\043D');
 is_utf8
─────────
 t
(1 row)

Damn, I guess you're right. How did I miss that?

> The other thing that changed is non UTF-8 databases now also get
> character semantics. That is we convert from the database encoding
> into utf8 and vice versa on output. That probably should be noted
> somewhere...

Oh. I see. And Oleg's database wasn't utf-8 then, I guess. I'll have to re-read the JSON docs, I guess. Erm…feh. Okay. I have to pass the false value to utf8() *now*. Okay, at least that's more consistent.

> If you do have to change your semantics/functions, could you post an
> example? I'd like to make sure it's because you were hitting one of
> those nasty corner cases and not something new is broken.

I think that people who have non-utf-8 databases might be surprised.

>> This probably won't be that common, but Oleg, for example, will need to convert his fixed function from:
>> ...
>
> Well assuming he fixed his bug by encoding uri_unescape's output he
> should not have to do anything.  IIRC the problem was basically double
> encoded utf8, not a postgres bug.

No, the problem was that the string was passed to his pl/perl function encoded in utf-8. He added a line to decode it to Perl's internal form. Once he goes to 9.1, unless the database is SQL_ASCII, he can dump the decode() line. I think.
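
A minimal sketch in plain Perl, outside the database, of what "encoded in utf-8" versus "Perl's internal form" means here. The variable names are made up for illustration, and the sample string is the same four characters used in the U&'\0441\043B\043E\043D' query above; this is not Oleg's code.

```perl
use strict;
use warnings;

# UTF-8 octets for U+0441 U+043B U+043E U+043D (illustrative input only).
my $bytes = "\xD1\x81\xD0\xBB\xD0\xBE\xD0\xBD";

# As a byte string, Perl sees eight one-byte "characters".
my $byte_len = length $bytes;

# utf8::decode() converts the octets, in place, to character semantics.
my $chars = $bytes;
utf8::decode($chars);
my $char_len = length $chars;

printf "bytes=%d chars=%d flagged=%s\n",
    $byte_len, $char_len, utf8::is_utf8($chars) ? 't' : 'f';
# prints: bytes=8 chars=4 flagged=t
```

A function written against the first form counts and matches octets; one written against the second counts and matches characters, which is the difference at issue for non-UTF-8 databases.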

> [ he had %3A%4A or something, uri_decode() decodes that to _two_
> characters because _it_ knows nothing about utf8. so you would need to
> call utf8::decode() on the result to turn those two bytes into a
> character ]

No, he had to add the decode line, IIRC:

CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$
    use strict;
    use URI::Escape;
    utf8::decode($_[0]);
    return uri_unescape($_[0]);
$$ LANGUAGE plperlu;

Because uri_unescape() needs its argument to be decoded to Perl's internal form. On 9.1, it will be, so he won't need to call utf8::decode(). That is, in a latin-1 database:

latin=# create or replace function is_utf8(text) returns boolean language plperl AS 'utf8::is_utf8(shift) ? 1 : 0';
CREATE FUNCTION
Time: 1.934 ms
latin=# select is_utf8('whatever');
 is_utf8
─────────
 f
(1 row)

That will change, if I understand correctly.

>> So this needs to be highlighted in the release notes: If a PL/Perl function is currently relying on a parameter
>> passed in bytes, it will need to be modified to deal with utf8 strings, instead.
>
> FYI Andrew did add some docs.

Yeah, I was thinking of the release notes. Those who have non-utf-8 databases might be surprised if their PL/Perl functions expect strings to be passed as bytes.

> Thanks for keeping a sharp eye out.
>
> [ P.S. This stuff is confusing as hell, im just glad I got a sucker to
> commit it *waves* at Andrew :-) ]

Heh, well done. Frankly, though, this stuff isn't *that* hard. It's Perl's terminology that's really bad.

Best,

David
