Re: Careful PL/Perl Release Not Required - Mailing list pgsql-hackers
From | Alex Hunsaker |
---|---|
Subject | Re: Careful PL/Perl Release Not Required |
Date | |
Msg-id | AANLkTimkORLgN6ib63rkZ9OjZv5jDpBpe8E+OEk-oXL-@mail.gmail.com |
In response to | Re: Careful PL/Perl Release Not Required ("David E. Wheeler" <david@kineticode.com>) |
Responses |
Re: Careful PL/Perl Release Not Required
Re: Careful PL/Perl Release Not Required |
List | pgsql-hackers |
On Fri, Feb 11, 2011 at 10:16, David E. Wheeler <david@kineticode.com> wrote:
> On Feb 10, 2011, at 11:43 PM, Alex Hunsaker wrote:
>
> Like I said, the terminology is awful.

Yeah, I frequently use encode and decode to mean the same thing :-(.

>> In the cited case he was passing "%C3%A9" to uri_unescape() and
>> expecting it to return 1 character. The additional utf8::decode() will
>> tell perl the string is in utf8 so it will then return 1 char. The
>> point being, decode is needed and with it, the function will work pre
>> and post 9.1.
>
> Why wouldn't the string be decoded already when it's passed to the function, as it would be in 9.0 if the database was utf-8, and should be in 9.1 if the database isn't sql_ascii?

It is decoded... the input string "%C3%A9" is actually the _same_ string in utf-8, latin1 and SQL_ASCII, decoded or not. Those are all ascii characters. Calling utf8::decode("%C3%A9") is essentially a noop.

>> In fact on a latin-1 database it sure as heck better return two
>> characters, it would be a bug if it only returned 1 as that would mean
>> it would be treating a series of latin1 bytes as a series of utf8
>> bytes!
>
> If it's a latin-1 database, in 9.1, the argument should be passed decoded. That's not a utf-8 string or bytes. It's Perl's internal representation.
>
> If I understand the patch correctly, the decode() will no longer be needed. The string will *already* be decoded.

Ok, I think I figured out why we seem to be talking past each other. We have:

    CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$
      use strict;
      use URI::Escape;
      utf8::decode($_[0]);
      return uri_unescape($_[0]);
    $$ LANGUAGE plperlu;

That *looks* like it is decoding the input string, which it is, but actually that will double utf8 encode your string. It does not seem to in this case because we are dealing with all ascii input. The trick here is that it's also telling perl to decode/treat the *output* string as utf8. uri_unescape() returns the same string you passed in, which thanks to the utf8::decode() above has the utf8 flag set. Meaning we end up treating it as 1 character instead of two. Or basically, it has the same effect as calling utf8::decode() on the return value.

The correct way to write that function pre 9.1 and post 9.1 would be (in a utf8 database):

    CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$
      use strict;
      use URI::Escape;
      my $str = uri_unescape($_[0]);
      utf8::decode($str);
      return $str;
    $$ LANGUAGE plperlu;

The last utf8::decode() is optional (as we said, the result might not be utf8), but it gives the behavior the OP was after.
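For anyone who wants to poke at the unescape-then-decode ordering outside the database, here is a minimal standalone sketch in plain Perl. It is not the PL/Perl function itself: percent_unescape() is a hypothetical stand-in for URI::Escape's uri_unescape(), assumed to behave the same for plain %XX escapes, so the script needs only core Perl. It does not try to reproduce what any particular PostgreSQL release does to the arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Escape::uri_unescape() (assumption:
# equivalent for simple %XX escapes); avoids a non-core dependency.
sub percent_unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

my $input = "%C3%A9";    # six plain ASCII characters

# Decoding the still-escaped input leaves the string unchanged.
my $copy = $input;
utf8::decode($copy);
print "escaped input unchanged by decode: ",
    ($copy eq $input ? "yes" : "no"), "\n";

# Unescaping produces two characters, 0xC3 and 0xA9.
my $raw = percent_unescape($input);
print "length after unescape: ", length($raw), "\n";

# Decoding *after* unescaping reads those two bytes as UTF-8
# and yields a single character, U+00E9.
my $str = $raw;
utf8::decode($str);
printf "length after decode: %d (U+%04X)\n", length($str), ord($str);
```

On a latin-1 database the two-character result before the final decode is the right answer, which is why that last step depends on knowing the escaped bytes really are utf8.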