Re: finding bogus UTF-8 - Mailing list pgsql-general

From Marko Kreen
Subject Re: finding bogus UTF-8
Msg-id AANLkTi==bd28k_J2=Dg0kcLD_mMTrLByCGrS+PHk1U-s@mail.gmail.com
In response to finding bogus UTF-8  (Scott Ribe <scott_ribe@elevated-dev.com>)
List pgsql-general
On Thu, Feb 10, 2011 at 9:02 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> I know that I have at least one instance of a varchar that is not valid UTF-8, imported from a source with errors
> (AMACPT files, actually) before PG's checking was as stringent as it is today. Can anybody suggest a query to find such
> values?

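-- Returns true if the argument decodes cleanly as UTF-8, false otherwise.
CREATE OR REPLACE FUNCTION is_utf8(text)
RETURNS bool AS $$
try:
    # decode() raises UnicodeDecodeError on an invalid byte sequence
    args[0].decode('utf8')
    return True
except UnicodeDecodeError:
    return False
$$ LANGUAGE plpythonu STRICT;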
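
For trying the same check outside the database, here is a minimal standalone sketch. It assumes Python 3 and raw byte strings as input; the sample values and the `bogus` list are made up for illustration and do not come from the original data.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample values: one valid string, one Latin-1 byte,
# and one overlong encoding that strict UTF-8 decoders reject.
samples = [b"caf\xc3\xa9", b"caf\xe9", b"\xc0\xaf"]

# Keep only the values that fail the check.
bogus = [s for s in samples if not is_utf8(s)]
print(bogus)
```

Inside the database, the function would presumably be used as a filter, along the lines of WHERE NOT is_utf8(some_column), with the table and column names being placeholders for your own schema.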

--
marko
