Re: finding bogus UTF-8 - Mailing list pgsql-general

From Geoffrey Myers
Subject Re: finding bogus UTF-8
Date
Msg-id 4D5AF8CF.9080001@serioustechnology.com
In response to Re: finding bogus UTF-8  (Vick Khera <vivek@khera.org>)
Responses Re: finding bogus UTF-8
List pgsql-general
Vick Khera wrote:
> On Tue, Feb 15, 2011 at 11:09 AM, Geoffrey Myers
> <lists@serioustechnology.com> wrote:
>> comments would be appreciated.
>>
>
> If all you're doing is filtering stdin to stdout and deleting a range
> of characters, it seems that tr would be a faster tool:
>
> cat foo.txt | tr -d '\000-\008\013-\037\177-\377' > foo-cleaned.txt

I toyed with tr for a bit, but could not get it to work.  The above did
not work for me either.  Not exactly sure what it's doing, but here's a
couple of diff lines:


1619c1619
<     days integer DEFAULT 28,
---
 >     days integer DEFAULT 2,


So it appears 'tr' is deleting the '8' character, rather than the octal
value for 008.
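
A possible explanation, offered as a sketch rather than a confirmed
diagnosis: octal escapes only use the digits 0-7, so '\008' is most
likely read as '\00' followed by a literal '8', which would match the
diff above. Assuming the intent was to delete bytes 0 through 8 while
keeping tab and newline, the upper bound would be written '\010'. The
function name strip_control_bytes below is made up for illustration,
and LC_ALL=C is an added assumption to force byte-wise handling.

```shell
# Hypothetical helper: filters stdin to stdout, deleting control bytes
# and all bytes >= 0177, but keeping tab (\011) and newline (\012).
# The only change from the quoted command is \008 -> \010 (octal for 8).
strip_control_bytes() {
    LC_ALL=C tr -d '\000-\010\013-\037\177-\377'
}

# Example use (file names taken from the quoted command):
#   strip_control_bytes < foo.txt > foo-cleaned.txt
```

Note that the '\177-\377' range removes every high byte, so valid
multi-byte UTF-8 characters are deleted along with the bogus ones.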


--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson
