On Fri, 9 Oct 2009, Tom Lane wrote:
> what do we do with rows that fail encoding conversion?  For logging to a 
> file we could/should just decree that we write out the original, 
> allegedly-in-the-client-encoding data.  I'm not sure what we do about 
> logging to a table though.  The idea of storing bytea is pretty 
> unpleasant but there might be little choice.
I think this detail can be punted: document the limitation and log the 
error, but don't try to handle it perfectly.  In most use cases I've seen 
here, saving the rows to the "reject" file/table is a convenience rather 
than a hard requirement anyway.  You can always dig them back out of the 
original input if you see an encoding error in the logs, and it's rare 
that you can completely automate that step.
The main purpose of the reject file/table is to accumulate things for 
review that you might fix by hand or by systematic update (e.g. adding 
",\N" for a missing column when warranted) before trying a re-import.  I 
suspect the users of this feature would be OK with knowing that it can't 
be 100% accurate in the face of encoding errors.  It's more important 
that in the usual cases, things like bad delimiters and missing columns, 
you can easily manipulate the rejects as simple text.  Making that harder 
just for this edge case wouldn't match the priorities of the users of 
this feature I've encountered.
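
To make the "systematic update" case concrete, here is a rough sketch of 
the kind of text-level fix I have in mind.  It assumes the rejects are 
plain comma-delimited lines with no quoting or escaping, that \N is the 
null marker, and that you already know the expected column count.  The 
function name and its parameters are made up for illustration; none of 
this is part of any proposed patch.

```python
def pad_missing_columns(lines, expected_columns, delimiter=",", null_marker="\\N"):
    """Append null markers to reject rows that have too few columns.

    Returns (padded, untouched).  Rows that are blank or already have
    enough columns go to `untouched`, since they were rejected for some
    other reason and need a human to look at them.

    Assumption: a naive split on the delimiter, so quoted or escaped
    delimiters inside a field are not handled.
    """
    padded, untouched = [], []
    for line in lines:
        row = line.rstrip("\n")
        missing = expected_columns - len(row.split(delimiter))
        if row and missing > 0:
            # Short row: add one delimiter + null marker per missing column.
            padded.append(row + (delimiter + null_marker) * missing)
        else:
            untouched.append(row)
    return padded, untouched
```

You would still want to eyeball the padded rows before feeding them back 
into a re-import, since a short row is not always just a missing trailing 
column.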
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD