Thread: Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
From
Prasanth Reddy
Date:
-------- Forwarded Message --------
Subject: | Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92 |
---|---|
Date: | Sat, 08 Aug 2015 16:39:17 -0500 |
From: | Prasanth Reddy <dbadmin@nqadmin.com> |
Reply-To: | dbadmin@nqadmin.com |
To: | Bosco Rama <postgres@boscorama.com> |
1. Is the content vetting new in 9.4? 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in notes columns. 3. Is there a way to check the content and fix it/remove invalid characters? 4. Is there an option to force the driver to not update database with invalid characters? Appreciate your help. Thanks, Prasanth ------------------Original Message-------------------------- On 08/08/15 10:49, Prasanth Reddy wrote: > [snip] > 12:42:02.505 (1) <=BE ParameterStatus(client_encoding = UTF8) [snip] > 12:42:02.506 (1) <=BE ParameterStatus(server_encoding = SQL_ASCII) There's your problem. The client is expecting UTF8 while the server is storing un-vetted SQL_ASCII. You will need to change one of the components' choice of encoding and/or enforce content conformance in the data. Another choice, in cases of known and expected non-compliance, is to use bytea casting to bypass the automated content vetting. But your choice in this case is going to be driven by the ease with which you can alter your application vs the server-side changes that would be needed. HTH, Bosco.
Re: Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
From
John R Pierce
Date:
On 8/8/2015 2:40 PM, Prasanth Reddy wrote: > 1. Is the content vetting new in 9.4? no, PostgreSQL has always been strict about content types. > 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word documentthere will be some characters invalid in UTF8. Most of these should only be in > notes columns. the driver isn't doing this, its happening in the postgres database. you should fix invalid input at the source application before sending to the database. > 3. Is there a way to check the content and fix it/remove invalid characters? define 'invalid character', then fix it. this SQL_ASCII field that contains 0x92, what character code is that supposed to represent ? > 4. Is there an option to force the driver to not update database with invalid characters? the database (NOT the driver) does that already. if you try and insert invalid data, you get an error and have to rollback the transaction. -- john r pierce, recycling bits in santa cruz
Re: Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
From
Prasanth Reddy
Date:
1. If postgresql always does this I am not sure how it is working in 9.1.
2. I am able to view the content in pgadmin, why would the server not complain when viewing from pgadmin. Not sure what the client encoding would be when using pgadmin.
3. Thought SQL_ASCII means the text is stored as ASCII format, may be I am way off base on this. Is there a way to check the notes column for invalid UTF8 characters and remove them?
4. Based on the error there is invalid data in database so the database should have allowed this invalid data to come in some how right?
Any suggestions as to how to get this working in 9.4? Right now the database is in 9.1 and working fine so may be there is a way to fix these issues before doing a dump for 9.4?
Thanks,
Prasanth
On 8/8/2015 2:40 PM, Prasanth Reddy wrote:
> 1. Is the content vetting new in 9.4?
no, PostgreSQL has always been strict about content types.
> 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in
> notes columns.
the driver isn't doing this, its happening in the postgres
database. you should fix invalid input at the source application
before sending to the database.
> 3. Is there a way to check the content and fix it/remove invalid characters?
define 'invalid character', then fix it. this SQL_ASCII field that
contains 0x92, what character code is that supposed to represent ?
> 4. Is there an option to force the driver to not update database with invalid characters?
the database (NOT the driver) does that already. if you try and insert
invalid data, you get an error and have to rollback the transaction.
--
john r pierce, recycling bits in santa cruz
2. I am able to view the content in pgadmin, why would the server not complain when viewing from pgadmin. Not sure what the client encoding would be when using pgadmin.
3. Thought SQL_ASCII means the text is stored as ASCII format, may be I am way off base on this. Is there a way to check the notes column for invalid UTF8 characters and remove them?
4. Based on the error there is invalid data in database so the database should have allowed this invalid data to come in some how right?
Any suggestions as to how to get this working in 9.4? Right now the database is in 9.1 and working fine so may be there is a way to fix these issues before doing a dump for 9.4?
Thanks,
Prasanth
On 8/8/2015 2:40 PM, Prasanth Reddy wrote:
> 1. Is the content vetting new in 9.4?
no, PostgreSQL has always been strict about content types.
> 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in
> notes columns.
the driver isn't doing this, its happening in the postgres
database. you should fix invalid input at the source application
before sending to the database.
> 3. Is there a way to check the content and fix it/remove invalid characters?
define 'invalid character', then fix it. this SQL_ASCII field that
contains 0x92, what character code is that supposed to represent ?
> 4. Is there an option to force the driver to not update database with invalid characters?
the database (NOT the driver) does that already. if you try and insert
invalid data, you get an error and have to rollback the transaction.
--
john r pierce, recycling bits in santa cruz
On 08/08/2015 04:40 PM, Prasanth Reddy wrote:
-------- Forwarded Message --------
Subject: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92 Date: Sat, 08 Aug 2015 16:39:17 -0500 From: Prasanth Reddy <dbadmin@nqadmin.com> Reply-To: dbadmin@nqadmin.com To: Bosco Rama <postgres@boscorama.com> 1. Is the content vetting new in 9.4? 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in notes columns. 3. Is there a way to check the content and fix it/remove invalid characters? 4. Is there an option to force the driver to not update database with invalid characters? Appreciate your help. Thanks, Prasanth ------------------Original Message-------------------------- On 08/08/15 10:49, Prasanth Reddy wrote: > [snip] > 12:42:02.505 (1) <=BE ParameterStatus(client_encoding = UTF8) [snip] > 12:42:02.506 (1) <=BE ParameterStatus(server_encoding = SQL_ASCII) There's your problem. The client is expecting UTF8 while the server is storing un-vetted SQL_ASCII. You will need to change one of the components' choice of encoding and/or enforce content conformance in the data. Another choice, in cases of known and expected non-compliance, is to use bytea casting to bypass the automated content vetting. But your choice in this case is going to be driven by the ease with which you can alter your application vs the server-side changes that would be needed. HTH, Bosco.
Re: Re: Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
From
Mark Rotteveel
Date:
On Sat, 08 Aug 2015 17:34:15 -0500, Prasanth Reddy <dbadmin@nqadmin.com> wrote: > 2. I am able to view the content in pgadmin, why would the server not > complain when viewing from pgadmin. Not sure what the client encoding would > be when using pgadmin. pgadmin probably doesn't apply any encoding, but uses the bytes as is. > 3. Thought SQL_ASCII means the text is stored as ASCII format, may be I am > way off base on this. Is there a way to check the notes column for invalid > UTF8 characters and remove them? 0x92 is not a valid ASCII character, ASCII characters are in the range 0x00-0x7F. 0x92 is in the 'extended ascii range', but that is not really defined, andvdepends on the actual encoding (the real character set), that is why the conversion fails. Mark
Re: Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
From
Prasanth Reddy
Date:
I am able to restore the same dump in 9.3 and it works fine. So seems like 9.4 is much more stringent than all prior versions.
Thanks,
Prasanth
Thanks,
Prasanth
On 08/08/2015 05:34 PM, Prasanth Reddy wrote:
1. If postgresql always does this I am not sure how it is working in 9.1.
2. I am able to view the content in pgadmin, why would the server not complain when viewing from pgadmin. Not sure what the client encoding would be when using pgadmin.
3. Thought SQL_ASCII means the text is stored as ASCII format, may be I am way off base on this. Is there a way to check the notes column for invalid UTF8 characters and remove them?
4. Based on the error there is invalid data in database so the database should have allowed this invalid data to come in some how right?
Any suggestions as to how to get this working in 9.4? Right now the database is in 9.1 and working fine so may be there is a way to fix these issues before doing a dump for 9.4?
Thanks,
Prasanth
On 8/8/2015 2:40 PM, Prasanth Reddy wrote:
> 1. Is the content vetting new in 9.4?
no, PostgreSQL has always been strict about content types.
> 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in
> notes columns.
the driver isn't doing this, its happening in the postgres
database. you should fix invalid input at the source application
before sending to the database.
> 3. Is there a way to check the content and fix it/remove invalid characters?
define 'invalid character', then fix it. this SQL_ASCII field that
contains 0x92, what character code is that supposed to represent ?
> 4. Is there an option to force the driver to not update database with invalid characters?
the database (NOT the driver) does that already. if you try and insert
invalid data, you get an error and have to rollback the transaction.
--
john r pierce, recycling bits in santa cruzOn 08/08/2015 04:40 PM, Prasanth Reddy wrote:
-------- Forwarded Message --------
Subject: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92 Date: Sat, 08 Aug 2015 16:39:17 -0500 From: Prasanth Reddy <dbadmin@nqadmin.com> Reply-To: dbadmin@nqadmin.com To: Bosco Rama <postgres@boscorama.com> 1. Is the content vetting new in 9.4? 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in notes columns. 3. Is there a way to check the content and fix it/remove invalid characters? 4. Is there an option to force the driver to not update database with invalid characters? Appreciate your help. Thanks, Prasanth ------------------Original Message-------------------------- On 08/08/15 10:49, Prasanth Reddy wrote: > [snip] > 12:42:02.505 (1) <=BE ParameterStatus(client_encoding = UTF8) [snip] > 12:42:02.506 (1) <=BE ParameterStatus(server_encoding = SQL_ASCII) There's your problem. The client is expecting UTF8 while the server is storing un-vetted SQL_ASCII. You will need to change one of the components' choice of encoding and/or enforce content conformance in the data. Another choice, in cases of known and expected non-compliance, is to use bytea casting to bypass the automated content vetting. But your choice in this case is going to be driven by the ease with which you can alter your application vs the server-side changes that would be needed. HTH, Bosco.
Re: Re: Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
From
Dave Cramer
Date:
Well your data is still probably wrong. You should fix it
On 9 August 2015 at 14:44, Prasanth Reddy <dbadmin@nqadmin.com> wrote:
I am able to restore the same dump in 9.3 and it works fine. So seems like 9.4 is much more stringent than all prior versions.
Thanks,
PrasanthOn 08/08/2015 05:34 PM, Prasanth Reddy wrote:1. If postgresql always does this I am not sure how it is working in 9.1.
2. I am able to view the content in pgadmin, why would the server not complain when viewing from pgadmin. Not sure what the client encoding would be when using pgadmin.
3. Thought SQL_ASCII means the text is stored as ASCII format, may be I am way off base on this. Is there a way to check the notes column for invalid UTF8 characters and remove them?
4. Based on the error there is invalid data in database so the database should have allowed this invalid data to come in some how right?
Any suggestions as to how to get this working in 9.4? Right now the database is in 9.1 and working fine so may be there is a way to fix these issues before doing a dump for 9.4?
Thanks,
Prasanth
On 8/8/2015 2:40 PM, Prasanth Reddy wrote:
> 1. Is the content vetting new in 9.4?
no, PostgreSQL has always been strict about content types.
> 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in
> notes columns.
the driver isn't doing this, its happening in the postgres
database. you should fix invalid input at the source application
before sending to the database.
> 3. Is there a way to check the content and fix it/remove invalid characters?
define 'invalid character', then fix it. this SQL_ASCII field that
contains 0x92, what character code is that supposed to represent ?
> 4. Is there an option to force the driver to not update database with invalid characters?
the database (NOT the driver) does that already. if you try and insert
invalid data, you get an error and have to rollback the transaction.
--
john r pierce, recycling bits in santa cruzOn 08/08/2015 04:40 PM, Prasanth Reddy wrote:
-------- Forwarded Message --------
Subject: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92 Date: Sat, 08 Aug 2015 16:39:17 -0500 From: Prasanth Reddy <dbadmin@nqadmin.com> Reply-To: dbadmin@nqadmin.com To: Bosco Rama <postgres@boscorama.com> 1. Is the content vetting new in 9.4? 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in notes columns. 3. Is there a way to check the content and fix it/remove invalid characters? 4. Is there an option to force the driver to not update database with invalid characters? Appreciate your help. Thanks, Prasanth ------------------Original Message-------------------------- On 08/08/15 10:49, Prasanth Reddy wrote: > [snip] > 12:42:02.505 (1) <=BE ParameterStatus(client_encoding = UTF8) [snip] > 12:42:02.506 (1) <=BE ParameterStatus(server_encoding = SQL_ASCII) There's your problem. The client is expecting UTF8 while the server is storing un-vetted SQL_ASCII. You will need to change one of the components' choice of encoding and/or enforce content conformance in the data. Another choice, in cases of known and expected non-compliance, is to use bytea casting to bypass the automated content vetting. But your choice in this case is going to be driven by the ease with which you can alter your application vs the server-side changes that would be needed. HTH, Bosco.