Thread: COPY command file name encoding issue (UTF8/WIN1252)
Hi, I have a problem using COPY command with a file name containing non ASCII characters. I use Postgres 9.3.5 x64 on a Windows 7. OS local encoding is WIN1252. My database is encoded in UTF8. I initiate client connection with libpq, connection encoding is set to UTF8. I build properly my file path taking care of encoding. When I run COPY "test" TO 'C:/tmp/é.bin' WITH BINARY it creates a file named é.bin which is utf8 name interpreted as local8. It could be reproduced convert_from(convert_to('é','UTF8'),'WIN1252'). é in UTF8 "\303\251" é in WIN1252"\351" This command works on a database encoded in WIN1252 (same as OS) . So it seems that COPY command don't take care of file name encoding. Is it a bug ? a limitation ? Thanks for your help Mathieu PUJOL
Pujol Mathieu wrote: > I have a problem using COPY command with a file name containing non > ASCII characters. > I use Postgres 9.3.5 x64 on a Windows 7. > OS local encoding is WIN1252. > My database is encoded in UTF8. > I initiate client connection with libpq, connection encoding is set to UTF8. > I build properly my file path taking care of encoding. > > When I run COPY "test" TO 'C:/tmp/é.bin' WITH BINARY > it creates a file named é.bin which is utf8 name interpreted as local8. > It could be reproduced convert_from(convert_to('é','UTF8'),'WIN1252'). > é in UTF8 "\303\251" > é in WIN1252"\351" > > This command works on a database encoded in WIN1252 (same as OS) . > So it seems that COPY command don't take care of file name encoding. > Is it a bug ? a limitation ? > Thanks for your help I didn't look at the code, but I'd say that the database encoding is used for the file name, which is why it works when database encoding and OS locale are the same. I guess that it would be possible for PostgreSQL to figure out with what OS locale the postmaster is running and to convert file names accordingly, but it's probably not trivial since it is OS dependent. Yours, Laurenz Albe
Maybe a new option could be added to let caller specifies the file name encoding, it may know it because he create the source/destination file. I tried to give him a WIN1252 text by doing COPY "test" TO convert_from(convert_to('C:/tmp/é.bin','UTF8'),'WIN1252') WITH BINARY but this call is not allowed. Sending him a text containing escaped WIN1252 hex value fails, because query parser detect invalid UTF8 sequence (which is logical). The problem is that I can't find any way to workaround this bug. Regards Mathieu Pujol Le 23/03/2015 11:46, Albe Laurenz a écrit : > Pujol Mathieu wrote: >> I have a problem using COPY command with a file name containing non >> ASCII characters. >> I use Postgres 9.3.5 x64 on a Windows 7. >> OS local encoding is WIN1252. >> My database is encoded in UTF8. >> I initiate client connection with libpq, connection encoding is set to UTF8. >> I build properly my file path taking care of encoding. >> >> When I run COPY "test" TO 'C:/tmp/é.bin' WITH BINARY >> it creates a file named é.bin which is utf8 name interpreted as local8. >> It could be reproduced convert_from(convert_to('é','UTF8'),'WIN1252'). >> é in UTF8 "\303\251" >> é in WIN1252"\351" >> >> This command works on a database encoded in WIN1252 (same as OS) . >> So it seems that COPY command don't take care of file name encoding. >> Is it a bug ? a limitation ? >> Thanks for your help > I didn't look at the code, but I'd say that the database encoding is > used for the file name, which is why it works when database encoding > and OS locale are the same. > > I guess that it would be possible for PostgreSQL to figure out with what > OS locale the postmaster is running and to convert file names accordingly, > but it's probably not trivial since it is OS dependent. > > Yours, > Laurenz Albe >