Thread: Trouble with COPY IN
I'm having some trouble with a COPY IN BINARY statement. I have the following stack trace: Caused by: java.sql.SQLException: Error writing to database, running statement COPY InterMineObject (OBJECT, class, tableclass, longObjType, fullTime, dateObjType, floatType, longType, booleanType, booleanObjType, title, bigDecimalObjType, stringObjType, doubleObjType, intObjType, name, id, doubleType, shortObjType, rangeStart, intermine_end, seniority, floatObjType, intType, shortType, vatNumber, debt, salary, rangeEnd, address, age, addressId, bankId, companyId, departmentId, businessAddressId, managerId, personalAddressId, parentId, departmentThatRejectedMeId, CEOId) FROM STDIN BINARY at org.intermine.sql.writebatch.FlushJobPostgresCopyImpl.flush(FlushJobPostgresCopyImpl.java:56) at org.intermine.sql.writebatch.Batch$BatchFlusher.run(Batch.java:456) at java.lang.Thread.run(Thread.java:619) Caused by: org.postgresql.util.PSQLException: Tried to end inactive copy at org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:803) at org.postgresql.core.v3.CopyInImpl.endCopy(CopyInImpl.java:61) at org.postgresql.copy.CopyManager.copyIn(CopyManager.java:181) at org.postgresql.copy.CopyManager.copyIn(CopyManager.java:161) at org.intermine.sql.writebatch.FlushJobPostgresCopyImpl.flush(FlushJobPostgresCopyImpl.java:51) I have had a brief look at the JDBC driver code, and as far as I can see, this shouldn't ever happen. All I did was call CopyManager.copyIn(). It should have kept the correct locks to complete the operation. For some reason, this seems to exhibit with some data but not with others. Any clues would be very welcome. Matthew -- First law of computing: Anything can go wro sig: Segmentation fault. core dumped.
Could it be that COPY is erroring out, and then the driver tries to end it anyway, causing the error? Anything interesting in the PostgreSQL logs? Is it consistently reproducible with a specific data set? --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 230 Foster City, CA 94404 (650) 242-3500 Main (650) 242-3501 F www.truviso.com On Fri, Jul 16, 2010 at 9:56 AM, Matthew Wakeling <matthew@flymine.org> wrote: > > I'm having some trouble with a COPY IN BINARY statement. I have the > following stack trace: > > Caused by: java.sql.SQLException: Error writing to database, running > statement COPY InterMineObject (OBJECT, class, tableclass, longObjType, > fullTime, dateObjType, floatType, longType, booleanType, booleanObjType, > title, bigDecimalObjType, stringObjType, doubleObjType, intObjType, name, > id, doubleType, shortObjType, rangeStart, intermine_end, seniority, > floatObjType, intType, shortType, vatNumber, debt, salary, rangeEnd, > address, age, addressId, bankId, companyId, departmentId, businessAddressId, > managerId, personalAddressId, parentId, departmentThatRejectedMeId, CEOId) > FROM STDIN BINARY > at > org.intermine.sql.writebatch.FlushJobPostgresCopyImpl.flush(FlushJobPostgresCopyImpl.java:56) > at org.intermine.sql.writebatch.Batch$BatchFlusher.run(Batch.java:456) > at java.lang.Thread.run(Thread.java:619) > Caused by: org.postgresql.util.PSQLException: Tried to end inactive copy > at > org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:803) > at org.postgresql.core.v3.CopyInImpl.endCopy(CopyInImpl.java:61) > at org.postgresql.copy.CopyManager.copyIn(CopyManager.java:181) > at org.postgresql.copy.CopyManager.copyIn(CopyManager.java:161) > at > org.intermine.sql.writebatch.FlushJobPostgresCopyImpl.flush(FlushJobPostgresCopyImpl.java:51) > > I have had a brief look at the JDBC driver code, and as far as I can see, > this shouldn't ever happen. All I did was call CopyManager.copyIn(). It > should have kept the correct locks to complete the operation. > > For some reason, this seems to exhibit with some data but not with others. > > Any clues would be very welcome. > > Matthew > > -- > First law of computing: Anything can go wro > sig: Segmentation fault. core dumped. > > -- > Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-jdbc >
On Fri, 16 Jul 2010, Maciek Sakrejda wrote: > Could it be that COPY is erroring out, and then the driver tries to > end it anyway, causing the error? That seems likely. What's probably necessary to reproduce this is a failure at the end of the data stream. An early error will be received and processed normally, but an error at the end will be buffered and the driver will think everything succeeded and try to end the copy not knowing that it has already failed. I think we'd need to call the blocking version of processCopyResults prior to endCopy to avoid this misleading error. Kris Jurka
On Fri, 16 Jul 2010, Kris Jurka wrote: > That seems likely. What's probably necessary to reproduce this is a > failure at the end of the data stream. An early error will be received > and processed normally, but an error at the end will be buffered and the > driver will think everything succeeded and try to end the copy not > knowing that it has already failed. I think we'd need to call the > blocking version of processCopyResults prior to endCopy to avoid this > misleading error. > Actually with a little more thinking, what I said was completely bogus. 1) Blocking won't fix anything because in the success path, there's no message to receive, so there's nothing to process. 2) The protocol docs say that after an error, CopyData, CopyDone, and CopyFail messages are ignored, so they wouldn't be causing this error. So some more thought is needed here... Kris Jurka
On Fri, 16 Jul 2010, Maciek Sakrejda wrote: > Could it be that COPY is erroring out, and then the driver tries to > end it anyway, causing the error? Anything interesting in the > PostgreSQL logs? There is nothing interesting in the Postgres logs at all to explain what is going on. I have ERROR statements when I try to drop a table that isn't there, but nothing when this COPY bugs out. > Is it consistently reproducible with a specific data set? >> For some reason, this seems to exhibit with some data but not with >> others. Yes, it seems to break with specific data. Particularly, it works with our COPY tests, but it doesn't work with the higher-level tests. So, the situation is that I have a database writing interface that I wrote about six years ago, and we have been using a patched six-year-old Postgres JDBC driver for ages. I have just updated to the newest JDBC driver (which doesn't need to be patched for COPY), and I am now getting this error. Could the binary format have changed slightly? (I haven't changed my copy of Postgres when upgrading the JDBC driver - it is 8.4.4). Thanks, Matthew -- There are only two kinds of programming languages: those people always bitch about and those nobody uses. (Bjarne Stroustrup)
On Fri, 16 Jul 2010, Kris Jurka wrote: > 2) The protocol docs say that after an error, CopyData, CopyDone, and > CopyFail messages are ignored, so they wouldn't be causing this error. The exception I receive isn't produced by the server. It is caused by the client losing its own lock on the database connection. Surely the place where the lock gets thrown away would be caused by a separate server error, and should have been reported? Matthew -- So, given 'D' is undeclared too, with a default of zero, C++ is equal to D. mnw21, commenting on the "Surely the value of C++ is zero, but C is now 1" response to "No, C++ isn't equal to D. 'C' is undeclared [...] C++ should really be called 1" response to "C++ -- shouldn't it be called D?"
On Fri, 16 Jul 2010, Maciek Sakrejda wrote: > Could it be that COPY is erroring out, and then the driver tries to > end it anyway, causing the error? Anything interesting in the > PostgreSQL logs? Is it consistently reproducible with a specific data > set? It seems that the COPY command only fails with larger data sets. If I split a binary lump into several parts which are otherwise identical, then it works perfectly, where the complete binary lump doesn't work. I have isolated a reasonably small example that fails. The table definition is table1 (int col1, int col2). The following is an exception that is thrown, containing the full hex dump of the binary COPY, for 583 rows inserted into that table. As far as I can see, there is nothing wrong with the binary format. If I reduce the size to make it send smaller chunks, then the error does not occur. java.sql.SQLException: Error writing to database, running statement COPY table1 (col1, col2) FROM STDIN BINARY, size = 10515, data = 5047434F50590AFF 0D0A000000000000 0000000002000000 04000001A2000000 0400000247000200 000004000001A300 0000040000024600 0200000004000001 A400000004000002 4500020000000400 0001A50000000400 0002440002000000 04000001A6000000 0400000243000200 000004000001A700 0000040000024200 0200000004000001 A800000004000002 4100020000000400 0001A90000000400 0002400002000000 04000001AA000000 040000023F000200 000004000001AB00 0000040000023E00 0200000004000001 AC00000004000002 3D00020000000400 0001AD0000000400 00023C0002000000 04000001AE000000 040000023B000200 000004000001AF00 0000040000023A00 0200000004000001 B000000004000002 3900020000000400 0001B10000000400 0002380002000000 04000001B2000000 0400000237000200 000004000001B300 0000040000023600 0200000004000001 B400000004000002 3500020000000400 0001B50000000400 0002340002000000 04000001B6000000 0400000233000200 000004000001B700 0000040000023200 0200000004000001 B800000004000002 3100020000000400 0001B90000000400 0002300002000000 04000001BA000000 040000022F000200 000004000001BB00 0000040000022E00 0200000004000001 BC00000004000002 2D00020000000400 0001BD0000000400 00022C0002000000 04000001BE000000 040000022B000200 000004000001BF00 0000040000022A00 0200000004000001 C000000004000002 2900020000000400 0001C10000000400 0002280002000000 04000001C2000000 0400000227000200 000004000001C300 0000040000022600 0200000004000001 C400000004000002 2500020000000400 0001C50000000400 0002240002000000 04000001C6000000 0400000223000200 000004000001C700 0000040000022200 0200000004000001 C800000004000002 2100020000000400 0001C90000000400 0002200002000000 04000001CA000000 040000021F000200 000004000001CB00 0000040000021E00 0200000004000001 CC00000004000002 1D00020000000400 0001CD0000000400 00021C0002000000 04000001CE000000 040000021B000200 000004000001CF00 0000040000021A00 0200000004000001 D000000004000002 1900020000000400 0001D10000000400 0002180002000000 04000001D2000000 0400000217000200 000004000001D300 0000040000021600 0200000004000001 D400000004000002 1500020000000400 0001D50000000400 0002140002000000 04000001D6000000 0400000213000200 000004000001D700 0000040000021200 0200000004000001 D800000004000002 1100020000000400 0001D90000000400 0002100002000000 04000001DA000000 040000020F000200 000004000001DB00 0000040000020E00 0200000004000001 DC00000004000002 0D00020000000400 0001DD0000000400 00020C0002000000 04000001DE000000 040000020B000200 000004000001DF00 0000040000020A00 0200000004000001 E000000004000002 0900020000000400 0001E10000000400 0002080002000000 04000001E2000000 0400000207000200 000004000001E300 0000040000020600 0200000004000001 E400000004000002 0500020000000400 0001E50000000400 0002040002000000 04000001E6000000 0400000203000200 000004000001E700 0000040000020200 0200000004000001 E800000004000002 0100020000000400 0001E90000000400 0002000002000000 04000001EA000000 04000001FF000200 000004000001EB00 000004000001FE00 0200000004000001 EC00000004000001 FD00020000000400 0001ED0000000400 0001FC0002000000 04000001EE000000 04000001FB000200 000004000001EF00 000004000001FA00 0200000004000001 F000000004000001 F900020000000400 0001F10000000400 0001F80002000000 04000001F2000000 04000001F7000200 000004000001F300 000004000001F600 0200000004000001 F400000004000001 F500020000000400 0001F50000000400 0001F40002000000 04000001F6000000 04000001F3000200 000004000001F700 000004000001F200 0200000004000001 F800000004000001 F100020000000400 0001F90000000400 0001F00002000000 04000001FA000000 04000001EF000200 000004000001FB00 000004000001EE00 0200000004000001 FC00000004000001 ED00020000000400 0001FD0000000400 0001EC0002000000 04000001FE000000 04000001EB000200 000004000001FF00 000004000001EA00 0200000004000002 0000000004000001 E900020000000400 0002010000000400 0001E80002000000 0400000202000000 04000001E7000200 0000040000020300 000004000001E600 0200000004000002 0400000004000001 E500020000000400 0002050000000400 0001E40002000000 0400000206000000 04000001E3000200 0000040000020700 000004000001E200 0200000004000002 0800000004000001 E100020000000400 0002090000000400 0001E00002000000 040000020A000000 04000001DF000200 0000040000020B00 000004000001DE00 0200000004000002 0C00000004000001 DD00020000000400 00020D0000000400 0001DC0002000000 040000020E000000 04000001DB000200 0000040000020F00 000004000001DA00 0200000004000002 1000000004000001 D900020000000400 0002110000000400 0001D80002000000 0400000212000000 04000001D7000200 0000040000021300 000004000001D600 0200000004000002 1400000004000001 D500020000000400 0002150000000400 0001D40002000000 0400000216000000 04000001D3000200 0000040000021700 000004000001D200 0200000004000002 1800000004000001 D100020000000400 0002190000000400 0001D00002000000 040000021A000000 04000001CF000200 0000040000021B00 000004000001CE00 0200000004000002 1C00000004000001 CD00020000000400 00021D0000000400 0001CC0002000000 040000021E000000 04000001CB000200 0000040000021F00 000004000001CA00 0200000004000002 2000000004000001 C900020000000400 0002210000000400 0001C80002000000 0400000222000000 04000001C7000200 0000040000022300 000004000001C600 0200000004000002 2400000004000001 C500020000000400 0002250000000400 0001C40002000000 0400000226000000 04000001C3000200 0000040000022700 000004000001C200 0200000004000002 2800000004000001 C100020000000400 0002290000000400 0001C00002000000 040000022A000000 04000001BF000200 0000040000022B00 000004000001BE00 0200000004000002 2C00000004000001 BD00020000000400 00022D0000000400 0001BC0002000000 040000022E000000 04000001BB000200 0000040000022F00 000004000001BA00 0200000004000002 3000000004000001 B900020000000400 0002310000000400 0001B80002000000 0400000232000000 04000001B7000200 0000040000023300 000004000001B600 0200000004000002 3400000004000001 B500020000000400 0002350000000400 0001B40002000000 0400000236000000 04000001B3000200 0000040000023700 000004000001B200 0200000004000002 3800000004000001 B100020000000400 0002390000000400 0001B00002000000 040000023A000000 04000001AF000200 0000040000023B00 000004000001AE00 0200000004000002 3C00000004000001 AD00020000000400 00023D0000000400 0001AC0002000000 040000023E000000 04000001AB000200 0000040000023F00 000004000001AA00 0200000004000002 4000000004000001 A900020000000400 0002410000000400 0001A80002000000 0400000242000000 04000001A7000200 0000040000024300 000004000001A600 0200000004000002 4400000004000001 A500020000000400 0002450000000400 0001A40002000000 0400000246000000 04000001A3000200 0000040000024700 000004000001A200 0200000004000002 4800000004000001 A100020000000400 0002490000000400 0001A00002000000 040000024A000000 040000019F000200 0000040000024B00 0000040000019E00 0200000004000002 4C00000004000001 9D00020000000400 00024D0000000400 00019C0002000000 040000024E000000 040000019B000200 0000040000024F00 0000040000019A00 0200000004000002 5000000004000001 9900020000000400 0002510000000400 0001980002000000 0400000252000000 0400000197000200 0000040000025300 0000040000019600 0200000004000002 5400000004000001 9500020000000400 0002550000000400 0001940002000000 0400000256000000 0400000193000200 0000040000025700 0000040000019200 0200000004000002 5800000004000001 9100020000000400 0002590000000400 0001900002000000 040000025A000000 040000018F000200 0000040000025B00 0000040000018E00 0200000004000002 5C00000004000001 8D00020000000400 00025D0000000400 00018C0002000000 040000025E000000 040000018B000200 0000040000025F00 0000040000018A00 0200000004000002 6000000004000001 8900020000000400 0002610000000400 0001880002000000 0400000262000000 0400000187000200 0000040000026300 0000040000018600 0200000004000002 6400000004000001 8500020000000400 0002650000000400 0001840002000000 0400000266000000 0400000183000200 0000040000026700 0000040000018200 0200000004000002 6800000004000001 8100020000000400 0002690000000400 0001800002000000 040000026A000000 040000017F000200 0000040000026B00 0000040000017E00 0200000004000002 6C00000004000001 7D00020000000400 00026D0000000400 00017C0002000000 040000026E000000 040000017B000200 0000040000026F00 0000040000017A00 0200000004000002 7000000004000001 7900020000000400 0002710000000400 0001780002000000 0400000272000000 0400000177000200 0000040000027300 0000040000017600 0200000004000002 7400000004000001 7500020000000400 0002750000000400 0001740002000000 0400000276000000 0400000173000200 0000040000027700 0000040000017200 0200000004000002 7800000004000001 7100020000000400 0002790000000400 0001700002000000 040000027A000000 040000016F000200 0000040000027B00 0000040000016E00 0200000004000002 7C00000004000001 6D00020000000400 00027D0000000400 00016C0002000000 040000027E000000 040000016B000200 0000040000027F00 0000040000016A00 0200000004000002 8000000004000001 6900020000000400 0002810000000400 0001680002000000 0400000282000000 0400000167000200 0000040000028300 0000040000016600 0200000004000002 8400000004000001 6500020000000400 0002850000000400 0001640002000000 0400000286000000 0400000163000200 0000040000028700 0000040000016200 0200000004000002 8800000004000001 6100020000000400 0002890000000400 0001600002000000 040000028A000000 040000015F000200 0000040000028B00 0000040000015E00 0200000004000002 8C00000004000001 5D00020000000400 00028D0000000400 00015C0002000000 040000028E000000 040000015B000200 0000040000028F00 0000040000015A00 0200000004000002 9000000004000001 5900020000000400 0002910000000400 0001580002000000 0400000292000000 0400000157000200 0000040000029300 0000040000015600 0200000004000002 9400000004000001 5500020000000400 0002950000000400 0001540002000000 0400000296000000 0400000153000200 0000040000029700 0000040000015200 0200000004000002 9800000004000001 5100020000000400 0002990000000400 0001500002000000 040000029A000000 040000014F000200 0000040000029B00 0000040000014E00 0200000004000002 9C00000004000001 4D00020000000400 00029D0000000400 00014C0002000000 040000029E000000 040000014B000200 0000040000029F00 0000040000014A00 0200000004000002 A000000004000001 4900020000000400 0002A10000000400 0001480002000000 04000002A2000000 0400000147000200 000004000002A300 0000040000014600 0200000004000002 A400000004000001 4500020000000400 0002A50000000400 0001440002000000 04000002A6000000 0400000143000200 000004000002A700 0000040000014200 0200000004000002 A800000004000001 4100020000000400 0002A90000000400 0001400002000000 04000002AA000000 040000013F000200 000004000002AB00 0000040000013E00 0200000004000002 AC00000004000001 3D00020000000400 0002AD0000000400 00013C0002000000 04000002AE000000 040000013B000200 000004000002AF00 0000040000013A00 0200000004000002 B000000004000001 3900020000000400 0002B10000000400 0001380002000000 04000002B2000000 0400000137000200 000004000002B300 0000040000013600 0200000004000002 B400000004000001 3500020000000400 0002B50000000400 0001340002000000 04000002B6000000 0400000133000200 000004000002B700 0000040000013200 0200000004000002 B800000004000001 3100020000000400 0002B90000000400 0001300002000000 04000002BA000000 040000012F000200 000004000002BB00 0000040000012E00 0200000004000002 BC00000004000001 2D00020000000400 0002BD0000000400 00012C0002000000 04000002BE000000 040000012B000200 000004000002BF00 0000040000012A00 0200000004000002 C000000004000001 2900020000000400 0002C10000000400 0001280002000000 04000002C2000000 0400000127000200 000004000002C300 0000040000012600 0200000004000002 C400000004000001 2500020000000400 0002C50000000400 0001240002000000 04000002C6000000 0400000123000200 000004000002C700 0000040000012200 0200000004000002 C800000004000001 2100020000000400 0002C90000000400 0001200002000000 04000002CA000000 040000011F000200 000004000002CB00 0000040000011E00 0200000004000002 CC00000004000001 1D00020000000400 0002CD0000000400 00011C0002000000 04000002CE000000 040000011B000200 000004000002CF00 0000040000011A00 0200000004000002 D000000004000001 1900020000000400 0002D10000000400 0001180002000000 04000002D2000000 0400000117000200 000004000002D300 0000040000011600 0200000004000002 D400000004000001 1500020000000400 0002D50000000400 0001140002000000 04000002D6000000 0400000113000200 000004000002D700 0000040000011200 0200000004000002 D800000004000001 1100020000000400 0002D90000000400 0001100002000000 04000002DA000000 040000010F000200 000004000002DB00 0000040000010E00 0200000004000002 DC00000004000001 0D00020000000400 0002DD0000000400 00010C0002000000 04000002DE000000 040000010B000200 000004000002DF00 0000040000010A00 0200000004000002 E000000004000001 0900020000000400 0002E10000000400 0001080002000000 04000002E2000000 0400000107000200 000004000002E300 0000040000010600 0200000004000002 E400000004000001 0500020000000400 0002E50000000400 0001040002000000 04000002E6000000 0400000103000200 000004000002E700 0000040000010200 0200000004000002 E800000004000001 0100020000000400 0002E90000000400 0001000002000000 04000002EA000000 04000000FF000200 000004000002EB00 000004000000FE00 0200000004000002 EC00000004000000 FD00020000000400 0002ED0000000400 0000FC0002000000 04000002EE000000 04000000FB000200 000004000002EF00 000004000000FA00 0200000004000002 F000000004000000 F900020000000400 0002F10000000400 0000F80002000000 04000002F2000000 04000000F7000200 000004000002F300 000004000000F600 0200000004000002 F400000004000000 F500020000000400 0002F50000000400 0000F40002000000 04000002F6000000 04000000F3000200 000004000002F700 000004000000F200 0200000004000002 F800000004000000 F100020000000400 0002F90000000400 0000F00002000000 04000002FA000000 04000000EF000200 000004000002FB00 000004000000EE00 0200000004000002 FC00000004000000 ED00020000000400 0002FD0000000400 0000EC0002000000 04000002FE000000 04000000EB000200 000004000002FF00 000004000000EA00 0200000004000003 0000000004000000 E900020000000400 0003010000000400 0000E80002000000 0400000302000000 04000000E7000200 0000040000030300 000004000000E600 0200000004000003 0400000004000000 E500020000000400 0003050000000400 0000E40002000000 0400000306000000 04000000E3000200 0000040000030700 000004000000E200 0200000004000003 0800000004000000 E100020000000400 0003090000000400 0000E00002000000 040000030A000000 04000000DF000200 0000040000030B00 000004000000DE00 0200000004000003 0C00000004000000 DD00020000000400 00030D0000000400 0000DC0002000000 040000030E000000 04000000DB000200 0000040000030F00 000004000000DA00 0200000004000003 1000000004000000 D900020000000400 0003110000000400 0000D80002000000 0400000312000000 04000000D7000200 0000040000031300 000004000000D600 0200000004000003 1400000004000000 D500020000000400 0003150000000400 0000D40002000000 0400000316000000 04000000D3000200 0000040000031700 000004000000D200 0200000004000003 1800000004000000 D100020000000400 0003190000000400 0000D00002000000 040000031A000000 04000000CF000200 0000040000031B00 000004000000CE00 0200000004000003 1C00000004000000 CD00020000000400 00031D0000000400 0000CC0002000000 040000031E000000 04000000CB000200 0000040000031F00 000004000000CA00 0200000004000003 2000000004000000 C900020000000400 0003210000000400 0000C80002000000 0400000322000000 04000000C7000200 0000040000032300 000004000000C600 0200000004000003 2400000004000000 C500020000000400 0003250000000400 0000C40002000000 0400000326000000 04000000C3000200 0000040000032700 000004000000C200 0200000004000003 2800000004000000 C100020000000400 0003290000000400 0000C00002000000 040000032A000000 04000000BF000200 0000040000032B00 000004000000BE00 0200000004000003 2C00000004000000 BD00020000000400 00032D0000000400 0000BC0002000000 040000032E000000 04000000BB000200 0000040000032F00 000004000000BA00 0200000004000003 3000000004000000 B900020000000400 0003310000000400 0000B80002000000 0400000332000000 04000000B7000200 0000040000033300 000004000000B600 0200000004000003 3400000004000000 B500020000000400 0003350000000400 0000B40002000000 0400000336000000 04000000B3000200 0000040000033700 000004000000B200 0200000004000003 3800000004000000 B100020000000400 0003390000000400 0000B00002000000 040000033A000000 04000000AF000200 0000040000033B00 000004000000AE00 0200000004000003 3C00000004000000 AD00020000000400 00033D0000000400 0000AC0002000000 040000033E000000 04000000AB000200 0000040000033F00 000004000000AA00 0200000004000003 4000000004000000 A900020000000400 0003410000000400 0000A80002000000 0400000342000000 04000000A7000200 0000040000034300 000004000000A600 0200000004000003 4400000004000000 A500020000000400 0003450000000400 0000A40002000000 0400000346000000 04000000A3000200 0000040000034700 000004000000A200 0200000004000003 4800000004000000 A100020000000400 0003490000000400 0000A00002000000 040000034A000000 040000009F000200 0000040000034B00 0000040000009E00 0200000004000003 4C00000004000000 9D00020000000400 00034D0000000400 00009C0002000000 040000034E000000 040000009B000200 0000040000034F00 0000040000009A00 0200000004000003 5000000004000000 9900020000000400 0003510000000400 0000980002000000 0400000352000000 0400000097000200 0000040000035300 0000040000009600 0200000004000003 5400000004000000 9500020000000400 0003550000000400 0000940002000000 0400000356000000 0400000093000200 0000040000035700 0000040000009200 0200000004000003 5800000004000000 9100020000000400 0003590000000400 0000900002000000 040000035A000000 040000008F000200 0000040000035B00 0000040000008E00 0200000004000003 5C00000004000000 8D00020000000400 00035D0000000400 00008C0002000000 040000035E000000 040000008B000200 0000040000035F00 0000040000008A00 0200000004000003 6000000004000000 8900020000000400 0003610000000400 0000880002000000 0400000362000000 0400000087000200 0000040000036300 0000040000008600 0200000004000003 6400000004000000 8500020000000400 0003650000000400 0000840002000000 0400000366000000 0400000083000200 0000040000036700 0000040000008200 0200000004000003 6800000004000000 8100020000000400 0003690000000400 0000800002000000 040000036A000000 040000007F000200 0000040000036B00 0000040000007E00 0200000004000003 6C00000004000000 7D00020000000400 00036D0000000400 00007C0002000000 040000036E000000 040000007B000200 0000040000036F00 0000040000007A00 0200000004000003 7000000004000000 7900020000000400 0003710000000400 0000780002000000 0400000372000000 0400000077000200 0000040000037300 0000040000007600 0200000004000003 7400000004000000 7500020000000400 0003750000000400 0000740002000000 0400000376000000 0400000073000200 0000040000037700 0000040000007200 0200000004000003 7800000004000000 7100020000000400 0003790000000400 0000700002000000 040000037A000000 040000006F000200 0000040000037B00 0000040000006E00 0200000004000003 7C00000004000000 6D00020000000400 00037D0000000400 00006C0002000000 040000037E000000 040000006B000200 0000040000037F00 0000040000006A00 0200000004000003 8000000004000000 6900020000000400 0003810000000400 0000680002000000 0400000382000000 0400000067000200 0000040000038300 0000040000006600 0200000004000003 8400000004000000 6500020000000400 0003850000000400 0000640002000000 0400000386000000 0400000063000200 0000040000038700 0000040000006200 0200000004000003 8800000004000000 6100020000000400 0003890000000400 0000600002000000 040000038A000000 040000005F000200 0000040000038B00 0000040000005E00 0200000004000003 8C00000004000000 5D00020000000400 00038D0000000400 00005C0002000000 040000038E000000 040000005B000200 0000040000038F00 0000040000005A00 0200000004000003 9000000004000000 5900020000000400 0003910000000400 0000580002000000 0400000392000000 0400000057000200 0000040000039300 0000040000005600 0200000004000003 9400000004000000 5500020000000400 0003950000000400 0000540002000000 0400000396000000 0400000053000200 0000040000039700 0000040000005200 0200000004000003 9800000004000000 5100020000000400 0003990000000400 0000500002000000 040000039A000000 040000004F000200 0000040000039B00 0000040000004E00 0200000004000003 9C00000004000000 4D00020000000400 00039D0000000400 00004C0002000000 040000039E000000 040000004B000200 0000040000039F00 0000040000004A00 0200000004000003 A000000004000000 4900020000000400 0003A10000000400 0000480002000000 04000003A2000000 0400000047000200 000004000003A300 0000040000004600 0200000004000003 A400000004000000 4500020000000400 0003A50000000400 0000440002000000 04000003A6000000 0400000043000200 000004000003A700 0000040000004200 0200000004000003 A800000004000000 4100020000000400 0003A90000000400 0000400002000000 04000003AA000000 040000003F000200 000004000003AB00 0000040000003E00 0200000004000003 AC00000004000000 3D00020000000400 0003AD0000000400 00003C0002000000 04000003AE000000 040000003B000200 000004000003AF00 0000040000003A00 0200000004000003 B000000004000000 3900020000000400 0003B10000000400 0000380002000000 04000003B2000000 0400000037000200 000004000003B300 0000040000003600 0200000004000003 B400000004000000 3500020000000400 0003B50000000400 0000340002000000 04000003B6000000 0400000033000200 000004000003B700 0000040000003200 0200000004000003 B800000004000000 3100020000000400 0003B90000000400 0000300002000000 04000003BA000000 040000002F000200 000004000003BB00 0000040000002E00 0200000004000003 BC00000004000000 2D00020000000400 0003BD0000000400 00002C0002000000 04000003BE000000 040000002B000200 000004000003BF00 0000040000002A00 0200000004000003 C000000004000000 2900020000000400 0003C10000000400 0000280002000000 04000003C2000000 0400000027000200 000004000003C300 0000040000002600 0200000004000003 C400000004000000 2500020000000400 0003C50000000400 0000240002000000 04000003C6000000 0400000023000200 000004000003C700 0000040000002200 0200000004000003 C800000004000000 2100020000000400 0003C90000000400 0000200002000000 04000003CA000000 040000001F000200 000004000003CB00 0000040000001E00 0200000004000003 CC00000004000000 1D00020000000400 0003CD0000000400 00001C0002000000 04000003CE000000 040000001B000200 000004000003CF00 0000040000001A00 0200000004000003 D000000004000000 1900020000000400 0003D10000000400 0000180002000000 04000003D2000000 0400000017000200 000004000003D300 0000040000001600 0200000004000003 D400000004000000 1500020000000400 0003D50000000400 0000140002000000 04000003D6000000 0400000013000200 000004000003D700 0000040000001200 0200000004000003 D800000004000000 1100020000000400 0003D90000000400 0000100002000000 04000003DA000000 040000000F000200 000004000003DB00 0000040000000E00 0200000004000003 DC00000004000000 0D00020000000400 0003DD0000000400 00000C0002000000 04000003DE000000 040000000B000200 000004000003DF00 0000040000000A00 0200000004000003 E000000004000000 0900020000000400 0003E10000000400 0000080002000000 04000003E2000000 0400000007000200 000004000003E300 0000040000000600 0200000004000003 E400000004000000 0500020000000400 0003E50000000400 0000040002000000 04000003E6000000 0400000003000200 000004000003E700 0000040000000200 0200000004000003 E800000004000000 01FFFF at org.intermine.sql.writebatch.FlushJobPostgresCopyImpl.flush(FlushJobPostgresCopyImpl.java:66) at org.intermine.sql.writebatch.Batch$BatchFlusher.run(Batch.java:458) at java.lang.Thread.run(Thread.java:619) Caused by: org.postgresql.util.PSQLException: Tried to end inactive copy at org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:803) at org.postgresql.core.v3.CopyInImpl.endCopy(CopyInImpl.java:61) at org.postgresql.copy.CopyManager.copyIn(CopyManager.java:181) at org.postgresql.copy.CopyManager.copyIn(CopyManager.java:161) at org.intermine.sql.writebatch.FlushJobPostgresCopyImpl.flush(FlushJobPostgresCopyImpl.java:51) -- "We did a risk management review. We concluded that there was no risk of any management." -- Hugo Mills <hugo@carfax.nildram.co.uk>
On Mon, 19 Jul 2010, Matthew Wakeling wrote: > I have isolated a reasonably small example that fails. Just ran the example again, and it passed. Looks like an intermittent problem, not a deterministic one. However, it does seem that the size of the COPY is a significant factor. Matthew -- Here we go - the Fairy Godmother redundancy proof. -- Computer Science Lecturer
I've fixed other COPY-related bugs by running the latest code from revision control rather than the latest released driver. My problem doesn't appear to be the same as your problem, but it is probably worth a test, since it is pretty trivial to build the driver from source. On Mon, Jul 19, 2010 at 7:16 AM, Matthew Wakeling <matthew@flymine.org> wrote: > On Mon, 19 Jul 2010, Matthew Wakeling wrote: >> >> I have isolated a reasonably small example that fails. > > Just ran the example again, and it passed. Looks like an intermittent > problem, not a deterministic one. However, it does seem that the size of the > COPY is a significant factor. > > Matthew > > -- > Here we go - the Fairy Godmother redundancy proof. > -- Computer Science Lecturer > > -- > Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-jdbc >
On Mon, 19 Jul 2010, Samuel Gendler wrote: > I've fixed other COPY-related bugs by running the latest code from > revision control rather than the latest released driver. My problem > doesn't appear to be the same as your problem, but it is probably > worth a test, since it is pretty trivial to build the driver from > source. That does indeed fix the problem. I didn't realise the published jar was so old! I have a few "changed behaviour" problems to fix up before I can give it a completely clean bill of health. I'll get back to the list if there are any further problems. Matthew -- Note: some countries impose serious penalties for a conspiracy to overthrow the political system. THIS DOES NOT FIX THE VULNERABILITY. -- http://seclists.org/vulnwatch/2003/q2/0002.html
On Tue, 20 Jul 2010, Matthew Wakeling wrote: > That does indeed fix the problem. I didn't realise the published jar was so > old! Okay, no it doesn't fix the problem after all. Identical stack trace to before. Any ideas? Matthew -- for a in past present future; do for b in clients employers associates relatives neighbours pets; do echo "The opinions here in no way reflect the opinions of my $a $b." done; done
I'd be willing to take a more in-depth look if you can minimize this to a small, at-least-semi-reproducible test case. Ideally, something that can be included in http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/jdbc/pgjdbc/org/postgresql/test/jdbc2/CopyTest.java (since I noticed the driver doesn't have any WITH BINARY tests). If that's more trouble to you than it's worth, I'd try turning on driver-level logging and see if that gets you anything, or look at the traffic on the wire with something like wireshark to see if you're seeing the network traffic you expect (from the sound of it, I don't expect the problem to be there, but it's always nice to eliminate things). A protocol reference is available ( http://developer.postgresql.org/pgdocs/postgres/protocol-flow.html#PROTOCOL-COPY ), but you're basically looking for the pattern CopyIn CopyData* (CopyDone|CopyFail). If for some reason you see other messages interspersed from the client or a repeated Copy{Done,Fail} from the client, that could be a bad sign (although again, your server logs suggest this isn't the case). --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 230 Foster City, CA 94404 (650) 242-3500 Main (650) 242-3501 F www.truviso.com On Tue, Jul 20, 2010 at 3:29 AM, Matthew Wakeling <matthew@flymine.org> wrote: > On Tue, 20 Jul 2010, Matthew Wakeling wrote: >> >> That does indeed fix the problem. I didn't realise the published jar was >> so old! > > Okay, no it doesn't fix the problem after all. Identical stack trace to > before. Any ideas? > > Matthew > > -- > for a in past present future; do > for b in clients employers associates relatives neighbours pets; do > echo "The opinions here in no way reflect the opinions of my $a $b." > done; done > > -- > Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-jdbc >
On Tue, 20 Jul 2010, Maciek Sakrejda wrote: > I'd be willing to take a more in-depth look if you can minimize this > to a small, at-least-semi-reproducible test case. Ideally, something > that can be included in > http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/jdbc/pgjdbc/org/postgresql/test/jdbc2/CopyTest.java > (since I noticed the driver doesn't have any WITH BINARY tests). Okay, I'm looking at this now. Just tried running the unit tests (without any modifications), and I get a single failure in date handling: runtest: [junit] Testsuite: org.postgresql.test.jdbc2.Jdbc2TestSuite [junit] Tests run: 288, Failures: 0, Errors: 1, Time elapsed: 19.517 sec [junit] [junit] ------------- Standard Error ----------------- [junit] DatabaseEncodingTest: Skipping UNICODE database tests as test database encoding is SQL_ASCII [junit] ------------- ---------------- --------------- [junit] Testcase: testSetDate(org.postgresql.test.jdbc2.DateTest): Caused an ERROR [junit] null [junit] java.lang.IllegalArgumentException [junit] at java.sql.Date.valueOf(Date.java:138) [junit] at org.postgresql.test.jdbc2.DateTest.testSetDate(DateTest.java:120) [junit] [junit] [junit] Test org.postgresql.test.jdbc2.Jdbc2TestSuite FAILED Will try and produce a test for binary copy in now and let you know how it goes. Matthew -- Most books now say our sun is a star. But it still knows how to change back into a sun in the daytime.
On Tue, 20 Jul 2010, Matthew Wakeling wrote: > On Tue, 20 Jul 2010, Maciek Sakrejda wrote: >> I'd be willing to take a more in-depth look if you can minimize this >> to a small, at-least-semi-reproducible test case. Ideally, something >> that can be included in >> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/jdbc/pgjdbc/org/postgresql/test/jdbc2/CopyTest.java >> (since I noticed the driver doesn't have any WITH BINARY tests). > > Okay, I'm looking at this now. Just written a test that writes 100000 rows into the test table using binary copy, and it works perfectly. Not sure what's going on here, as it still fails in my application. One difference is that the test uses a straight connection, and my application uses a ds.PGPooledConnection. Could that be affecting it at all? Matthew -- It's one of those irregular verbs - "I have an independent mind," "You are an eccentric," "He is round the twist." -- Bernard Woolly, Yes Prime Minister
Matthew Wakeling <matthew@flymine.org> wrote: > Just written a test that writes 100000 rows into the test table > using binary copy, and it works perfectly. Not sure what's going > on here, as it still fails in my application. Random thought: is SSL involved in one and not the other? -Kevin
On Tue, 20 Jul 2010, Kevin Grittner wrote: > Matthew Wakeling <matthew@flymine.org> wrote: > >> Just written a test that writes 100000 rows into the test table >> using binary copy, and it works perfectly. Not sure what's going >> on here, as it still fails in my application. > > Random thought: is SSL involved in one and not the other? No. After a little more investigation, it appears that core.v3.QueryExecutorImpl.unlock() is being called before core.v3.QueryExecutorImpl.endCopy(), in core.v3.QueryExecutorImpl.processCopyResults() apparently because it receives a "Z", called by core.v3.QueryExecutorImpl.writeToCopy(). Investigating further. Matthew -- I quite understand I'm doing algebra on the blackboard and the usual response is to throw objects... If you're going to freak out... wait until party time and invite me along -- Computer Science Lecturer
On Tue, 20 Jul 2010, Matthew Wakeling wrote: > After a little more investigation, it appears that > core.v3.QueryExecutorImpl.unlock() is being called before > core.v3.QueryExecutorImpl.endCopy(), in > core.v3.QueryExecutorImpl.processCopyResults() apparently because it receives > a "Z", called by core.v3.QueryExecutorImpl.writeToCopy(). Investigating > further. Okay, have investigated a little further. It seems that the server may be responding to the client a little too quickly with the "all done" message. Here is a sequence of events in core.v3.QueryExecutorImpl: We call startCopy(). It calls processCopyResults(). In processCopyResults, we receive character G, and we lock. We call writeToCopy(). It calls processCopyResults(). We receive character C. We receive character Z, and unlock. We call endCopy(), which tries to unlock and fails. So, should the unlocking happen elsewhere, or should endCopy not require a lock? Matthew -- Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business. Quark: That's the 35th. Jadzia: Oh yes, that's right. What's the 34th again? Quark: War is good for business. It's easy to get them mixed up.
On Tue, 20 Jul 2010, Matthew Wakeling wrote: > So, should the unlocking happen elsewhere, or should endCopy not require a > lock? Or a third option. We should delay the processing of the "Z" (Ready for query) message until after the endCopy has been called. I have still not been able to create a test that provokes this bug. I do not understand under what conditions the server will do this, but sometimes it will respond with a "C" and a "Z" before the endCopy message has actually been sent to it. Maybe it is keying off the "-1" (no further rows) part of the binary copy. In any case, I am including a patch here which seems to fix the driver for my application. When the driver receives a "C" message (command finished), and block is set to false, then we can assume that a little later on we will call processCopyResults again very soon with block set to true. Instead of setting block to true, finish processing of messages in order to delay seeing the "Z" message until after the endCopy method has been called. The patch also contains a simple test for binary copy in, although it does not exercise this bug. Index: org/postgresql/core/v3/QueryExecutorImpl.java =================================================================== RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/QueryExecutorImpl.java,v retrieving revision 1.48 diff -u -r1.48 QueryExecutorImpl.java --- org/postgresql/core/v3/QueryExecutorImpl.java 21 Mar 2010 07:13:37 -0000 1.48 +++ org/postgresql/core/v3/QueryExecutorImpl.java 20 Jul 2010 14:48:03 -0000 @@ -931,7 +931,9 @@ error = se; } - block = true; + if (!block) { + endReceiving = true; + } break; case 'E': // ErrorMessage (expected response to CopyFail) Index: org/postgresql/test/jdbc2/CopyTest.java =================================================================== RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/test/jdbc2/CopyTest.java,v retrieving revision 1.4 diff -u -r1.4 CopyTest.java --- org/postgresql/test/jdbc2/CopyTest.java 8 Apr 2010 16:18:05 -0000 1.4 +++ org/postgresql/test/jdbc2/CopyTest.java 20 Jul 2010 14:48:03 -0000 @@ -15,6 +15,7 @@ import java.sql.Statement; import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; +import java.io.DataOutputStream; import java.io.InputStream; import java.io.OutputStream; import java.io.PrintStream; @@ -272,4 +273,35 @@ assertEquals(0, getCount()); } + public void testBinaryCopyIn() throws Exception { + con.setAutoCommit(true); + String sql = "COPY copytest FROM STDIN BINARY"; + ByteArrayOutputStream data = new ByteArrayOutputStream(); + DataOutputStream dos = new DataOutputStream(data); + dos.writeBytes("PGCOPY\n"); + dos.writeByte(255); + dos.writeBytes("\r\n"); + dos.writeByte(0); // Signature done + dos.writeInt(0); // Flags - we aren't supplying OIDS + dos.writeInt(0); // Length of header extension + for (int row = 0; row < 99700; row++) { + dos.writeShort(3); // Three rows + dos.writeInt(5); // String length + dos.writeByte('H'); + dos.writeByte('e'); + dos.writeByte('l'); + dos.writeByte('l'); + dos.writeByte('o'); + dos.writeInt(4); // int length + dos.writeInt(row); // Second argument + dos.writeInt(-1); // Null third argument + } + dos.writeShort(-1); // End of rows + dos.flush(); + dos.close(); + byte[] byteArray = data.toByteArray(); + //System.err.println("Data length: " + byteArray.length); + long count = copyAPI.copyIn(sql, new ByteArrayInputStream(byteArray)); + assertEquals(99700, count); + } } Hope this helps, Matthew -- There once was a limerick .sig that really was not very big It was going quite fine Till it reached the fourth line
On Tue, 20 Jul 2010, Matthew Wakeling wrote: > Okay, have investigated a little further. It seems that the server may be > responding to the client a little too quickly with the "all done" message. > Here is a sequence of events in core.v3.QueryExecutorImpl: > > We call startCopy(). It calls processCopyResults(). > In processCopyResults, we receive character G, and we lock. > > We call writeToCopy(). It calls processCopyResults(). > We receive character C. > We receive character Z, and unlock. > > We call endCopy(), which tries to unlock and fails. > > > So, should the unlocking happen elsewhere, or should endCopy not require a > lock? To me it seems like a protocol violation to receive CommandComplete(C) and ReadyForQuery(Z) before we've issued CopyDone(c) and this should be fixed server side. Kris Jurka
On Tue, 20 Jul 2010, Kris Jurka wrote: > To me it seems like a protocol violation to receive CommandComplete(C) and > ReadyForQuery(Z) before we've issued CopyDone(c) and this should be fixed > server side. Is anyone following this up? Matthew -- If you're thinking "Oh no, this lecturer thinks Turing Machines are a feasible method of computation, where's the door?", then you are in luck. There are some there, there, and by the side there. Oxygen masks will not drop from the ceiling... -- Computer Science Lecturer
On Thu, 22 Jul 2010, Maciek Sakrejda wrote: > Can you confirm Kris's theory? You should be able to do that by > starting a binary copy with the CopyManager.copyIn(String), writing > all your binary data to it, flushing, and then sleeping for a while > before calling endCopy? If Kris is right, you should get the error > after the flush before the endCopy. If that's the case, I can take a > look at doing a patch this weekend. If I'm reading the protocol > contract right, it ultimately should be fixed in the backend, though. I can try and reproduce that. However, I will point out that the patch I sent to the list which delays the reading of the Z does fix the problem. Matthew -- Ozzy: Life is full of disappointments. Millie: No it isn't - I can always fit more in.
Can you confirm Kris's theory? You should be able to do that by starting a binary copy with the CopyManager.copyIn(String), writing all your binary data to it, flushing, and then sleeping for a while before calling endCopy? If Kris is right, you should get the error after the flush before the endCopy. If that's the case, I can take a look at doing a patch this weekend. If I'm reading the protocol contract right, it ultimately should be fixed in the backend, though. --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 230 Foster City, CA 94404 (650) 242-3500 Main (650) 242-3501 F www.truviso.com On Thu, Jul 22, 2010 at 9:38 AM, Matthew Wakeling <matthew@flymine.org> wrote: > On Tue, 20 Jul 2010, Kris Jurka wrote: >> >> To me it seems like a protocol violation to receive CommandComplete(C) and >> ReadyForQuery(Z) before we've issued CopyDone(c) and this should be fixed >> server side. > > Is anyone following this up? > > Matthew > > -- > If you're thinking "Oh no, this lecturer thinks Turing Machines are a > feasible > method of computation, where's the door?", then you are in luck. There are > some there, there, and by the side there. Oxygen masks will not drop from > the > ceiling... -- Computer Science Lecturer >
> I can try and reproduce that. However, I will point out that the patch I > sent to the list which delays the reading of the Z does fix the problem. Sorry, I missed that. I'm not crazy about the logic (if I'm following it correctly), but it seems relatively sound (and has the benefit of being minimally invasive). --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 230 Foster City, CA 94404 (650) 242-3500 Main (650) 242-3501 F www.truviso.com
On Thu, 22 Jul 2010, Maciek Sakrejda wrote: >> I can try and reproduce that. However, I will point out that the patch I >> sent to the list which delays the reading of the Z does fix the problem. > > Sorry, I missed that. I'm not crazy about the logic (if I'm following > it correctly), but it seems relatively sound (and has the benefit of > being minimally invasive). No, the logic sucks, as it is full of possible failure modes. What if you for some reason call processCopyResults() again, after deciding to delay after seeing the C, but before calling endCopy()? It'd see the Z and fail. Anyway, I have just tried to get it to fail using the method you suggested, and failed. For some reason, it seems that the server only responds early under some circumstances that I can't work out. The test case I have here doesn't do it. It does seem to be data dependent. For reference, I edited CopyManager, and put a ten second sleep just before endCopy(). I edited QueryExecutorImpl and put a flush after each pgStream.send(data), and I put a processCopyResults() right at the beginning of endCopy. This should provoke an error if the server does the wrong thing. I'll try using this in my application, and see what happens. Will get back to you. Matthew -- Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Kernighan
On Thu, 22 Jul 2010, Matthew Wakeling wrote: > On Tue, 20 Jul 2010, Kris Jurka wrote: >> To me it seems like a protocol violation to receive CommandComplete(C) and >> ReadyForQuery(Z) before we've issued CopyDone(c) and this should be fixed >> server side. > > Is anyone following this up? Yes, I am able to reproduce this at will now. I will look at what's going on server side after some lunch. Kris Jurka
Per discussion and investigation on the -jdbc list, the server appears to violate the frontend/backend protocol when binary copy data is sent to the server. Upon receiving the binary copy end of data marker (a -1 field count), the server immediately responds with CommandComplete and ReadyForQuery without waiting for the frontend to issue CopyDone or CopyFail. This confuses the JDBC driver as it doesn't think the command sequence should have finished yet. Attached is a patch to make the server continue to consume protocol data until instructed to stop by the client in the same way as copying text data to the server currently works. http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-COPY http://www.postgresql.org/docs/8.4/static/sql-copy.html Kris Jurka
Attachment
On Thu, Jul 22, 2010 at 5:34 PM, Kris Jurka <books@ejurka.com> wrote: > Per discussion and investigation on the -jdbc list, the server appears to > violate the frontend/backend protocol when binary copy data is sent to the > server. Upon receiving the binary copy end of data marker (a -1 field > count), the server immediately responds with CommandComplete and > ReadyForQuery without waiting for the frontend to issue CopyDone or > CopyFail. This confuses the JDBC driver as it doesn't think the command > sequence should have finished yet. > > Attached is a patch to make the server continue to consume protocol data > until instructed to stop by the client in the same way as copying text data > to the server currently works. > > http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-COPY > http://www.postgresql.org/docs/8.4/static/sql-copy.html > > Kris Jurka I guess the obvious question is whether we shouldn't instead change the docs to match the behavior. I suspect there's almost no chance we'd consider back-patching a change of this type, since it is a clear behavior change. And even if we did, there would still be people running servers with the old behavior with which JDBC and other drivers would have to cope. Having two different behaviors might be worse than the status quo. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, 22 Jul 2010, Kris Jurka wrote: > Yes, I am able to reproduce this at will now. I will look at what's going on > server side after some lunch. Well, at least someone can reproduce it. I have completely failed so far, even going so far as to perform exactly the same operation in the test as I do in my application. It fails in the application, but passes in the test. Matthew -- For those of you who are into writing programs that are as obscure and complicated as possible, there are opportunities for... real fun here -- Computer Science Lecturer
On Thu, 22 Jul 2010, Robert Haas wrote: > On Thu, Jul 22, 2010 at 5:34 PM, Kris Jurka <books@ejurka.com> wrote: >> >> Attached is a patch to make the server continue to consume protocol data >> until instructed to stop by the client in the same way as copying text data >> to the server currently works. >> > > I guess the obvious question is whether we shouldn't instead change > the docs to match the behavior. I suspect there's almost no chance > we'd consider back-patching a change of this type, since it is a clear > behavior change. And even if we did, there would still be people > running servers with the old behavior with which JDBC and other > drivers would have to cope. Having two different behaviors might be > worse than the status quo. > It is a clear behavior change, but that's what bug fixes are. I would advocate back-patching this because I doubt many people would be affected by this change and I think it would be awkward trying to document how things work differently in binary mode when sending a file end marker than in text mode or without a file end marker. If this was fixed server side and backpatched, I would not modify the JDBC driver to work with older server versions. The copy documentation is clear that you must call PQputCopyEnd or equivalent to end the copy sequence, so this would only affect people who are not doing that and using binary copy mode. I doubt many people are using binary copy at all because of the additional difficulty in generating binary format data and the potential for portability problems. Kris Jurka
Kris Jurka wrote: >> >> I guess the obvious question is whether we shouldn't instead change >> the docs to match the behavior. I suspect there's almost no chance >> we'd consider back-patching a change of this type, since it is a clear >> behavior change. And even if we did, there would still be people >> running servers with the old behavior with which JDBC and other >> drivers would have to cope. Having two different behaviors might be >> worse than the status quo. >> > > It is a clear behavior change, but that's what bug fixes are. That was my first reaction. I don't think we're in the business if redefining bugs out of existence. cheers andrew
Kris Jurka <books@ejurka.com> writes: > Attached is a patch to make the server continue to consume protocol data > until instructed to stop by the client in the same way as copying text > data to the server currently works. I believe this is a misunderstanding of the protocol spec. The spec is (intended to say that) we'll continue to accept data after reporting an error, not that we will silently swallow an incorrect data stream and not complain about it. Which is what this patch will do. regards, tom lane
On Fri, Jul 23, 2010 at 9:32 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > Kris Jurka wrote: >>> >>> I guess the obvious question is whether we shouldn't instead change >>> the docs to match the behavior. I suspect there's almost no chance >>> we'd consider back-patching a change of this type, since it is a clear >>> behavior change. And even if we did, there would still be people >>> running servers with the old behavior with which JDBC and other >>> drivers would have to cope. Having two different behaviors might be >>> worse than the status quo. >>> >> >> It is a clear behavior change, but that's what bug fixes are. > > That was my first reaction. I don't think we're in the business if > redefining bugs out of existence. I certainly understand that reaction - I just worry that there might be people depending on the current behavior. We really don't want to get a reputation for breaking things in minor releases. But this is not an area of the code I'm very familiar with, and I'm not in a good position to judge the likelihood of breakage, so I'll defer to those who are... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 7/23/2010 6:40 AM, Tom Lane wrote: > Kris Jurka<books@ejurka.com> writes: >> Attached is a patch to make the server continue to consume protocol data >> until instructed to stop by the client in the same way as copying text >> data to the server currently works. > > I believe this is a misunderstanding of the protocol spec. The spec is > (intended to say that) we'll continue to accept data after reporting an > error, not that we will silently swallow an incorrect data stream and > not complain about it. Which is what this patch will do. > All this does is make binary mode match text mode. Are you planning on changing text mode to match binary mode instead? Currently text mode parsing ends at the data end marker (\.) and throws away any further data which may or may not be ill formatted. For example there's no complaint about copying the following data file into a single column integer table even though there is bogus data after the file end marker 3 4 \. asdf aff 5 qq Kris Jurka
Kris Jurka <books@ejurka.com> writes: > On 7/23/2010 6:40 AM, Tom Lane wrote: >> I believe this is a misunderstanding of the protocol spec. The spec is >> (intended to say that) we'll continue to accept data after reporting an >> error, not that we will silently swallow an incorrect data stream and >> not complain about it. Which is what this patch will do. > All this does is make binary mode match text mode. The fact that text mode eats data after \. is a backwards-compatibility kluge to match the behavior of pre-7.4 COPY. It could very legitimately be argued to be a bug in itself. I don't think that we should make binary mode match it. The main concrete reason why not is that binary mode has almost no redundancy. It would be really easy for the code change you suggest to result in data being silently discarded with no hint of what went wrong. After some reflection, I think the real issue here is that the JDBC driver is depending on a behavior not stated in the protocol, which is the relative timing of FE-to-BE and BE-to-FE messages. Once you've sent the EOF marker, the only correct follow-on for a spec-compliant frontend is a CopyEnd message. So the backend is just sending its response a bit sooner. There's nothing in the protocol spec forbidding that. I would be willing to accept a patch that avoided sending CopyEnd immediately after receiving EOF, so long as it still threw an error for extra data; but this is not that patch. The larger issue though is whether it wouldn't be better to change the driver behavior instead. I can't help thinking that the JDBC driver must be being overly cute if this breaks it ... and you're never going to get everybody to upgrade their server version, even if we were willing to back-patch the change. regards, tom lane
On Fri, 23 Jul 2010, Tom Lane wrote: > Kris Jurka <books@ejurka.com> writes: >> On 7/23/2010 6:40 AM, Tom Lane wrote: >>> I believe this is a misunderstanding of the protocol spec. The spec is >>> (intended to say that) we'll continue to accept data after reporting an >>> error, not that we will silently swallow an incorrect data stream and >>> not complain about it. Which is what this patch will do. > >> All this does is make binary mode match text mode. > > The fact that text mode eats data after \. is a backwards-compatibility > kluge to match the behavior of pre-7.4 COPY. It could very legitimately > be argued to be a bug in itself. I don't think that we should make > binary mode match it. The main concrete reason why not is that binary > mode has almost no redundancy. It would be really easy for the code > change you suggest to result in data being silently discarded with no > hint of what went wrong. Binary copy mode already does this (eat data silently after -1 field count). The patch I sent just made it follow the fe/be protocol while it does so. jurka=# create table copytest (a int); CREATE TABLE jurka=# insert into copytest values (1); INSERT 0 1 jurka=# \copy copytest to copydata with binary jurka=# \! echo garbage >> copydata jurka=# \copy copytest from copydata with binary jurka=# select * from copytest; a --- 1 1 (2 rows) > After some reflection, I think the real issue here is that the JDBC > driver is depending on a behavior not stated in the protocol, which > is the relative timing of FE-to-BE and BE-to-FE messages. Once you've > sent the EOF marker, the only correct follow-on for a spec-compliant > frontend is a CopyEnd message. So the backend is just sending its > response a bit sooner. There's nothing in the protocol spec forbidding > that. What about CopyFail? The protocol docs say nothing about the message contents only about the messages themselves. Kris Jurka
On Jul 23, 2010, at 7:11 AM, Tom Lane wrote: > I can't help thinking that the JDBC driver must be being overly cute > if this breaks it ... I was wondering the same thing when I first saw Kris' message. However, iff I understand what JDBC is trying to achieve,I don't think I would call it "overly". @Kris Is this a problem because JDBC is trying to detect failures as early as possible during a COPY IN? Or, is it just JDBC's normal MO to always be reading? Well, I've wanted to do the former (early error detection) with py-postgresql's COPY support, and I imagine getting a readevent marking completion prior to emitting done/fail could be a snag. cheers, jwp
On Sat, 24 Jul 2010, James William Pye wrote: > On Jul 23, 2010, at 7:11 AM, Tom Lane wrote: >> I can't help thinking that the JDBC driver must be being overly cute >> if this breaks it ... > > I was wondering the same thing when I first saw Kris' message. However, > iff I understand what JDBC is trying to achieve, I don't think I would > call it "overly". > > Is this a problem because JDBC is trying to detect failures as early as > possible during a COPY IN? Or, is it just JDBC's normal MO to always be > reading? The JDBC driver reads server messages for multiple reasons. One of them is indeed to do early failure detection. Another is to pickup NoticeResponse messages to avoid a network buffer deadlock. If someone puts a trigger on the table you're copying data into that does RAISE NOTICE 'received row X' for each row, to avoid a full network buffer deadlock, the client must regularly read from the backend. So as we are reading along, supposing that we're still mid-copy, we get a command complete message. So this is possible to work around driver side by peeking into the network stream and delaying processing of the end of copy until the driver agrees that the copy is done, but I still maintain that this is a server bug. It is not OK for the server to assume that the client is done and move on, the client must tell the server what it wants done. Kris Jurka
On Jul 25, 2010, at 8:01 AM, Kris Jurka wrote: > The JDBC driver reads server messages for multiple reasons. > One of them is indeed to do early failure detection. That's high quality. =) > Another is to pickup NoticeResponse messages to avoid a network buffer deadlock. That's a good catch. I don't think psql/restore would often run into this as when COPY IN is in play, it's normally restoringa database. However, with JDBC, I imagine COPY would more often be used to do bulk loading into live tables thatmay very well cause a NOTICE. [Well, I reference psql/libpq because I don't recall it recognizing failure during COPYIN in the past, so I assume it's not receiving any data in that state.] hrm, I suppose a lazy way around that problem would be to suspend all client messages(client_min_messages) during COPY IN.Tho, I guess one would still have to contend with NotificationResponse, and ParameterStatus.. > So this is possible to work around driver side by peeking into the network stream and delaying processing of the end ofcopy until the driver agrees that the copy is done, but I don't think you would have to peek in. If the interface were to always hold onto the last message or last n-bytes submittedto be sent, it would be able to send the possible CopyData(EOF) and CopyDone once the COPY operation (at the interfacelevel) is closed/shutdown/terminated. Granted, this is dependent on CopyData(EOF) not being in the middle of regularCopyData, but I gather that that would end in an ErrorResponse anyways. > I still maintain that this is a server bug. It is not OK for the server to assume that the client is done and move on,the client must tell the server what it wants done. I'm a bit torn here. While it would seem to be either a bug in the spec or a bug in the server, I'm inclined to call it awart in the server's implementation of the spec. I don't see the fix as being dangerous, but I imagine an implementor would want to have the workaround in place regardless.I certainly would. I'd be in favor of seeing this fixed in 9.x, and the documentation updated to warn implementors about the wart in the olderversions.. That is, I don't see any reason why we can't get rid of this unsightly thing considering the workaroundswould still work with a wart-free server. cheers, jwp
On Wed, 28 Jul 2010, James William Pye wrote: > > hrm, I suppose a lazy way around that problem would be to suspend all > client messages(client_min_messages) during COPY IN. Tho, I guess one > would still have to contend with NotificationResponse, and > ParameterStatus.. Technically you won't get NotificationResponse until transaction end, so you don't need to worry about that mid copy. > I don't think you would have to peek in. If the interface were to always > hold onto the last message or last n-bytes submitted to be sent, it > would be able to send the possible CopyData(EOF) and CopyDone once the > COPY operation (at the interface level) is closed/shutdown/terminated. > Granted, this is dependent on CopyData(EOF) not being in the middle of > regular CopyData, but I gather that that would end in an ErrorResponse > anyways. One of the key points of confusion is that CopyData(EOF) does not result in an error. It results in ignoring any futher data. The problem I have is that for text mode it waits for CopyDone, but in binary mode it ends the copy sequence immediately. Additionally the interface exposed by the JDBC driver lets the user write arbitrary CopyData bytes to the server, so without parsing all of that we don't know whether they've issued CopyData(EOF) or not. Kris Jurka
On Jul 28, 2010, at 9:53 AM, Kris Jurka wrote: > Technically you won't get NotificationResponse until transaction end, so you don't need to worry about that mid copy. Ah, thanks for noting that. It would appear my original reading of the async section didn't get far enough beyond "Frontendsmust be prepared to deal with these messages at any time, even when not engaged in a query.". I see the note belowclarifying NotificationResponse. > One of the key points of confusion is that CopyData(EOF) does not result in an error. > It results in ignoring any futher data. > The problem I have is that for text mode it waits for CopyDone, but in binary mode it ends the copy sequence immediately. That is bothersome. :\ > Additionally the interface exposed by the JDBC driver lets the user write arbitrary CopyData bytes to the server, so withoutparsing all of that we don't know whether they've issued CopyData(EOF) or not. Okay, so you can't know with absolute certainty without parsing the data, but the usual case would be handled by holdingonto the last-N bytes or so. Enough to fit the EOF and perhaps a little more for paranoia's sake. That's not to say that I'm missing the problem. When (not "if", "when") the user feeds data past a CopyData(EOF), it's goingto get interesting. [Thinking about the logic necessary to handle such a case and avoid network buffer deadlock...] I would think the least invasive way to handle it would be to set the CommandComplete and ReadyForQuery messages aside whenthey are received if CopyDone hasn't been sent, continue the COPY operation as usual until it is shutdown, send CopyDoneand, finally, "reinstate" CommandComplete and RFQ as if they were just received.. I don't think that really accommodatesfor CopyFail as the status in RFQ will need to be "adjusted" to match what was actually done..? Well, I'm notsure you would need to worry about NoticeResponse after a premature CommandComplete as INSERTs are no longer happening.ugh. +1 for a fix. Not directly regarding your patch, but while the discussion is in the general area. I think it would be wise to throw an error when non-empty CopyData messages are received after CopyData(EOF). Chances arethat the user is making a mistake and should be notified of it. cheers, jwp
(Yes, I know I'm not on the hackers list. Most interested parties should get this directly anyway.) >> Additionally the interface exposed by the JDBC driver lets the user >> write arbitrary CopyData bytes to the server, so without parsing all of >> that we don't know whether they've issued CopyData(EOF) or not. > > Okay, so you can't know with absolute certainty without parsing the > data, but the usual case would be handled by holding onto the last-N > bytes or so. Enough to fit the EOF and perhaps a little more for > paranoia's sake. > > That's not to say that I'm missing the problem. When (not "if", "when") > the user feeds data past a CopyData(EOF), it's going to get interesting. This is the reason why the patch to the JDBC driver that I sent in is very fragile. In the case where a user provides a binary copy with lots of data after the EOF, the processCopyData method *will* get called after the CommandComplete and ReadyForQuery messages have been received, even if we try to delay processing of the ReadyForQuery message. > [Thinking about the logic necessary to handle such a case and avoid > network buffer deadlock...] I would think the least invasive way to > handle it would be to set the CommandComplete and ReadyForQuery messages > aside when they are received if CopyDone hasn't been sent, continue the > COPY operation as usual until it is shutdown, send CopyDone and, > finally, "reinstate" CommandComplete and RFQ as if they were just > received.. Basically, yes. We need to introduce a little more state into the JDBC driver. Currently, the driver is in one of two states: 1. In the middle of a copy. 2. Not in a copy. These states are recorded in the lock system. We need to introduce a new state, where the copy is still locked, but we know that the CommandComplete and ReadyForQuery messages have been received. We can no longer unlock the copy in processCopyData - we need to do that in endCopy instead, after calling processCopyData to ensure that we wait for a valid CommandComplete and ReadyForQuery message first. Matthew -- Terrorists evolve but security is intelligently designed? -- Jake von Slatt
On Wed, 28 Jul 2010, James William Pye wrote: > Not directly regarding your patch, but while the discussion is in the > general area. I think it would be wise to throw an error when non-empty > CopyData messages are received after CopyData(EOF). Chances are that the > user is making a mistake and should be notified of it. > As this is also the direction that Tom Lane indicated we should go, here is a patch which errors out after receiving any more copy data past the EOF marker. This also fixes the protocol problem I previously brought up because the act of checking to see if there is any more data does ensure that if there isn't any more data in the current buffer, that we wait for the client to provide CopyDone/Fail. Kris Jurka
Attachment
On Aug 6, 2010, at 4:31 PM, Kris Jurka wrote: > <binary-copy-end-v2.patch> I think there's a snag in the patch: postgres=# COPY data FROM '/Users/jwp/DATA.bcopy' WITH BINARY; ERROR: row field count is -1, expected 1 CONTEXT: COPY data, line 4 Probably a quick/small fix away, I imagine. But, I was able to trigger the new ERROR with py-postgresql: >>> import postgresql as pg >>> db=pg.open('localhost/postgres') >>> q=db.prepare('copy data FROM STDIN WITH BINARY') >>> from itertools import chain >>> import sys >>> db.pq.tracer = sys.stderr.write >>> q.load_rows(chain(open('/Users/jwp/DATA.bcopy', 'rb'), (b'EXTRA',))) ↑ B(25): b'B\x00\x00\x00\x18\x00py:0x1268b30\x00\x00\x00\x00\x00\x00\x00' ↑ E(10): b'E\x00\x00\x00\t\x00\x00\x00\x00\x01' ↑ S(5): b'S\x00\x00\x00\x04' ↓ b'2'(0): b'' ↓ b'G'(5): b'\x01\x00\x01\x00\x01' ↑__(7): b'PGCOPY\n' ↑__(3): b'\xff\r\n' ↑__(41): b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00\x01\x00\x01\x00\x00\x00\x04\x00\x00\x00\x02\x00\x01\x00\x00\x00\x04\x00\x00\x00\x03\xff\xff' ↑__(5): b'EXTRA' ↑ c(5): b'c\x00\x00\x00\x04' ↑ S(5): b'S\x00\x00\x00\x04' ↓ b'E'(95): b'SERROR\x00C22P04\x00Mreceived copy data after EOF marker\x00WCOPY data, line 4\x00Fcopy.c\x00L2081\x00RCopyFrom\x00\x00' ↓ b'Z'(1): b'I' Traceback (most recent call last): File "<stdin>", line 1, in <module> <snip> File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/postgresql/driver/pq3.py", line 462,in raise_server_error raise server_error postgresql.exceptions.BadCopyError: received copy data after EOF marker CODE: 22P04 LOCATION: File 'copy.c', line 2081, in CopyFrom from SERVER CONTEXT: COPY data, line 4 STATEMENT: [prepared] sql_parameter_types: [] statement_id: py:0x1268b30 string: copy data FROM STDIN WITH BINARY CONNECTION: [idle] client_address: ::1/128 client_port: 63922 version: PostgreSQL 9.1devel on x86_64-apple-darwin10.4.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc.build 5664), 64-bit CONNECTOR: [Host] pq://jwp:***@localhost:5432/postgres category: None DRIVER: postgresql.driver.pq3.Driver
On Fri, 6 Aug 2010, James William Pye wrote: > On Aug 6, 2010, at 4:31 PM, Kris Jurka wrote: >> <binary-copy-end-v2.patch> > > I think there's a snag in the patch: > > postgres=# COPY data FROM '/Users/jwp/DATA.bcopy' WITH BINARY; > ERROR: row field count is -1, expected 1 > CONTEXT: COPY data, line 4 > > Probably a quick/small fix away, I imagine. Hmm, not quite sure why that is. That seems to imply that it's not using V3 protocol, but I thought binary copy could only be used with the V3 protocol. In any case, I think this new patch is more bulletproof. Kris Jurka
Attachment
On Sat, 7 Aug 2010, Kris Jurka wrote: > On Fri, 6 Aug 2010, James William Pye wrote: > >> I think there's a snag in the patch: >> >> postgres=# COPY data FROM '/Users/jwp/DATA.bcopy' WITH BINARY; >> ERROR: row field count is -1, expected 1 >> CONTEXT: COPY data, line 4 >> >> Probably a quick/small fix away, I imagine. > > Hmm, not quite sure why that is. That seems to imply that it's not using V3 > protocol, but I thought binary copy could only be used with the V3 protocol. > In any case, I think this new patch is more bulletproof. > Oh, duh. It's a server side copy not going through the client at all. Here's a hopefully final patch. Kris Jurka
Attachment
On Aug 9, 2010, at 11:49 AM, Kris Jurka wrote: > Oh, duh. It's a server side copy not going through the client at all. Here's a hopefully final patch. Trying it out... Works for me. I understand the resistance to the patch, but it would be quite nice to see this wart in the rear view. =\
On Tue, 20 Jul 2010, Matthew Wakeling wrote: > On Tue, 20 Jul 2010, Matthew Wakeling wrote: >> So, should the unlocking happen elsewhere, or should endCopy not require a >> lock? > > Or a third option. We should delay the processing of the "Z" (Ready for > query) message until after the endCopy has been called. > > I have still not been able to create a test that provokes this bug. I do not > understand under what conditions the server will do this, but sometimes it > will respond with a "C" and a "Z" before the endCopy message has actually > been sent to it. Maybe it is keying off the "-1" (no further rows) part of > the binary copy. In any case, I am including a patch here which seems to fix > the driver for my application. When the driver receives a "C" message > (command finished), and block is set to false, then we can assume that a > little later on we will call processCopyResults again very soon with block > set to true. Instead of setting block to true, finish processing of messages > in order to delay seeing the "Z" message until after the endCopy method has > been called. > > The patch also contains a simple test for binary copy in, although it does > not exercise this bug. I've applied a patch to CVS to implement a variant of this that is more selective of what it ignores. It peeks into the protocol stream and ignores all CommandComplete messages while we aren't blocking (which means that the driver thinks it isn't done yet). Kris Jurka
Kris Jurka <books@ejurka.com> writes: >> On Fri, 6 Aug 2010, James William Pye wrote: >>> I think there's a snag in the patch: > Oh, duh. It's a server side copy not going through the client at all. > Here's a hopefully final patch. Applied with a correction: this would've totally broken binary copy in old-style protocol, because there is no other EOF marker except the -1 in that case. BTW, it strikes me that we could reduce the backwards-compatibility impact of this patch if we made it ignore, rather than throw error for, any extra data after the EOF marker. I left it as-is since ISTM the more error checking you can have in a binary data format, the better. But a case could be made for doing the other thing, especially if somebody wanted to argue for back-patching this. regards, tom lane