Thread: Error I don't understand, losing synch with server
Every once in a while I log this error executing a query:

message contents do not agree with length in message type "D"
lost synchronization with server: got message type "O", length 1398030676

And from that point forward any use of the connection just returns a null result.

I'm running 8.0.4 on OS X 10.4.4 Server. Does this look more like a possible bug in PG, or me corrupting memory? For what it's worth, this is currently the only real problem I'm having, no crashes or other weirdness that would lead me to suspect memory corruption in my own code. It's also rare enough that I can work around it by noticing the error, dropping the connection from my pool, and replacing it with a new one. But ugh, that's not exactly a long-term solution.

Also FWIW, the only reason I haven't moved to 8.1 is lack of time. (My available time last month got chewed up by a server hardware failure.)

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
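A side observation on the error text above: in the version 3 frontend/backend protocol, each message begins with a one-byte type code followed by a four-byte length in network (big-endian) byte order. Reading the reported length 1398030676 back as four raw bytes gives printable ASCII, which suggests the parser was looking at payload text rather than a genuine message header. The sketch below shows that check; the helper name `length_as_ascii` is made up for illustration and is not part of libpq.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a libpq function): turn a 32-bit length field,
// as it would appear in network byte order, back into its four raw bytes.
std::string length_as_ascii(std::uint32_t len) {
    std::string out(4, '\0');
    out[0] = static_cast<char>((len >> 24) & 0xFF);  // most significant byte first
    out[1] = static_cast<char>((len >> 16) & 0xFF);
    out[2] = static_cast<char>((len >> 8) & 0xFF);
    out[3] = static_cast<char>(len & 0xFF);
    return out;
}
```

`length_as_ascii(1398030676u)` returns `"STAT"` (bytes 0x53 0x54 0x41 0x54), and the reported message type `"O"` is likewise a plain letter. That fits a stream being read at the wrong offset, though it does not by itself say whether the server, libpq, or the application put the parser there.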
Scott Ribe <scott_ribe@killerbytes.com> writes:
> Every once in a while I log this error executing a query:
> message contents do not agree with length in message type "D"
> lost synchronization with server: got message type "O", length 1398030676

This means either that libpq got a corrupt message from the server, or that libpq itself contains a bug in message parsing. Given that no one else has reported similar problems, the idea that your app is somehow clobbering the libpq message buffer (and thus corrupting the message "in transit") has to be taken seriously.

You mention pooling so I suppose this is a multi-threaded application ... are you being careful not to let any two threads try to use the same libpq PGconn at the same time? libpq itself does not contain any locking that would make that safe, you need to provide the locking yourself.

regards, tom lane
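As a rough illustration of "provide the locking yourself", the sketch below pairs one connection handle with a mutex and only exposes the handle while that mutex is held. It is an assumption about one way to do this, not code from the thread: `LockedConn` and its `with` method are invented names, and `Conn` is a stand-in type parameter for `PGconn` so that the example compiles without libpq.

```cpp
#include <mutex>
#include <utility>

// Pairs one connection handle with the mutex that must be held while using it.
// Conn is a stand-in for PGconn; nothing here calls libpq.
template <typename Conn>
class LockedConn {
public:
    explicit LockedConn(Conn* conn) : conn_(conn) {}

    LockedConn(const LockedConn&) = delete;
    LockedConn& operator=(const LockedConn&) = delete;

    // Runs fn(conn) while holding the lock, so two threads can never be
    // inside a call on this connection at the same time.
    template <typename Fn>
    auto with(Fn&& fn) -> decltype(fn(static_cast<Conn*>(nullptr))) {
        std::lock_guard<std::mutex> hold(mutex_);
        return std::forward<Fn>(fn)(conn_);
    }

private:
    std::mutex mutex_;
    Conn* conn_;
};
```

With libpq, the callable passed to `with` would be where the query is sent and its result fully read, so that a whole request/response exchange happens under one lock rather than each call being locked separately.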
> This means either that libpq got a corrupt message from the server, or
> that libpq itself contains a bug in message parsing. Given that no one
> else has reported similar problems, the idea that your app is somehow
> clobbering the libpq message buffer (and thus corrupting the message "in
> transit") has to be taken seriously.

Gee. My code corrupting memory. Like that's never happened before ;-) I just had to ask though since I'm not seeing other signs right now.

> You mention pooling so I suppose this is a multi-threaded application
> ... are you being careful not to let any two threads try to use the same
> libpq PGconn at the same time? libpq itself does not contain any
> locking that would make that safe, you need to provide the locking
> yourself.

I have a queue of pgconns. When a thread needs one it pops it off the queue, and when it's done it pushes the pgconn back on, wrapped by a stack-allocated class whose constructor and destructor take care of acquiring and releasing the pgconn. The queue is a Mac OS thing, not my code, so it's not a problem with sharing them, unfortunately.

So I'll have to keep looking for memory-munging bugs.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
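The pop/push scheme described above, combined with the earlier workaround of dropping a connection once it reports an error, might look roughly like the following. This is a sketch under assumptions, not the poster's code: he uses a Mac OS queue, whereas this uses a `std::deque` guarded by a mutex, and `ConnPool`, `Lease`, and `mark_broken` are invented names. `Conn` again stands in for `PGconn`, and the `make`/`close` callables are supplied by the caller.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

template <typename Conn>
class ConnPool {
public:
    using Maker = std::function<Conn*()>;
    using Closer = std::function<void(Conn*)>;

    ConnPool(std::size_t size, Maker make, Closer close)
        : make_(std::move(make)), close_(std::move(close)) {
        for (std::size_t i = 0; i < size; ++i) idle_.push_back(make_());
    }
    ~ConnPool() { for (Conn* c : idle_) close_(c); }

    ConnPool(const ConnPool&) = delete;
    ConnPool& operator=(const ConnPool&) = delete;

    // Stack-allocated guard: the constructor pops a connection, the
    // destructor pushes it back (or replaces it if it was marked broken).
    class Lease {
    public:
        explicit Lease(ConnPool& pool) : pool_(pool), conn_(pool.pop()) {}
        ~Lease() { pool_.give_back(conn_, broken_); }
        Lease(const Lease&) = delete;
        Lease& operator=(const Lease&) = delete;

        Conn* get() const { return conn_; }
        void mark_broken() { broken_ = true; }  // e.g. after a lost-sync error

    private:
        ConnPool& pool_;
        Conn* conn_;
        bool broken_ = false;
    };

    std::size_t idle_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return idle_.size();
    }

private:
    // Blocks until a connection is idle, then hands it to exactly one caller.
    Conn* pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !idle_.empty(); });
        Conn* c = idle_.front();
        idle_.pop_front();
        return c;
    }

    // Assumes make_ does not throw, since this runs from Lease's destructor.
    void give_back(Conn* c, bool broken) {
        if (broken) {
            close_(c);
            c = make_();
        }
        {
            std::lock_guard<std::mutex> lock(mutex_);
            idle_.push_back(c);
        }
        ready_.notify_one();
    }

    Maker make_;
    Closer close_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Conn*> idle_;
};
```

Because a connection is out of the queue for the whole lifetime of its `Lease`, no second thread can pick it up, provided the raw pointer from `get()` is never stored or handed to another thread beyond that lifetime.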
>> Every once in a while I log this error executing a query:
>> message contents do not agree with length in message type "D"
>> lost synchronization with server: got message type "O", length 1398030676
>
> This means either that libpq got a corrupt message from the server, or
> that libpq itself contains a bug in message parsing. Given that no one
> else has reported similar problems, the idea that your app is somehow
> clobbering the libpq message buffer (and thus corrupting the message "in
> transit") has to be taken seriously.
>
> You mention pooling so I suppose this is a multi-threaded application
> ... are you being careful not to let any two threads try to use the same
> libpq PGconn at the same time? libpq itself does not contain any
> locking that would make that safe, you need to provide the locking
> yourself.

Uhhhmmm, I built without --enable-thread-safety???

I have a process I follow when building, but pg_config is telling me that I didn't use my standard options. I'm assuming this could cause all sorts of threading kinkiness...

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice