Re: Socket read fails - Mailing list pgsql-jdbc
From | David Hustace |
---|---|
Subject | Re: Socket read fails |
Date | |
Msg-id | A174A100-FBCC-4513-8D5F-0FF5C2AF961B@opennms.org Whole thread Raw |
In response to | Re: Socket read fails (Colin Taylor <colin.taylor@gmail.com>) |
Responses |
Re: Socket read fails
|
List | pgsql-jdbc |
On Jan 25, 2006, at 1:03 AM, Colin Taylor wrote: > For the benefit of anyone else who has this problem, it seems to > be alleviated, at least in our case by rebooting the box every > night. Not ideal by any means but better than having the > application hang. Which kind of suggests some resource issue, > investing with linux sar command though doesnt reveal any process > or file limits being reached. > We have been fighting this very same problem with our open source project (OpenNMS) for a several months now but only on systems with fairly large databases (~=2.5 GB). We recently built 8.1.2 on one customer's beefy Sun system (Sparc and Solaris 10) hoping to resolve the issue only to find it happening now more frequently now as well. 8.1.2's performance, while running, is much better btw. We have been through our database code over and over and we sure that we are closing all connections, statements, etc. though that doesn't seem to be the issue since our hangs on socketRead0 happen on a call to DriverManger.getConnection() (creating new connection) which appears to be different for your hang. Here is a recent stack trace: "OpenNMS.Poller.DefaultPollContext" prio=10 tid=0x006df638 nid=0xa5 runnable [0x3cffe000..0x3cfffbf0] at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java: 218) at java.io.BufferedInputStream.read(BufferedInputStream.java: 235) - locked <0x60fa45e8> (a java.io.BufferedInputStream) at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:254) at org.postgresql.core.v2.QueryExecutorImpl.processResults (QueryExecutorImpl.java:408) at org.postgresql.core.v2.QueryExecutorImpl.execute (QueryExecutorImpl.java:364) at org.postgresql.core.v2.QueryExecutorImpl.execute (QueryExecutorImpl.java:258) - locked <0x60fa4608> (a org.postgresql.core.v2.QueryExecutorImpl) at org.postgresql.core.v2.ConnectionFactoryImpl.runSetupQuery (ConnectionFactoryImpl.java:405) at org.postgresql.core.v2.ConnectionFactoryImpl.runInitialQueries (ConnectionFactoryImpl.java:423) at org.postgresql.core.v2.ConnectionFactoryImpl.openConnectionImpl (ConnectionFactoryImpl.java:83) at org.postgresql.core.ConnectionFactory.openConnection (ConnectionFactory.java:65) at org.postgresql.jdbc2.AbstractJdbc2Connection.<init> (AbstractJdbc2Connection.java:116) at org.postgresql.jdbc3.AbstractJdbc3Connection.<init> (AbstractJdbc3Connection.java:30) at org.postgresql.jdbc3.Jdbc3Connection.<init> (Jdbc3Connection.java:24) at org.postgresql.Driver.makeConnection(Driver.java:369) at org.postgresql.Driver.connect(Driver.java:245) at java.sql.DriverManager.getConnection(DriverManager.java:525) - locked <0x48cd17e8> (a java.lang.Class) at java.sql.DriverManager.getConnection(DriverManager.java:171) - locked <0x48cd17e8> (a java.lang.Class) at org.opennms.netmgt.config.DatabaseConnectionFactory.getConnection (DatabaseConnectionFactory.java:1125) at org.opennms.netmgt.utils.JDBCTemplate.doExecute (JDBCTemplate.java:102) at org.opennms.netmgt.utils.JDBCTemplate.execute (JDBCTemplate.java:82) at org.opennms.netmgt.poller.DefaultQueryManager.resolveOutage (DefaultQueryManager.java:531) at org.opennms.netmgt.poller.DefaultPollContext$2.run (DefaultPollContext.java:139) at org.opennms.netmgt.poller.pollables.PendingPollEvent.processPending (PendingPollEvent.java:94) at org.opennms.netmgt.poller.DefaultPollContext.onEvent (DefaultPollContext.java:184) - locked <0x5a373ed0> (a java.util.LinkedList) at org.opennms.netmgt.eventd.EventIpcManagerDefaultImpl $ListenerThread.run(EventIpcManagerDefaultImpl.java:169) at java.lang.Thread.run(Thread.java:595) This past summer, we had someone try to debug this on the system level and here is what he reported: ----------- begin debugging ------------------ Thread Dump showing thread who has locked DatabaseConnectionFactory: "EventQueueProcessor" prio=5 tid=0x008262d0 nid=0x3b runnable [239ff000..239ffc28] at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java: 183) at java.io.BufferedInputStream.read(BufferedInputStream.java: 201) - locked <0x8183b8a8> (a java.io.BufferedInputStream) at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:166) at org.postgresql.core.QueryExecutor.executeV3 (QueryExecutor.java:127) - locked <0x8183b8c8> (a org.postgresql.core.PGStream) at org.postgresql.core.QueryExecutor.execute (QueryExecutor.java:100) at org.postgresql.core.QueryExecutor.execute (QueryExecutor.java:43) at org.postgresql.jdbc1.AbstractJdbc1Connection.execSQL (AbstractJdbc1Connection.java:887) at org.postgresql.jdbc1.AbstractJdbc1Connection.openConnectionV3 (AbstractJdbc1Connection.java:505) at org.postgresql.jdbc1.AbstractJdbc1Connection.openConnection (AbstractJdbc1Connection.java:214) at org.postgresql.Driver.connect(Driver.java:139) at java.sql.DriverManager.getConnection(DriverManager.java:512) - locked <0xf1b8f260> (a java.lang.Class) at java.sql.DriverManager.getConnection(DriverManager.java:171) - locked <0xf1b8f260> (a java.lang.Class) at org.opennms.netmgt.config.DatabaseConnectionFactory.getConnection (DatabaseConnectionFactory.java:1120) - locked <0x328b2758> (a java.util.LinkedList) at org.opennms.netmgt.xmlrpcd.XmlRpcNotifier.getNodeLabel (XmlRpcNotifier.java:355) pstack showing the FD of java->pgsql socket read call: ----------------- lwp# 59 / thread# 59 -------------------- ff33d2d8 read (36, 939a18, 2000) feccbc30 JVM_Read (36, 939a18, 2000, feccb5c8, ff364278, ff36c950) + 68 30edac88 Java_java_net_SocketInputStream_socketRead0 (826364, 239ff100, 239ff0fc, 239ff0f8, 0, 2000) + 1e8 pfiles showing port number of the java->psql socket read call: 54: S_IFSOCK mode:0666 dev:304,0 ino:16062 uid:0 gid:0 size:0 O_RDWR SOCK_STREAM SO_SNDBUF(49152),SO_RCVBUF(49152) sockname: AF_INET 127.0.0.1 port: 50912 peername: AF_INET 127.0.0.1 port: 5432 ...after greping through the pfiles output of 17 postmaster processes, found the one that has socket 50912 open: 8: S_IFSOCK mode:0666 dev:304,0 ino:13881 uid:0 gid:0 size:0 O_RDWR SOCK_STREAM SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(49152) sockname: AF_INET 127.0.0.1 port: 5432 peername: AF_INET 127.0.0.1 port: 50912 pstack of this postmaster process: 3029: /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data fefbca1c recv (8, 283df0, 2000, 0) 000f7420 pq_getbyte (285c00, 328598, 1, 6, 5cbd8, 5cc30) + 38 00156a38 SocketBackend (ffbff1d8, 49, 0, 49, 0, 2) + 4 00156d28 ReadCommand (ffbff1d8, 0, ffffffff, fffffff8, 0, 2ad800) + 20 00159c94 PostgresMain (4, 2c3850, 2c3830, 6, 0, 0) + a7c 0012e414 BackendRun (2d2038, 1084, 916e0, 0, ff1d2000, 1000) + 5b8 0012dc60 BackendStartup (2d2038, 2b3b0c, 0, 0, 2b3800, 5) + 150 0012bfdc ServerLoop (2, 2b3800, 2b3800, ffbffb98, ffbffc38, 286000) + 304 0012b7c8 PostmasterMain (0, 2bffe8, 2c7aa0, 2bf000, 2be400, 23f000) + b10 000f89cc main (3, 2bffe8, ffbffe2c, 2be6dc, ff1d0140, ff1d0180) + 220 00039d44 _start (0, 0, 0, 0, 0, 0) + 5c Conclusion java is doing a read on this socket, and postmaster is doing a recv. ----------end debugging--------------------- We are considering the use of DataSource instead of the deprecated DriverManager, since DriverManager is synchronized. Any thoughts on the system level debugging above or the use of the DataSource Interface? Note that we use very many threads, however, our connection pool keeps the number of connection down to around 12 with peaks for 20 or 25. -David David Hustace The OpenNMS Group, Inc. Main : +1 919 545 2553 Fax: +1 503-961-7746 Direct: +1 919 827 1201 Skype: dhustace Key Fingerprint: 8EC1 F319 2C1D 3197 AAB9 580F FACD 1D19 175E D903
pgsql-jdbc by date: