BUG #6342: libpq blocks forever in "poll" function - Mailing list pgsql-bugs
From | andreagrassi@sogeasoft.com |
---|---|
Subject | BUG #6342: libpq blocks forever in "poll" function |
Date | |
Msg-id | E1RbSUA-0003kd-Tb@wrigleys.postgresql.org |
Responses | Re: BUG #6342: libpq blocks forever in "poll" function (two replies) |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference:      6342
Logged by:          Andrea Grassi
Email address:      andreagrassi@sogeasoft.com
PostgreSQL version: 8.4.8
Operating system:   SUSE SLES 10 SP4 64 BIT
Description:

Hi,

I have a big and strange problem. Sometimes libpq remains blocked in the "poll" function even though the server has already answered the query. If I attach to the process using kdbg, I find this stack:

__kernel_vsyscall()
poll() from /lib/libc.so.6
pqSocketCheck() from /home/pg/pgsql/lib-32/libpq.so.5
pqWaitTimed() from /home/pg/pgsql/lib-32/libpq.so.5
pqWait() from /home/pg/pgsql/lib-32/libpq.so.5
PQgetResult() from /home/pg/pgsql/lib-32/libpq.so.5
PQexecFinish() from /home/pg/pgsql/lib-32/libpq.so.5
…

To simplify the context and to reproduce the bug, I wrote a test program (attached below) that uses only the libpq interface (no other libraries) to read my database at localhost. It loops over a table of 64000 rows and, for each row, reads another table. It generally takes 1 minute to run. I put this program in a loop, so once it finishes, it restarts.

Usually it works fine, but sometimes (with no apparent pattern) it blocks. It always blocks (with the stack above) while executing the PQexec function ("CLOSE CURSOR xx" or "FETCH ALL IN xx"). If I press "continue" in kdbg after attaching to the process, the program continues its execution and exits successfully.
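A possible cross-check, independent of libpq: the sketch below tests whether poll() reports data that was already queued on a socket before poll() was called. poll() is level-triggered, so on a healthy kernel the call should return immediately. The socketpair standing in for the client/server connection and the function name poll_sees_queued_data are assumptions made for illustration; they are not part of the test program or of libpq.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libpq).
 * Returns 1 if poll() reports readability for data that was written
 * BEFORE poll() was called, 0 if poll() times out, -1 on error.
 * A local socketpair stands in for the client/server connection.
 */
int poll_sees_queued_data(int timeout_ms)
{
    int sv[2];
    struct pollfd pfd;
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* The "server" side answers first: 24 bytes are queued for sv[0]. */
    if (write(sv[1], "0123456789abcdefghijklmn", 24) != 24) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Only now does the "client" side start waiting. */
    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);

    close(sv[0]);
    close(sv[1]);

    if (rc < 0)
        return -1;
    return (rc == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Calling poll_sees_queued_data(1000) from a small main() on the affected machine and getting 1 would suggest that an answer arriving before the wait starts is not, by itself, enough to explain the hang.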
Here are the specifics of the platform (a SLES 10 SP4 64-bit WITHOUT any VMware):

Server:          HP DL 580 G7
                 4 CPU INTEL XEON X7550
                 64 GB RAM
                 8 HD 600GB SAS DP 6G 2.5" RAID 1 and RAID 5
OS:              SUSE SLES 10 SP4 64 BIT
Kernel:          Linux linuxspanesi 2.6.16.60-0.85.1-smp #1 SMP Thu Mar 17 11:45:06 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Postgres server: 8.4.8 - 64-bit
Libpq:           8.4.8 - 32-bit

I tried recompiling libpq
- in debug mode
- on a 64-bit machine with the -m32 option
- on a 32-bit machine
- setting HAVE_POLL to false at line 1053 in fe-misc.c, to force execution of the other branch of the "#ifdef/else", which uses the "select()" function instead of "poll()"

but none of these fixes the bug. I had the same stack as above, except in the last case, in which I had "___newselect_nocancel()" instead of "poll()".

If I check the state of the connection using the "netstat" command, I get this output:

tcp 24 0 127.0.0.1:49007 127.0.0.1:5432 ESTABLISHED 17415/pq_example.e

where the second field (Recv-Q) always stays stuck at a non-zero value. It seems that the server has already answered but libpq or the poll function does not notice it. Consider that the machine is very good and very fast. It seems that the answer from the server arrives before libpq starts waiting for it (calling poll). Could that be?

I tried installing a VMware virtual machine with the same version of Linux and the same kernel version on a much less powerful machine: there my program works fine and never blocks.

Here below is the code of the example program:

/*
 * testlibpq.c
 *
 * Test the C version of libpq, the PostgreSQL frontend library.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "libpq-fe.h"

/* Defined after main(); declared here so that main() can call it. */
int read_rigpia(PGconn *conn, char *cdart);

static void
exit_nicely(PGconn *conn)
{
    PQfinish(conn);
    exit(1);
}

int
main(int argc, char **argv)
{
    PGconn     *conn;
    PGresult   *res;
    int         i;

    /*
     * Make a connection to the database: with PQsetdbLogin() when built
     * with -DCASE1 (host taken from the SQLSERVER environment variable),
     * otherwise with a conninfo string.
     */
#ifdef CASE1
    conn = PQsetdbLogin(
                        getenv("SQLSERVER"),    /* pghost */
                        0,                      /* pgport */
                        0,                      /* pgoptions */
                        0,                      /* pgtty */
                        "OSA",                  /* dbName */
                        0,                      /* login */
                        0                       /* pwd */
        );
#else
    conn = PQconnectdb("dbname = OSA");
#endif

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Connection to database failed: %s",
                PQerrorMessage(conn));
        exit_nicely(conn);
    }

    res = PQexec(conn, "SET datestyle='ISO'");
    switch (PQresultStatus(res))
    {
        case PGRES_BAD_RESPONSE:
        case PGRES_NONFATAL_ERROR:
        case PGRES_FATAL_ERROR:
            fprintf(stderr, "SET DATESTYLE command failed: %s",
                    PQresultErrorMessage(res));
            break;
        default:
            break;
    }
    PQclear(res);

    /*
     * Our test case here involves using a cursor, for which we must be
     * inside a transaction block.
     */

    /* Start a transaction block */
    res = PQexec(conn, "BEGIN");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "BEGIN command failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }

    /*
     * Should PQclear PGresult whenever it is no longer needed to avoid
     * memory leaks
     */
    PQclear(res);

    /* Fetch the article codes (cdart) from base_a_artico */
    res = PQexec(conn, "DECLARE articoli CURSOR FOR select cdart from base_a_artico ORDER BY cdart");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "DECLARE CURSOR failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }
    PQclear(res);

    res = PQexec(conn, "FETCH ALL in articoli");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "FETCH ALL failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }

    /* next, for each article, read the matching rows of adp_d_rigpia */
    for (i = 0; i < PQntuples(res); i++)
    {
        read_rigpia(conn, PQgetvalue(res, i, 0));
    }
    PQclear(res);

    /* close the portal ... we don't bother to check for errors ... */
    res = PQexec(conn, "CLOSE articoli");
    PQclear(res);

    /* end the transaction */
    res = PQexec(conn, "END");
    PQclear(res);

    /* close the connection to the database and cleanup */
    PQfinish(conn);

    return 0;
}

/* Returns 1 on success, 0 if a command failed. */
int
read_rigpia(PGconn *conn, char *cdart)
{
    PGresult   *res;
    char        sql[1024];
    int         i;
    char       *dtfab;
    char       *sum;

    memset(sql, 0, sizeof(sql));
    snprintf(sql, sizeof(sql),
             "DECLARE rigpia CURSOR FOR select dtfab,sum(qtfan-qtpro) "
             "from adp_d_rigpia where flsta='' and cdart='%s' and qtfan>qtpro "
             "and cddpu not in ('04','05','06','07','08','09', "
             "'91','92','93','94','95','96','97','98','A0','B8','C2','LF','SC') "
             "group by dtfab", cdart);

    res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "DECLARE CURSOR rigpia failed: %s *** %s",
                PQerrorMessage(conn), sql);
        PQclear(res);
        return 0;
    }
    PQclear(res);

    res = PQexec(conn, "FETCH ALL in rigpia");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "FETCH ALL failed in rigpia: %s", PQerrorMessage(conn));
        PQclear(res);
        return 0;
    }

    /* next, read the rows (the values are fetched but deliberately not used) */
    for (i = 0; i < PQntuples(res); i++)
    {
        dtfab = PQgetvalue(res, i, 0);
        sum = PQgetvalue(res, i, 1);
    }
    PQclear(res);

    res = PQexec(conn, "CLOSE rigpia");
    PQclear(res);

    return 1;
}

Regards,
Andrea