libpq connection timeout mismanagement - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | libpq connection timeout mismanagement |
Date | |
Msg-id | 5735.1533828184@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: libpq connection timeout mismanagement
Re: libpq connection timeout mismanagement |
List | pgsql-hackers |
The patch that taught libpq about allowing multiple target hosts modified connectDBComplete() with the intent of making the connect_timeout (if specified) apply per-host, not to the complete connection attempt. It did not do a very good job though, because the timeout only gets reset when connectDBComplete() itself detects a timeout. If PQconnectPoll advances to a new host due to some other cause, the previous host's timeout continues to run, possibly causing a premature timeout failure for the new one. Another thing that I find pretty strange is that it is coded so that, in event of a timeout detection by connectDBComplete, we give up on the current connhost entry and advance to the next host, ignoring any additional addresses we might have for the current hostname. This seems at best poorly thought through. There's no good reason for libpq to assume that all the addresses returned by DNS point at the same machine, or share the same network failure points in between. Attached is an attempt to improve this. I'll park it in the Sept fest. regards, tom lane diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 986f74f..429be74 100644 *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *************** connectDBComplete(PGconn *conn) *** 1905,1910 **** --- 1905,1912 ---- PostgresPollingStatusType flag = PGRES_POLLING_WRITING; time_t finish_time = ((time_t) -1); int timeout = 0; + int last_whichhost = -2; /* certainly different from whichhost */ + struct addrinfo *last_addr_cur = NULL; if (conn == NULL || conn->status == CONNECTION_BAD) return 0; *************** connectDBComplete(PGconn *conn) *** 1922,1929 **** */ if (timeout < 2) timeout = 2; - /* calculate the finish time based on start + timeout */ - finish_time = time(NULL) + timeout; } } --- 1924,1929 ---- *************** connectDBComplete(PGconn *conn) *** 1932,1937 **** --- 1932,1952 ---- int ret = 0; /* + * (Re)start the connect_timeout timer if it's active and we are + * considering a different host than we were last time through. If + * we've already succeeded, though, needn't recalculate. + */ + if (flag != PGRES_POLLING_OK && + timeout > 0 && + (conn->whichhost != last_whichhost || + conn->addr_cur != last_addr_cur)) + { + finish_time = time(NULL) + timeout; + last_whichhost = conn->whichhost; + last_addr_cur = conn->addr_cur; + } + + /* * Wait, if necessary. Note that the initial state (just after * PQconnectStart) is to wait for the socket to select for writing. */ *************** connectDBComplete(PGconn *conn) *** 1975,1992 **** if (ret == 1) /* connect_timeout elapsed */ { /* ! * Attempt connection to the next host, ignoring any remaining ! * addresses for the current host. */ ! conn->try_next_addr = false; ! conn->try_next_host = true; conn->status = CONNECTION_NEEDED; - - /* - * Restart the connect_timeout timer for the new host. - */ - if (timeout > 0) - finish_time = time(NULL) + timeout; } /* --- 1990,1999 ---- if (ret == 1) /* connect_timeout elapsed */ { /* ! * Give up on current server/address, try the next one. */ ! conn->try_next_addr = true; conn->status = CONNECTION_NEEDED; } /*
pgsql-hackers by date: