Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw |
Date | |
Msg-id | CA+TgmobBcxuawpYcfbeRiyX_xn3VUK+ppwJG=Amg=nTpzRe6QA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw
|
List | pgsql-hackers |
On Fri, May 5, 2017 at 11:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > I am not saying to rip those changes. Your most of the mail stresses > about the 30-second timeout which you have added in the patch to > detect timeouts during failures (rollback processing). I have no > objection to that part of the patch except that still there is a race > condition around PQsendQuery and you indicate that it is better to > deal it as a separate patch to which I agree. OK. > The point of which I am > not in total agreement is about the failure other than the time out. > > +pgfdw_exec_cleanup_query(PGconn *conn, const char *query) > { > .. > + /* Get the result of the query. */ > + if (pgfdw_get_cleanup_result(conn, endtime, &result)) > + return false; > + > + /* Issue a warning if not successful. */ > + if (PQresultStatus(result) != PGRES_COMMAND_OK) > + { > + pgfdw_report_error(WARNING, result, conn, true, query); > + return false; > + } > .. > } > > In the above code, if there is a failure due to timeout then it will > return from first statement (pgfdw_get_cleanup_result), however, if > there is any other failure, then it will issue a warning and return > false. I am not sure if there is a need to return false in the second > case, because even without that it will work fine (and there is a > chance that it won't drop the connection), but maybe your point is > better to be consistent for all kind of error handling in this path. There are three possible queries that can be executed by pgfdw_exec_cleanup_query; let's consider them individually. 1. ABORT TRANSACTION. If ABORT TRANSACTION fails, I think that we have no alternative but to close the connection. If we don't, then the next local connection that tries to use this connection will probably also fail, because it will find the connection inside an aborted transaction rather than not in a transaction. If we do close the connection, the next transaction will try to reconnect and everything will be fine. 2. DEALLOCATE ALL. If DEALLOCATE ALL fails, we do not have to close the connection, but the reason why we're running that statement in the first place is because we've potentially lost track of which statements are prepared on the remote side. If we don't drop the connection, we'll still be confused about that. Reconnecting will fix it. 3. ROLLBACK TO SAVEPOINT %d; RELEASE SAVEPOINT %d. If this fails, the local toplevel transaction is doomed. It would be wrong to try to commit on the remote side, because there could be remote changes made by the aborted subtransaction which must not survive, and the ROLLBACK TO SAVEPOINT %d command is essential in order for us to get rid of them, and that has already failed. So we must eventually abort the remote transaction, which we can do either by closing the connection or by trying to execute ABORT TRANSACTION. The former, however, is quick, and the latter is very possibly going to involve a long hang, so closing the connection seems better. In all of these cases, the failure of one of these statements seems to me to *probably* mean that the remote side is just dead, in which case dropping the connection is the only option. But even if the remote side is still alive and the statement failed for some other reason -- say somebody changed kwlist.h so that ABORT now has to be spelled ABURT -- returning true rather than false just causes us to end up with a remote connection whose state is out of step with the local server state, and such a connection is not going to be usable. I think part of your point is that in case 3, we might still get back to a usable connection if ABORT TRANSACTION is successfully executed later, but that's not very likely and, even if it would have happened, reconnecting instead doesn't really leave us any worse off. > No, I am not saying any of those. I hope the previous point clarifies > what I want to say. Apologies, but I still cannot see quite what you are getting at. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: