Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect - Mailing list pgsql-committers
| From | Fujii Masao |
|---|---|
| Subject | Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect |
| Date | |
| Msg-id | 53c3deb8-666d-fc4c-f0b5-e8c397bc6250@oss.nttdata.com Whole thread Raw |
| In response to | Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses |
Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect
|
| List | pgsql-committers |
On 2020/10/11 9:16, Tom Lane wrote:
> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>>>> Therefore, the easy fix is to make libpq mark the connection as
>>>> CONNECTION_BAD even in ECONNABORTED, like we do in ECONNRESET.
>
> So in the wake of commit fe27009cb,
Many thanks for the commit fe27009cb !!
> this is what lorikeet is doing:
>
> @@ -9028,9 +9028,7 @@
> CALL terminate_backend_and_wait('fdw_retry_check');
> SAVEPOINT s;
> SELECT 1 FROM ft1 LIMIT 1; -- should fail
> -ERROR: server closed the connection unexpectedly
> - This probably means the server terminated abnormally
> - before or while processing the request.
> +ERROR: could not receive data from server: Software caused connection abort
> CONTEXT: remote SQL command: SAVEPOINT s2
> COMMIT;
> -- Clean up
>
> which is better --- the connection recovery is working --- but
> obviously still not a "pass".
>
> The only way to make this test pass as-is is to hack libpq so that
> it deems ECONNABORTED to be a server crash, which frankly I do not
> think is a good change. I don't see any principled reason to think
> that it means that rather than a network connectivity failure.
>
> What I've done instead as a stopgap is to adjust the test case to be
> insensitive to the exact error message text.
For now I've not come up with better idea than current this fix.
> Maybe there's a better
> way, but I can't think of anything besides having two (or more?)
> expected-output files. That would be quite unpalatable as things
> stand, though perhaps we could make it tolerable by splitting this
> test off into a second test script.
>
> Meanwhile, now that I've looked at commit 32a9c0bdf, I'm not very
> happy with it:
>
> * The control flow seems rather forced. I think it was designed
> on the assumption that reindenting the existing code is forbidden.
> Why not use an actual loop, instead of a goto? I also think that
> it's far from obvious that the loop isn't an infinite loop; it
> took me quite a while to glom onto the fact that the test inside the
> PG_CATCH avoids that by checking retry_conn. Maybe a comment would
> be enough to improve that, but I feel the control logic could stand
> a rethink.
Isn't it simpler and easier-to-read to just reestablish new connection again
in the retry case instead of using a loop because we retry only once?
Patch attached.
>
> * The CATCH case is completely failing to clean up after itself.
> At the minimum, there has to be a FlushErrorState() there.
> I don't think it'd be terribly hard to drive this code to an
> error-stack-overflow PANIC.
You're right. Sorry I forgot to take into consideration this :(
I fixed this issue in the attached patch.
>
> * As is so often true of proposed patches in which PG_CATCH is
> thought to be an adequate way to catch an error, this is just
> unbelievably fragile. It will break, and badly so, if it catches
> an error that is anything other than what it is expecting ...
> and it's not even particularly trying to verify that the error is
> what it's expecting. It might be okay, or at least a little bit
> harder to break, if it verified that the error's SQLSTATE is
> ERRCODE_CONNECTION_FAILURE.
"PQstatus()==CONNECTION_BAD" was checked for that purpose. But that's not enough?
Anyway, in the patch, I changed the code so that sqlstate==ERRCODE_CONNECTION_FAILURE
is checked.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
pgsql-committers by date: