Re: psycopg2 (async) socket timeout - Mailing list psycopg
From | Marko Kreen |
---|---|
Subject | Re: psycopg2 (async) socket timeout |
Date | |
Msg-id | AANLkTi=N4xwp74=QJ8GsjYyXCDsVEF2b+oK9q270-o4Q@mail.gmail.com |
In response to | Re: psycopg2 (async) socket timeout (Jan Urbański <wulczer@wulczer.org>) |
Responses |
Re: psycopg2 (async) socket timeout
|
List | psycopg |
On Tue, Feb 15, 2011 at 3:32 PM, Jan Urbański <wulczer@wulczer.org> wrote:
> On 15/02/11 06:39, Marko Kreen wrote:
>> On Thu, Feb 3, 2011 at 10:04 PM, Danny Milosavljevic
>> <danny.milo@gmail.com> wrote:
>>> is it possible to specify the timeout for the socket underlying a connection?
>>>
>>> Alternatively, since I'm using the async interface anyway, is it
>>> possible to proactively cancel a query that is "stuck" since the TCP
>>> connection to the database is down?
>>>
>>> So the specific case is:
>>> - connect to the postgres database using psycopg2 while network is up
>>> - run some queries, get the results fine etc
>>> - send a query
>>> - the network goes down before the result to this last query has been received
>>> - neither a result nor an error callback gets called - as far as I can
>>>   see (using txpostgres.ConnectionPool)
>>>
>>> What's the proper way to deal with that?
>>
>> TCP keepalive. By default the timeouts are quite high,
>> but they are tunable.
>>
>> libpq supports keepalive tuning since 9.0, on older libpq
>> you can do it yourself:
>>
>> https://github.com/markokr/skytools/blob/master/python/skytools/psycopgwrapper.py#L153

Keepalive will help to detect if the TCP connection is down,
it will not help if the connection is up but the server app
is unresponsive.

> After doing lots of tests, it seems that keepalives are not the full
> solution. They're useful if you want to detect the connection breaking
> while it's idle, but they don't help in the case of:
>
> * the app sends a keepalive, receives response

Sort of true, except Postgres does not have an app-level
keepalive (except SELECT 1). The PQping mentioned earlier
creates a new connection.

> * the connection is idle
> * before the next keepalive is sent, you want to do a query
> * the connection breaks silently
> * you try sending the query
> * libpq tries to write the query to the connection socket, does not
>   receive TCP confirmation

The TCP keepalive should help for those cases, perhaps you are
doing something wrong if you are not seeing the effect.

> * the kernel starts retransmitting the data, using TCP's RTO algorithm
> * you don't get notified about the failure until the TCP gives up, which
>   might be a long time

I'm not familiar with RTO, so I cannot comment. Why would it stop
keepalive from working?

> So it seems to me that you need an application-level timeout also. I'm
> thinking about supporting it in txpostgres, but will have to think
> exactly how to do it and what would be the interface.
>
> Alternatively, you can lower the kernel TCP retry parameters
> (net.ipv4.tcp_retries1 and net.ipv4.tcp_retries2), which will make TCP
> give up earlier. Unfortunately it seems that you can only set them
> globally at the kernel level and not per connection, which IMHO is a bit
> too scary. What bothers me is that the keepalives mechanism does not
> come into play while you're doing TCP retries, but that's apparently how
> TCP works (at least on Linux...).
>
> If you want to detect the connection failing as soon as possible, and
> not the next time you try to make a query, you need to regularly make
> queries, IOW have a heartbeat. But all the things I wrote before still
> apply, and without an app-level timeout or lowering the TCP retry
> parameters it might take a lot of time to detect that the heartbeat failed.

The need for a periodic query is exactly the thing that keepalive
should fix. OTOH, if you have connections that are idle for a long
time, you could simply drop them.

We have the (4m idle + 4x15sec ping) parameters as default
and they work fine - a dead connection is killed after 5m.

-- 
marko
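
For reference, the keepalive tuning described above (4 minutes idle, then 4 probes 15 seconds apart) might look roughly like the sketch below. It is an illustration under stated assumptions, not the code behind the skytools link. The helper names (`libpq_keepalive_params`, `set_socket_keepalive`, `approx_detection_seconds`) are made up for this example. The `keepalives*` keywords are the libpq connection parameters available since 9.0, and the `TCP_KEEP*` socket options are platform-specific, so they are applied only where Python exposes them.

```python
import socket

# Values from the message: 4 minutes idle, then 4 probes 15 seconds apart.
KEEPALIVE_IDLE = 4 * 60
KEEPALIVE_INTERVAL = 15
KEEPALIVE_COUNT = 4


def libpq_keepalive_params(idle=KEEPALIVE_IDLE,
                           interval=KEEPALIVE_INTERVAL,
                           count=KEEPALIVE_COUNT):
    """Return connection keywords for libpq 9.0 or later (hypothetical helper)."""
    return {
        "keepalives": 1,
        "keepalives_idle": idle,
        "keepalives_interval": interval,
        "keepalives_count": count,
    }


def set_socket_keepalive(sock,
                         idle=KEEPALIVE_IDLE,
                         interval=KEEPALIVE_INTERVAL,
                         count=KEEPALIVE_COUNT):
    """Enable keepalive directly on a socket, for an older libpq (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are not available on every platform,
    # so set only the ones the socket module defines here.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def approx_detection_seconds(idle, interval, count):
    """Rough time until an idle dead connection is dropped."""
    return idle + interval * count
```

With psycopg2 this would presumably be used as `psycopg2.connect(host=..., dbname=..., **libpq_keepalive_params())`. On an older libpq, one option (an assumption here) is to wrap the descriptor with `socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)` and pass the result to `set_socket_keepalive`. `approx_detection_seconds(240, 15, 4)` gives 300 seconds, which matches the 5 minutes quoted above.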