Re: TCP keepalive support for libpq - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: TCP keepalive support for libpq
Date	June 24, 2010 08:54:47
Msg-id	AANLkTikvTfr3FS-L_vlQXLdWwjRT5w6ZoiiPgD47O-LK@mail.gmail.com
In response to	Re: TCP keepalive support for libpq ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses	Re: TCP keepalive support for libpq
List	pgsql-hackers

On Tue, Jun 22, 2010 at 6:04 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> What does bother me is the fact that we are engineering a critical
>> aspect of our system reliability around vendor-specific
>> implementation details of the TCP stack, and that if any version
>> of any operating system that we support (or ever wish to support
>> in the future) fails to have a reliable implementation of this
>> feature AND configurable knobs that we can tune to suit our needs,
>> then we're screwed. Does anyone want to argue that this is NOT a
>> house of cards?
>
> [/me raises hand]
>
> TCP keepalive has been available and a useful part of my reliability
> solutions since I had to find a way to clean up zombie database
> connections caused by clients powering down their workstations
> without closing their apps -- that was in OS/2 circa 1990.

I think the problem is that the above is precisely what TCP keepalives
were designed for -- to prevent connections that are definitely dead
from living on forever. Even then they're controversial and mean
sacrificing a feature that's quite desirable for TCP -- namely that
idle connections don't die unnecessarily in the face of transient
failures and can function fine when the link returns.

The proposed use is for detecting connections that aren't responding
quickly enough for our tastes, which may mean much sooner than TCP's
own timeouts would fire. Because we have a backup plan, the conservative
option in our case is to kill the connection as soon as there's any
doubt about its validity so we can try a new connection. That's just not
how TCP is designed -- the conservative option is assumed to be to keep
the connection open until there's no doubt the connection is dead.

I think it's going to be an uphill battle convincing TCP that we know
better than the TCP spec about how aggressive it should be about
throwing errors and killing connections. Once we have TCP keepalives
set low enough -- assuming the OS will allow them to be set much lower
-- we'll find that other timeouts are longer than we expect too. TCP
keepalives won't come into it at all if there is any unacked data
pending -- TCP *will* detect that case, but it might take longer than
you want, and you won't be able to lower that timeout.
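
To make the knobs under discussion concrete, here is a minimal Python
sketch (not libpq code) of enabling keepalives on a socket and estimating
how long an idle, dead peer could go unnoticed. The helper names
enable_keepalive and worst_case_detection_seconds are hypothetical, the
TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT option names are the
Linux-style ones and may be missing or rejected on other platforms, and
the idle + interval * count estimate is only the usual approximation for
a connection with nothing in flight.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive and tune the timers where the platform allows.

    Returns a dict of the per-socket options that were actually applied,
    since not every OS exposes (or accepts) all three knobs.
    """
    # SO_KEEPALIVE itself is widely available.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle),       # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # this platform does not expose the option
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
        except OSError:
            pass  # the OS refused the value; keep its default
    return applied


def worst_case_detection_seconds(idle, interval, count):
    """Rough upper bound for noticing a dead peer on an idle connection."""
    return idle + interval * count


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print(enable_keepalive(s, idle=60, interval=10, count=5))
        print(worst_case_detection_seconds(60, 10, 5))
    finally:
        s.close()
```

Note that this estimate only covers the idle case. If there is unacked
data pending, the keepalive timers above do not apply and the
retransmission timeout governs instead, which is the limitation
described in the paragraph above.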

-- 
greg
