Thread: Client failure allows backend to continue

Client failure allows backend to continue

From: Bruce Momjian
Date: 27 January 2003, 18:01:32

As part of the training class I did, some people tested what happens
when the client allocates tons of memory to store a result and aborts.

What we found was that though elog was properly called:

    elog(COMMERROR, "pq_recvbuf: recv() failed: %m");

(I think that was the message), the backend did not exit and kept
eating CPU. I think the problem is that the elog code only exits on
ERROR, not COMMERROR.  Is there some way to fix this?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Client failure allows backend to continue

From: Tom Lane
Date: 27 January 2003, 22:31:46

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> As part of the training class I did, some people tested what happens
> when the client allocates tons of memory to store a result and aborts.

> What we found was that though elog was properly called:

>     elog(COMMERROR, "pq_recvbuf: recv() failed: %m");

> (I think that was the message), the backend did not exit and kept
> eating CPU. I think the problem is that the elog code only exits on
> ERROR, not COMMERROR.  Is there some way to fix this?

There's been talk of setting the QueryCancel flag after detecting a
client communication failure ... but no one has ever done the legwork
to see if that works nicely, or what downsides it might have.
        regards, tom lane
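
A minimal C sketch of that idea, using hypothetical names (`sketch_query_cancel` stands in for the QueryCancel flag; none of this is PostgreSQL's actual code): the routine that detects the failure only sets a flag and returns, and the long-running loop polls the flag at points where stopping is safe.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for the QueryCancel flag mentioned above. */
static volatile sig_atomic_t sketch_query_cancel = 0;

/* Called where the comm failure is detected: after logging (omitted),
 * request a cancel instead of jumping out of the I/O routine. */
static void sketch_report_comm_failure(void)
{
    sketch_query_cancel = 1;
}

/* Long-running work that polls the flag between rows, where stopping
 * is safe. Returns the number of rows processed before it stopped.
 * fail_at < 0 means no failure is simulated. */
static long sketch_process_rows(long nrows, long fail_at)
{
    long done = 0;
    for (long i = 0; i < nrows; i++) {
        if (sketch_query_cancel)          /* safe point: between rows */
            break;
        if (i == fail_at)
            sketch_report_comm_failure(); /* e.g. a client write failed */
        done++;
    }
    return done;
}
```

With no failure, `sketch_process_rows(100, -1)` handles all 100 rows; with a failure during row 10 it stops after 11 rows, once the current row is finished. The detecting routine never abandons its work mid-stream; the cost is that the backend keeps running until it reaches the next polling point.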

Re: Client failure allows backend to continue

From: Bruce Momjian
Date: 27 January 2003, 22:34:13

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > As part of the training class I did, some people tested what happens
> > when the client allocates tons of memory to store a result and aborts.
> 
> > What we found was that though elog was properly called:
> 
> >     elog(COMMERROR, "pq_recvbuf: recv() failed: %m");
> 
> > (I think that was the message), the backend did not exit and kept
> > eating CPU. I think the problem is that the elog code only exits on
> > ERROR, not COMMERROR.  Is there some way to fix this?
> 
> There's been talk of setting the QueryCancel flag after detecting a
> client communication failure ... but no one has ever done the legwork
> to see if that works nicely, or what downsides it might have.

Why is COMMERROR not doing the longjmp like ERROR?

Re: Client failure allows backend to continue

From: Tom Lane
Date: 27 January 2003, 22:43:26

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Why is COMMERROR not doing the longjmp like ERROR?

Because it's defined to be like LOG.

A more useful reply might be that I'm not sure it's safe to abort in the
client I/O routines.
        regards, tom lane
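
To make the ERROR versus LOG distinction concrete, here is a minimal sketch with hypothetical names (`sketch_elog` and the `SKETCH_*` levels are illustrations, not PostgreSQL's elog API): an ERROR-level report longjmps back to a recovery point set in the main loop, while a LOG-like level such as COMMERROR simply returns, so the caller carries on.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical severity levels for this sketch only. */
typedef enum { SKETCH_LOG, SKETCH_COMMERROR, SKETCH_ERROR } SketchLevel;

/* Recovery point that an ERROR jumps back to (the "main loop"). */
static jmp_buf sketch_main_loop;

/* Report a message; only SKETCH_ERROR leaves the caller via longjmp. */
static void sketch_elog(SketchLevel level, const char *msg)
{
    fprintf(stderr, "%s\n", msg);       /* always goes to the server log */
    if (level == SKETCH_ERROR)
        longjmp(sketch_main_loop, 1);   /* abandon the current query */
    /* SKETCH_LOG and SKETCH_COMMERROR just return to the caller */
}

/* Simulated query: reports a problem, then keeps working if elog returns.
 * Returns the number of work units done after the report. */
static int sketch_run_query(SketchLevel level)
{
    volatile int work_after_report = 0;
    if (setjmp(sketch_main_loop) != 0)
        return work_after_report;       /* arrived here via longjmp */
    sketch_elog(level, "recv() failed");
    for (int i = 0; i < 1000; i++)
        work_after_report++;            /* the "eating CPU" part */
    return work_after_report;
}
```

`sketch_run_query(SKETCH_COMMERROR)` returns 1000 because all the work after the report still runs, while `sketch_run_query(SKETCH_ERROR)` returns 0.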

Re: Client failure allows backend to continue

From: Bruce Momjian
Date: 27 January 2003, 22:45:53

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Why is COMMERROR not doing the longjmp like ERROR?
> 
> Because it's defined to be like LOG.
> 
> A more useful reply might be that I'm not sure it's safe to abort in the
> client I/O routines.

Well, if we get an I/O error, I can't imagine why we would continue
doing anything --- are any of those recoverable?  Do we need a separate
error type for I/O messages?

Re: Client failure allows backend to continue

From: Tom Lane
Date: 27 January 2003, 23:21:26

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, if we get an I/O error, I can't imagine why we would continue
> doing anything --- are any of those recoverable?

Well, that's what's not clear --- it's hard to tell if a write failure
is a hard error or just transient.  If we make like elog(ERROR),
returning to the main loop, and then a read from the client *doesn't*
fail, we'll try to continue ... but we've just screwed the pooch,
because we have not sent a complete message and therefore certainly have
messed up frontend/backend synchronization.  I have no idea whether it's
really possible to recover from this situation or not, but that approach
surely won't work.
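
A small sketch of the synchronization problem, assuming a made-up framing of one length byte followed by the payload (the real frontend/backend protocol is different): if the sender gives up partway through a message and later sends the next one, the receiver still trusts the stale length byte and swallows the start of the next message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_FULL ((size_t) -1)   /* "no truncation" for sketch_send() */

/* Hypothetical framing: one length byte, then that many payload bytes. */
typedef struct {
    unsigned char data[256];
    size_t len;
} SketchStream;

/* Append one message, but write at most `limit` bytes of it, to mimic
 * bailing out partway through output. Returns -1 if it would not fit. */
static int sketch_send(SketchStream *s, const char *payload, size_t limit)
{
    size_t n = strlen(payload);
    size_t total = n + 1;
    if (n > 255 || s->len + total > sizeof s->data)
        return -1;
    if (limit < total)
        total = limit;                        /* rest of the frame is lost */
    if (total > 0) {
        s->data[s->len] = (unsigned char) n;  /* length byte comes first */
        memcpy(s->data + s->len + 1, payload, total - 1);
    }
    s->len += total;
    return 0;
}

/* Read the frame at *pos into `out` (NUL-terminated, needs 256 bytes).
 * Returns 0 on success, or -1 if the stream ends before the frame does. */
static int sketch_recv(const SketchStream *s, size_t *pos, char *out)
{
    if (*pos >= s->len)
        return -1;
    size_t n = s->data[*pos];
    if (*pos + 1 + n > s->len)
        return -1;
    memcpy(out, s->data + *pos + 1, n);
    out[n] = '\0';
    *pos += 1 + n;
    return 0;
}
```

With "hello" sent whole, "world" cut off after 3 bytes, and then "ok" sent whole, the reader gets "hello", then a corrupted 5-byte "wo\x02ok", and never sees "ok" as a message of its own.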

If you want to take a kamikaze any-comm-error-means-we're-dead approach,
you might think about elog(FATAL).  But that tries to send a message to
the client.  Instant infinite loop, if the error is hard.
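
One way to avoid that loop, sketched with hypothetical names (`sketch_client_dead` and the other identifiers are not PostgreSQL's): remember the first failed write to the client, and from then on send reports only to the server log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: set once we know the client socket is unusable. */
static bool sketch_client_dead = false;
static int sketch_send_attempts = 0;

/* Stand-in for writing to a client whose connection is hard-broken. */
static bool sketch_send_to_client(const char *msg)
{
    (void) msg;
    sketch_send_attempts++;
    return false;                       /* every write fails */
}

/* Report a fatal error. Without the guard, a failed send would report a
 * new communication error, which would try to send again, forever. */
static void sketch_report_fatal(const char *msg)
{
    /* writing msg to the server log is omitted here */
    if (sketch_client_dead)
        return;                         /* log only; never touch the socket */
    if (!sketch_send_to_client(msg)) {
        sketch_client_dead = true;      /* remember the failure, so ... */
        sketch_report_fatal("could not send error to client"); /* ... this stops */
    }
}
```

After one call to `sketch_report_fatal()`, exactly one write to the client has been attempted; the nested report sees `sketch_client_dead` and returns instead of trying the broken socket again.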

Complaints to the postmaster log, and abort at the next safe place
(*not* partway through message output) seem like the way to go to me.

> Do we need a separate error type for I/O messages?

Uh ... see COMMERROR.
        regards, tom lane

Re: Client failure allows backend to continue

From: Bruce Momjian
Date: 27 January 2003, 23:28:15

Well, setting query_cancel then seems like a logical solution because it
will exit at a reasonable point, hopefully.  Right now we have
statement_timeout and that exits at a given time, but I suppose it
doesn't exit while data is transferring, so it may be different.
