Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start - Mailing list pgsql-bugs

From	Noah Misch
Subject	Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Date	September 18, 2024 06:01:59
Msg-id	20240918030159.2a.nmisch@google.com
Responses	Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
List	pgsql-bugs

On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
> The problem appears to manifest when a backend is holding an LWLock and
> starting a query, and the planner chooses a parallel plan for the
> latter.
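
The hang described above can be sketched as a small standalone C model. It assumes, consistent with the alternative discussed in the reply, that the leader reads the workers' completion messages (HandleParallelMessages()) only from the interrupt-processing path (ProcessInterrupts()), which returns immediately while InterruptHoldoffCount is nonzero, as it is while an LWLock is held. Every name below is a hypothetical stand-in, not PostgreSQL source, and the real leader sleeps on a latch instead of spinning a bounded loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not PostgreSQL source: illustrative names only. */
static int holdoff_count = 0;       /* role of InterruptHoldoffCount */
static int pending_worker_msgs = 0; /* worker "finished" messages not yet read */

/* Like HOLD_INTERRUPTS(), which LWLock acquisition performs. */
static void hold_interrupts(void) { holdoff_count++; }

/* Like RESUME_INTERRUPTS(): must balance a prior hold. */
static void resume_interrupts(void)
{
    assert(holdoff_count > 0);
    holdoff_count--;
}

/* Drains worker messages; in this model it is reachable only via process_interrupts(). */
static void handle_parallel_messages(void) { pending_worker_msgs = 0; }

/* Like ProcessInterrupts(): does nothing while interrupts are held off. */
static void process_interrupts(void)
{
    if (holdoff_count > 0)
        return;
    handle_parallel_messages();
}

/*
 * Leader waits for all workers to report completion.  Returns true if the
 * wait finished within max_iterations, false if it would wait forever.
 */
static bool leader_wait_for_workers(int max_iterations)
{
    for (int i = 0; i < max_iterations; i++)
    {
        process_interrupts();
        if (pending_worker_msgs == 0)
            return true;
        /* the real leader would sleep on its latch here */
    }
    return false;
}
```

With interrupts held, the loop never observes the workers finishing, which corresponds to the leader sitting in the IPC/ParallelFinish wait indefinitely; once the hold is released, the same wait completes.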

Thanks for the detailed report and for the fix.

> Potential fixes
> ---------------
> 
> As an experiment, we modified the planner code to consider the state of
> `InterruptHoldoffCount` when determining the value of
> `glob->parallelOK`: if `InterruptHoldoffCount` > 0, then `parallelOK`
> is set to false.
> 
> This ensures a sequential plan is executed if interrupts are being held
> on the leader backend, and the query completes normally.
> 
> The patch is attached as `no_parallel_on_interrupts_held.patch`.

Looks good.  An alternative would be something like the leader periodically
waking up to call HandleParallelMessages() outside of ProcessInterrupts().  I
like your patch better, though.  Parallel query is a lot of infrastructure to
be running while immune to statement_timeout, pg_cancel_backend(), etc.  I
opted to check INTERRUPTS_CAN_BE_PROCESSED(), since QueryCancelHoldoffCount!=0
doesn't cause the hang but still qualifies as a good reason to stay out of
parallel query.  Pushed that way:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ac04aa8
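
The difference between the proposed check and the pushed one can be illustrated with another hypothetical standalone model. Here `otherwise_ok` stands in for all of the planner's other conditions for allowing a parallel plan, and the three counters play the roles of PostgreSQL's InterruptHoldoffCount, QueryCancelHoldoffCount and CritSectionCount. Treating an open critical section as blocking interrupt processing is an assumption about how INTERRUPTS_CAN_BE_PROCESSED() is defined, not something stated in this thread; this is a sketch, not the code in the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PostgreSQL's global counters (illustrative names). */
static int interrupt_holdoff_count = 0;    /* role of InterruptHoldoffCount */
static int query_cancel_holdoff_count = 0; /* role of QueryCancelHoldoffCount */
static int crit_section_count = 0;         /* role of CritSectionCount (assumed relevant) */

/* Like HOLD_INTERRUPTS()/RESUME_INTERRUPTS(), e.g. around an LWLock. */
static void hold_interrupts(void) { interrupt_holdoff_count++; }

static void resume_interrupts(void)
{
    assert(interrupt_holdoff_count > 0);
    interrupt_holdoff_count--;
}

/* Modeled on INTERRUPTS_CAN_BE_PROCESSED(); exact definition assumed. */
static bool interrupts_can_be_processed(void)
{
    return interrupt_holdoff_count == 0 &&
           crit_section_count == 0 &&
           query_cancel_holdoff_count == 0;
}

/* Proposed gate: no parallel plan while InterruptHoldoffCount > 0. */
static bool parallel_ok_proposed(bool otherwise_ok)
{
    return otherwise_ok && interrupt_holdoff_count == 0;
}

/* Pushed gate: no parallel plan unless interrupts can be processed. */
static bool parallel_ok_pushed(bool otherwise_ok)
{
    return otherwise_ok && interrupts_can_be_processed();
}
```

In this model the two gates disagree whenever interrupt processing is blocked by something other than InterruptHoldoffCount, for example a nonzero QueryCancelHoldoffCount: the proposed check would still allow a parallel plan, while the pushed check stays out of parallel query.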

> Related issues
> ==============
> 
> -   Query stuck with wait event IPC / ParallelFinish
>     - https://www.postgresql.org/message-id/0f64b4c7fc200890f2055ce4d6650e9c2191fac2.camel@j-davis.com

This one didn't reproduce for me.  Like your test, it involves custom code
running inside an opclass.  I'm comfortable assuming it's the same problem.

> -   BUG #18586: Process (and transaction) is stuck in IPC when the DB
>     is under high load
>     - https://www.postgresql.org/message-id/flat/18586-03e1535b1b34db81%40postgresql.org

Here, I'm not seeing enough detail to judge if it's the same.  That's okay.
