On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
> The problem appears to manifest when a backend is holding an LWLock and
> starting a query, and the planner chooses a parallel plan for the
> latter.
Thanks for the detailed report and for the fix.
> Potential fixes
> ---------------
>
> As an experiment, we modified the planner code to consider the state of
> `InterruptHoldoffCount` when determining the value of
> `glob->parallelOK`: if `InterruptHoldoffCount` > 0, then `parallelOK`
> is set to false.
>
> This ensures a sequential plan is executed if interrupts are being held
> on the leader backend, and the query completes normally.
>
> The patch is attached as `no_parallel_on_interrupts_held.patch`.
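To make the failure mode concrete, here is a small standalone C model of the situation described above. It is not PostgreSQL code. The names `InterruptHoldoffCount`, `HOLD_INTERRUPTS()`/`RESUME_INTERRUPTS()`, `ProcessInterrupts()` and `HandleParallelMessages()` are borrowed from PostgreSQL, but everything here is a simplified stand-in. In PostgreSQL, `LWLockAcquire()` holds interrupts until the lock is released. In this model the leader can only drain worker messages from its interrupt path, so while interrupts are held it never sees the workers finish. `plan_can_be_parallel()` is a hypothetical analogue of the proposed `glob->parallelOK` check.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for PostgreSQL globals (not the real declarations). */
static int  InterruptHoldoffCount = 0;
static bool ParallelMessagePending = false;
static int  messages_handled = 0;

/* Modeled on the macros of the same name in miscadmin.h. */
#define HOLD_INTERRUPTS()   (InterruptHoldoffCount++)
#define RESUME_INTERRUPTS() \
    do { assert(InterruptHoldoffCount > 0); InterruptHoldoffCount--; } while (0)

/* Stand-in for HandleParallelMessages(): consume the workers' messages. */
static void model_handle_parallel_messages(void)
{
    ParallelMessagePending = false;
    messages_handled++;
}

/* Stand-in for ProcessInterrupts(): does nothing while interrupts are held. */
static void model_process_interrupts(void)
{
    if (InterruptHoldoffCount != 0)
        return;
    if (ParallelMessagePending)
        model_handle_parallel_messages();
}

/*
 * Leader waiting for its workers to finish. The workers have already sent
 * their final message; the leader only notices it via the interrupt path.
 * Returns true if the wait completes within max_iters iterations, false if
 * it would keep waiting forever (in the real system, sleeping on a latch).
 */
static bool model_leader_wait(int max_iters)
{
    ParallelMessagePending = true;
    messages_handled = 0;
    for (int i = 0; i < max_iters; i++)
    {
        model_process_interrupts();
        if (messages_handled > 0)
            return true;
    }
    return false;
}

/* Hypothetical analogue of the proposed planner check on glob->parallelOK. */
static bool plan_can_be_parallel(bool otherwise_ok)
{
    return otherwise_ok && InterruptHoldoffCount == 0;
}
```

Calling `HOLD_INTERRUPTS()` before `model_leader_wait()` makes it return false, mirroring the hang. `plan_can_be_parallel()` returns false in that same state, which is what keeps the query on a serial plan under the proposed fix.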
Looks good. An alternative would be something like the leader periodically
waking up to call HandleParallelMessages() outside of ProcessInterrupts(). I
like your patch better, though. Parallel query is a lot of infrastructure to
be running while immune to statement_timeout, pg_cancel_backend(), etc. I
opted to check INTERRUPTS_CAN_BE_PROCESSED(), since QueryCancelHoldoffCount!=0
doesn't cause the hang but still qualifies as a good reason to stay out of
parallel query. Pushed that way:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ac04aa8
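The difference between the two checks can be sketched with another standalone model. It assumes, as in PostgreSQL's miscadmin.h, that `INTERRUPTS_CAN_BE_PROCESSED()` requires `InterruptHoldoffCount`, `CritSectionCount` and `QueryCancelHoldoffCount` all to be zero. The gating functions and `ModelPlannerState` are hypothetical simplifications, not the committed planner code. With only `QueryCancelHoldoffCount` raised, the originally proposed check would still allow a parallel plan, while the broader check rejects it.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for PostgreSQL's holdoff counters (not the real declarations). */
static int InterruptHoldoffCount = 0;
static int QueryCancelHoldoffCount = 0;
static int CritSectionCount = 0;

/* Modeled on the macro of the same name in miscadmin.h. */
#define INTERRUPTS_CAN_BE_PROCESSED() \
    (InterruptHoldoffCount == 0 && CritSectionCount == 0 && \
     QueryCancelHoldoffCount == 0)

/* Hypothetical planner inputs: the other conditions that must permit parallelism. */
typedef struct ModelPlannerState
{
    bool cursor_allows_parallel;          /* e.g. CURSOR_OPT_PARALLEL_OK was passed */
    int  max_parallel_workers_per_gather; /* 0 disables parallel plans */
} ModelPlannerState;

/* As originally proposed: only held-off interrupts disable parallelism. */
static bool parallel_ok_proposed(const ModelPlannerState *s)
{
    return s->cursor_allows_parallel &&
           s->max_parallel_workers_per_gather > 0 &&
           InterruptHoldoffCount == 0;
}

/* As described for the pushed fix: any reason interrupts cannot be
 * processed (including a query-cancel holdoff) disables parallelism. */
static bool parallel_ok_pushed(const ModelPlannerState *s)
{
    return s->cursor_allows_parallel &&
           s->max_parallel_workers_per_gather > 0 &&
           INTERRUPTS_CAN_BE_PROCESSED();
}
```

The broader check costs nothing extra in the common case, where all three counters are zero, and it avoids starting parallel workers in any state where statement_timeout or pg_cancel_backend() could not take effect.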
> Related issues
> ==============
>
> - Query stuck with wait event IPC / ParallelFinish
>   - https://www.postgresql.org/message-id/0f64b4c7fc200890f2055ce4d6650e9c2191fac2.camel@j-davis.com
This one didn't reproduce for me. Like your test, it involves custom code
running inside an opclass. I'm comfortable assuming it's the same problem.
> - BUG #18586: Process (and transaction) is stuck in IPC when the DB
>   is under high load
>   - https://www.postgresql.org/message-id/flat/18586-03e1535b1b34db81%40postgresql.org
Here, I'm not seeing enough detail to judge if it's the same. That's okay.