Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start - Mailing list pgsql-bugs

From Noah Misch
Subject Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Date
Msg-id 20240918050434.13.nmisch@google.com
Whole thread Raw
In response to Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
List pgsql-bugs
On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
> >> The problem appears to manifest when a backend is holding an LWLock and
> >> starting a query, and the planner chooses a parallel plan for the
> >> latter.
> 
> > Thanks for the detailed report and for the fix.
> 
> I do not have much faith in this patch.  It assumes that the
> condition "interrupts can be processed" is the same at plan time and
> execution time.  For plans extracted from the plan cache, there seems
> little reason to assume that must be true.  The proposed test case
> cannot trigger that (today anyway) because SQL-language functions
> don't deal in cached plans, but I suspect it could be broken with a
> test case using a plpgsql function instead.

Good point.  I missed that.

> I actually think that the problem is somewhere upstream of here:
> what in the world are we doing invoking planning and execution of
> arbitrary queries in a context where interrupts are held off?
> That seems certain to fail in multiple ways besides parallel
> workers not working.  I think we need to fix whatever code
> believes that that could be OK.  And maybe then insert
> "Assert(INTERRUPTS_CAN_BE_PROCESSED())" at planner start
> and executor start.

It would be nice to never run planning or execution with interrupts held.  The
concrete examples so far involve btree operator classes with non-C support
functions.  How practical would it be to release buffer content locks before
running index support functions?  An alternative would be blocking non-C
support functions (and instructing C support function authors to not use
planner/executor).  Non-C support functions have been handy for testing if
nothing else, though.  Do non-bundled modules rely on non-C support functions?

Thanks,
nm



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Next
From: Tom Lane
Date:
Subject: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start