On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
> >> The problem appears to manifest when a backend is holding an LWLock and
> >> starting a query, and the planner chooses a parallel plan for the
> >> latter.
>
> > Thanks for the detailed report and for the fix.
>
> I do not have much faith in this patch. It assumes that the
> condition "interrupts can be processed" is the same at plan time and
> execution time. For plans extracted from the plan cache, there seems
> little reason to assume that must be true. The proposed test case
> cannot trigger that (today anyway) because SQL-language functions
> don't deal in cached plans, but I suspect it could be broken with a
> test case using a plpgsql function instead.
Good point. I missed that.
> I actually think that the problem is somewhere upstream of here:
> what in the world are we doing invoking planning and execution of
> arbitrary queries in a context where interrupts are held off?
> That seems certain to fail in multiple ways besides parallel
> workers not working. I think we need to fix whatever code
> believes that that could be OK. And maybe then insert
> "Assert(INTERRUPTS_CAN_BE_PROCESSED())" at planner start
> and executor start.
It would be nice to never run planning or execution with interrupts held. The
concrete examples so far involve btree operator classes with non-C support
functions. How practical would it be to release buffer content locks before
running index support functions? An alternative would be blocking non-C
support functions (and instructing C support function authors to not use
planner/executor). Non-C support functions have been handy for testing if
nothing else, though. Do non-bundled modules rely on non-C support functions?
Thanks,
nm