Re: Performance issues with parallelism and LIMIT - Mailing list pgsql-hackers

From David Geier
Subject Re: Performance issues with parallelism and LIMIT
Date
Msg-id efc01fab-8dbd-4eab-aca2-05c31952d183@gmail.com
In response to Re: Performance issues with parallelism and LIMIT  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
On 19.11.2025 21:03, Tomas Vondra wrote:

> Right, that's why I suggested to have a function the nodes would call in
> suitable places.
> 
>>>> I like that idea, even though it would still not work while a node is
>>>> doing the crunching. That is after it has pulled all rows and before it
>>>> can return the first row. During this time the node won't call
>>>> ExecProcNode().
>>>>
>>> True. Perhaps we could provide a function nodes could call in suitable
>>> places to check whether to end?
>> This function would then also be required by the base relation scans.
>> And we would have to call it more or less in the same places
>> CHECK_FOR_INTERRUPTS() is called today.
>>
> 
> Yes, but I don't think CHECK_FOR_INTERRUPTS() would be a good place to
> manipulate the executor state. Maybe you could do some magic with
> siglongjmp(), but I have "funny" feeling about that - I wouldn't be
> surprised if that interfered with elog(), which is the only other place
> using siglongjmp() AFAICS.
You had the right intuition. siglongjmp()-ing out of the executor
leaves the per-node instrumentation state and CurrentMemoryContext in
an unexpected state.

Example: jumping out of the executor after we've called
InstrStartNode() but before we've called InstrStopNode(). Subsequently
calling InstrEndLoop() then gives the error you encountered. A similar
problem exists for CurrentMemoryContext, which is not properly reset.
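
For illustration, here is a rough sketch of the problematic sequence
(simplified, not the actual executor code path; the functions are the
ones from instrument.c):

    /* per-node instrumentation around a call into the node */
    InstrStartNode(node->instrument);         /* sets instr->starttime */

    /* ... siglongjmp() fires somewhere in here, so we never reach ... */

    InstrStopNode(node->instrument, nTuples); /* would accumulate the
                                               * time and clear starttime */

    /* later, when the instrumentation is finalized */
    InstrEndLoop(node->instrument);           /* finds starttime still set
                                               * and errors out */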

I didn't encounter these issues during my testing because they're
largely timing-dependent: execution can end before the other workers
have even started executing, in which case the stopping logic never
kicks in.

Both issues could be accounted for when jumping out, but that seems
somewhat hacky.

> Which is why I suggested maybe it should be handled in execProcnode
> (which would take care of cases where we produce a tuple), and then let
> nodes to optionally check the flag too (through a new function).
> 
> I haven't tried doing this, so maybe I'm missing something ...
> 
>> Beyond that, code such as heap_nextslot() or index_getnext_slot() don't
>> have access to the PlanState which would contain the stop flag. So that
>> would have to be propagated downwards as well.
>>
>> All of that would make for a fairly big patch, which the initial patch
>> avoids.
This turned out to be false. See below.

> Right. I don't think we can set the flag in plan/executor state, because
> that's not available in signal handler / ProcessInterrupts() ... It'd
> need to be a global variable, I guess.
What we can do instead is use a global variable. That also makes
checking the flag a lot easier, because we don't need to pass it down
through multiple abstraction layers.

What needs to be taken care of, though, is to only bail out of scans
that are actually initiated by plan nodes. There are many places in
the code that use e.g. the table AM API directly, and we don't want to
bail out of those scans. Without flagging whether a scan may bail out,
e.g. ScanPgRelation() would return no tuple and the relcache code
would start failing.

The new patch accounts for that by introducing a new TableScanDescData
flag, SO_OBEY_PARALLEL_WORKER_STOP, which indicates whether the scan
should honor the parallel worker stop.
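
A minimal sketch of that gating, assuming the flag is one of the SO_*
scan option bits stored in TableScanDescData.rs_flags (the bit value
and the helper below are illustrative; the real definition is in the
attached patch):

    /* illustrative value only; must not collide with existing SO_* bits */
    #define SO_OBEY_PARALLEL_WORKER_STOP    (1 << 10)

    /*
     * Scans started by plan nodes (e.g. via table_beginscan() from
     * nodeSeqScan.c) carry the flag; internal callers such as
     * ScanPgRelation() don't, so those scans run to completion.
     */
    static inline bool
    scan_obeys_parallel_worker_stop(TableScanDesc scan)
    {
        return (scan->rs_flags & SO_OBEY_PARALLEL_WORKER_STOP) != 0;
    }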

Stopping is broadcast to all parallel workers via SendProcSignal().
The stop variable is checked with the new
CHECK_FOR_PARALLEL_WORKER_STOP() macro.
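
Roughly, the mechanism could look like the sketch below, patterned
after the existing pending-interrupt flags (all names other than
SendProcSignal() come from the description above or are assumptions;
the actual definitions are in the patch):

    /* process-local flag, set when the leader broadcasts the stop */
    volatile sig_atomic_t ParallelWorkerStopPending = false;

    /*
     * Invoked from the procsignal handler when the new ProcSignalReason
     * (delivered via SendProcSignal()) arrives; it only sets the flag,
     * just like the existing *Pending interrupt flags.
     */
    void
    HandleParallelWorkerStop(void)
    {
        ParallelWorkerStopPending = true;
    }

    /* cheap check; what it expands to exactly is up to the patch */
    #define CHECK_FOR_PARALLEL_WORKER_STOP() \
        (unlikely(ParallelWorkerStopPending))

    /*
     * Usage in a tuple-fetch loop, e.g. in nodeSeqScan.c:
     *
     *     if (scan_obeys_parallel_worker_stop(scan) &&
     *         CHECK_FOR_PARALLEL_WORKER_STOP())
     *         return NULL;    (stop returning tuples)
     */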

In the PoC implementation I've only changed nodeSeqScan.c for now. If
there's agreement to pursue this approach, I'll change the other
places as well. The naming can likely still be improved.

--
David Geier
Attachment
