Thread: Optimization of NestLoop join in the case of guaranteed empty innersubtree

Optimization of NestLoop join in the case of guaranteed empty innersubtree

From

Andrey Lepikhov

Date:

11 December 2019, 13:08:40

During NestLoop execution we have bad corner case: if outer subtree 
contains tuples the join node will scan inner subtree even if it does 
not return any tuples.

To reproduce the problem see 'problem.sql' in attachment:
Out of explain analyze see in 'problem_explain.txt'

As you can see, executor scan each of 1e5 outer tuples despite the fact 
that inner can't return any tuples.

Teodor Sigaev and I developed a patch to solve this problem. Result of 
explain analyze procedure can be found in the 'optimized_execution.txt'.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: Optimization of NestLoop join in the case of guaranteed empty inner subtree

From

Tom Lane

Date:

11 December 2019, 15:49:13

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
> During NestLoop execution we have bad corner case: if outer subtree 
> contains tuples the join node will scan inner subtree even if it does 
> not return any tuples.

So the first question about corner-case optimizations like this is always
"how much overhead does it add in the normal case where it fails to gain
anything?".  I see no performance numbers in your proposal.

I do not much like anything about the code, either: as written it's
only helpful for an especially narrow corner case (so narrow that
I wonder if it really ever helps at all: surely calling a nodeMaterial
whose tuplestore is empty doesn't cost much).  But that doesn't stop it
from adding a bool to the generic PlanState struct, with global
implications.  What I'd expected from your text description is that
nodeNestLoop would remember whether its inner child had returned zero rows
the first time, and assume that subsequent executions could be skipped
unless the inner child's parameters change.

            regards, tom lane

Re: Optimization of NestLoop join in the case of guaranteed emptyinner subtree

From

Andrey Lepikhov

Date:

15 December 2019, 10:56:44

On 12/11/19 8:49 PM, Tom Lane wrote:
> Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
>> During NestLoop execution we have bad corner case: if outer subtree
>> contains tuples the join node will scan inner subtree even if it does
>> not return any tuples.
> 
> So the first question about corner-case optimizations like this is always
> "how much overhead does it add in the normal case where it fails to gain
> anything?".  I see no performance numbers in your proposal.

I thought it is trivial. But quick study shows no differences that can 
be seen.

> 
> I do not much like anything about the code, either: as written it's
> only helpful for an especially narrow corner case (so narrow that
> I wonder if it really ever helps at all: surely calling a nodeMaterial
> whose tuplestore is empty doesn't cost much).

Scanning of large outer can be very costly. If you will try to play with 
analytical queries you can find cases, where nested loops uses 
materialization of zero tuples. At least one of the cases for this is 
finding data gaps.
Also, this optimization exists in logic of hash join.

>  But that doesn't stop it
> from adding a bool to the generic PlanState struct, with global
> implications.  What I'd expected from your text description is that
> nodeNestLoop would remember whether its inner child had returned zero rows
> the first time, and assume that subsequent executions could be skipped
> unless the inner child's parameters change.

This note I was waiting for. I agree with you that adding a bool 
variable to PlanState is excessful. See in attachment another version of 
the optimization.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment

v2-0001-Skip-scan-of-outer-subtree-if-inner-of-the-NestedLoo.patch