Thread: Incorrect calculation of path fraction value in MergeAppend

Incorrect calculation of path fraction value in MergeAppend

From
Andrei Lepikhov
Date:
Hi,

Commit 6b94e7a added a "fractional" strategy to the merge join path 
search scope. But as far as I can see, it calculates the tuple_fraction 
value incorrectly. Consider one of the regression test queries:

EXPLAIN (COSTS OFF)
SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
   USING (id) ORDER BY x.id DESC LIMIT 10;

It uses a Nested Loop with a parameterised inner scan as a subplan of 
the MergeAppend. The overall cost is:
Limit  (cost=1.11..4.86 rows=10 width=16)

Let's reduce the LIMIT a little:

EXPLAIN (COSTS OFF)
SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
   USING (id) ORDER BY x.id DESC LIMIT 2;

Here, the query plan uses a plain NestLoop node, and the overall cost 
increases significantly:
Limit  (cost=0.56..30.70 rows=2 width=16)

The root of the problem lies in the following line of code:
    path_fraction = (1.0 / root->tuple_fraction);

While building the startup, total, and fractional MergeAppend paths, the 
optimiser discards the 'fractional' path because it was built for a 50% 
limit (and uses a MergeJoin with a high startup cost); it then chooses 
the startup-optimal path (built upon plain NestLoops), which looks 
better than the total-optimal one.

Instead, the path_fraction value must be corrected when 
root->tuple_fraction contains an absolute number of tuples. See the 
patch in the attachment.

I am reporting this issue to the bugs mailing list because it might 
cause some of the 'enigmatic' performance slowdowns I have heard about 
(but have no proof of), and it may need back-patching.

-- 
regards, Andrei Lepikhov

Attachment

Re: Incorrect calculation of path fraction value in MergeAppend

From
Junwang Zhao
Date:
Hi Andrei,

On Tue, Apr 29, 2025 at 7:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> Hi,
>
> Commit 6b94e7a added a "fractional" strategy to the merge join path
> search scope. But as far as I can see, it calculates the tuple_fraction
> value incorrectly. Consider one of the regression test queries:
>
> EXPLAIN (COSTS OFF)
> SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
>    USING (id) ORDER BY x.id DESC LIMIT 10;
>
> It uses a Nested Loop with a parameterised inner scan as a subplan of
> the MergeAppend. The overall cost is:
> Limit  (cost=1.11..4.86 rows=10 width=16)
>
> Let's reduce the LIMIT a little:
>
> EXPLAIN (COSTS OFF)
> SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
>    USING (id) ORDER BY x.id DESC LIMIT 2;
>
> Here, the query plan uses a plain NestLoop node, and the overall cost
> increases significantly:
> Limit  (cost=0.56..30.70 rows=2 width=16)
>
> The root of the problem lies in the following line of code:
>         path_fraction = (1.0 / root->tuple_fraction);
>
> While building the startup, total, and fractional MergeAppend paths,
> the optimiser discards the 'fractional' path because it was built for a
> 50% limit (and uses a MergeJoin with a high startup cost); it then
> chooses the startup-optimal path (built upon plain NestLoops), which
> looks better than the total-optimal one.
>
> Instead, the path_fraction value must be corrected when
> root->tuple_fraction contains an absolute number of tuples. See the
> patch in the attachment.
>
> I am reporting this issue to the bugs mailing list because it might
> cause some of the 'enigmatic' performance slowdowns I have heard about
> (but have no proof of), and it may need back-patching.
>
> --
> regards, Andrei Lepikhov

I've applied the patch and the tests pass. One small nitpick:

+    double    path_fraction = root->tuple_fraction;
+
+    /* Convert absolute limit to a path fraction */
+    if (path_fraction >= 1.)
+        path_fraction /= childrel->rows;

maybe add `&& childrel->rows > 0.` to the if condition to
avoid a potential crash. I'm not sure whether `childrel->rows`
can be 0, but I see that get_cheapest_fractional_path does this:

    /* Convert absolute # of tuples to a fraction; no need to clamp to 0..1 */
    if (tuple_fraction >= 1.0 && best_path->rows > 0)
        tuple_fraction /= best_path->rows;

--
Regards
Junwang Zhao



Re: Incorrect calculation of path fraction value in MergeAppend

From
Junwang Zhao
Date:
On Thu, May 8, 2025 at 2:42 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 7/5/2025 12:45, Álvaro Herrera wrote:
>  > On 2025-May-07, Andrei Lepikhov wrote:
>  >> The magic is how it finds a link to the thread. This time it doesn't
>  > Weird, that doesn't seem to work very well if you try to search for
>  > stuff.  But if you just specify the Message-Id without clicking
>  > "search" then it works.  So here you go:
>  > https://commitfest.postgresql.org/patch/5742/
> Thanks! I think this memo should be pinned at the top of the page.
>
> On 7/5/2025 17:35, Junwang Zhao wrote:
> > +    /* Convert absolute limit to a path fraction */
> > +    if (path_fraction >= 1.)
> > +        path_fraction /= childrel->rows;
> >
> > maybe add `&& childrel->rows > 0.` to the if condition to
> > avoid a potential crash. I'm not sure whether `childrel->rows`
> > can be 0, but I see that get_cheapest_fractional_path does this.
> You may look at commit 76281aa for the reasoning. Also, looking into
> the code, I see that it is now impossible to find more places with
> rows == 0 except for the dummy Result node. Moreover, it just doesn't
> make sense in the case of append nodes: why should the optimiser spend
> resources considering paths if it is impossible to pull any tuples
> from the subplan?
>
> So, I think checking an assertion instead would be the better option.
> See the new version in the attachment.

Agreed, an assertion is better; the new version LGTM.

>
> --
> regards, Andrei Lepikhov



--
Regards
Junwang Zhao



Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Personally I can see getting this in 18beta2 without too much additional
> arguing for it.  But for 15-17 you'd need a lot more support than is
> evident in this thread.

+1.  The bar for causing plan changes in back branches is very high.
Unless you can show that the buggy behavior is causing wrong answers,
it's usually best to leave it alone.

            regards, tom lane