Thread: Incorrect calculation of path fraction value in MergeAppend
Hi, Commit 6b94e7a has added a "fractional" strategy to the merge join path search scope. But as I see it incorrectly calculates tuple_fraction value. Let's see one of the regression tests' queries: EXPLAIN (COSTS OFF) SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 10; It uses NestedLoop with parameterised inner scan as a subplan of MergeAppend. Overall cost is: Limit (cost=1.11..4.86 rows=10 width=16) Let's reduce the LIMIT a little: EXPLAIN (COSTS OFF) SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 2; There, the query plan uses plain NestLoop node and overall cost significantly increases: Limit (cost=0.56..30.70 rows=2 width=16) The origin problem lies in the following code line: path_fraction = (1.0 / root->tuple_fraction); Building startup, total and fractional MergeAppends, the optimiser puts off the 'fractional' way because it has built for a 50% limit (uses MergeJoin with high startup cost) and chooses a startup-optimal path (built upon plain NestLoops) that is better than the total one. Instead, the path_fraction value must be corrected when root->tuple_fraction contains the absolute value of tuples. See the patch in the attachment. I report this issue into the bugs mailing list because it might cause some 'enigmatic' performance slowdowns I have heard about (but have no proofs) and may need back-patching. -- regards, Andrei Lepikhov
Attachment
Hi Andrei, On Tue, Apr 29, 2025 at 7:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote: > > Hi, > > Commit 6b94e7a has added a "fractional" strategy to the merge join path > search scope. But as I see it incorrectly calculates tuple_fraction > value. Let's see one of the regression tests' queries: > > EXPLAIN (COSTS OFF) > SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y > USING (id) ORDER BY x.id DESC LIMIT 10; > > It uses NestedLoop with parameterised inner scan as a subplan of > MergeAppend. Overall cost is: > Limit (cost=1.11..4.86 rows=10 width=16) > > Let's reduce the LIMIT a little: > > EXPLAIN (COSTS OFF) > SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y > USING (id) ORDER BY x.id DESC LIMIT 2; > > There, the query plan uses plain NestLoop node and overall cost > significantly increases: > Limit (cost=0.56..30.70 rows=2 width=16) > > The origin problem lies in the following code line: > path_fraction = (1.0 / root->tuple_fraction); > > Building startup, total and fractional MergeAppends, the optimiser puts > off the 'fractional' way because it has built for a 50% limit (uses > MergeJoin with high startup cost) and chooses a startup-optimal path > (built upon plain NestLoops) that is better than the total one. > > Instead, the path_fraction value must be corrected when > root->tuple_fraction contains the absolute value of tuples. See the > patch in the attachment. > > I report this issue into the bugs mailing list because it might cause > some 'enigmatic' performance slowdowns I have heard about (but have no > proofs) and may need back-patching. > > -- > regards, Andrei Lepikhov I've apply the patch and the tests passed, one small nitpick: + double path_fraction = root->tuple_fraction; + + /* Convert absolute limit to a path fraction */ + if (path_fraction >= 1.) + path_fraction /= childrel->rows; maybe add `&& childrel->rows > 0.` to the if condition to avoid potential crash. I'm not sure if `childrel->rows` will be 0 but I see get_cheapest_fractional_path did this. /* Convert absolute # of tuples to a fraction; no need to clamp to 0..1 */ if (tuple_fraction >= 1.0 && best_path->rows > 0) tuple_fraction /= best_path->rows; -- Regards Junwang Zhao
On Thu, May 8, 2025 at 2:42 AM Andrei Lepikhov <lepihov@gmail.com> wrote: > > On 7/5/2025 12:45, Álvaro Herrera wrote: > > On 2025-May-07, Andrei Lepikhov wrote: > >> The magic is how it finds a link to the thread. This time it doesn't > > Weird, that doesn't seem to work very well if you try to search for > > stuff. But if you just specify the Message-Id without clicking > > "search" then it works. So here you go: > > https://commitfest.postgresql.org/patch/5742/ > Thanks! I think this memo should be pinned at the top of the page. > > On 7/5/2025 17:35, Junwang Zhao wrote: > > + /* Convert absolute limit to a path fraction */ > > + if (path_fraction >= 1.) > > + path_fraction /= childrel->rows; > > > > maybe add `&& childrel->rows > 0.` to the if condition to > > avoid potential crash. I'm not sure if `childrel->rows` will > > be 0 but I see get_cheapest_fractional_path did this. > You may look at the 76281aa for the reason. Also, looking into the code, > I see that it is impossible now to find more places with rows == 0 > except for the Dummy result node. Moreover, it just doesn't make sense > in the case of append nodes: why should the optimiser spend resources > and consider paths if it is impossible to pull any tuples from the subplan? > > So, I think checking assertion instead would be the better option. See > new version in the attachment. Agree, assertion is better, the new version LGTM. > > -- > regards, Andrei Lepikhov -- Regards Junwang Zhao