Re: MergeAppend could consider sorting cheapest child path - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: MergeAppend could consider sorting cheapest child path
Date
Msg-id ef50ac7b-d04b-462d-930a-114ae77ce7b2@gmail.com
Whole thread Raw
In response to Re: MergeAppend could consider sorting cheapest child path  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
Responses Re: MergeAppend could consider sorting cheapest child path
List pgsql-hackers
On 3/28/25 09:19, Alexander Pyhalov wrote:
> Andy Fan писал(а) 2024-10-17 03:33:
> I've updated patch. One more interesting case which we found - when 
> fractional path is selected, it still can be more expensive than sorted 
> cheapest total path (as we look only on indexes whith necessary 
> pathkeys, not on indexes which allow efficiently fetch data).
> So far couldn't find artificial example, but we've seen inadequate index 
> selection due to this issue - instead of using index suited for filters 
> in where, index, suitable for sorting was selected as one having the 
> cheapest fractional cost.
I think it is necessary to generalise the approach a little.

Each MergeAppend subpath candidate that fits pathkeys should be compared 
to the overall-optimal path + explicit Sort node. Let's label this 
two-node composition as base_path. There are three cases exist: 
startup-optimal, total-optimal and fractional-optimal.
In the startup case, base_path should use cheapest_startup_path, the 
total-optimal case - cheapest_total_path and for a fractional case, we 
may employ the get_cheapest_fractional_path routine to detect proper 
base_path.

It may provide a more effective plan either in full, fractional and 
partial scan cases:
1. The Sort node may be pushed down to subpaths under a parallel or 
async Append.
2. When a minor set of subpaths doesn't have a proper index, and it is 
profitable to sort them instead of switching to plain Append.

In general, analysing the regression tests changed by this code, I see 
that the cost model prefers a series of small sortings instead of a 
single massive one. May be it will show some profit for execution time.

I am not afraid of any palpable planning overhead here because we just 
do cheap cost estimation and comparison operations that don't need 
additional memory allocations. The caller is responsible for building a 
proper Sort node if this method is chosen as optimal.

In the attachment, see the patch written according to the idea. There 
are I introduced two new routines:
get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext

I have designed the code that way to reduce duplicated code in the 
generate_orderedappend_paths routine. But the main point is that I 
envision these new routines may be reused elsewhere.

This feature looks сlose to the one we discussed before [1]. So, let me 
CC the people from the previous conversation to the discussion.

[1] 
https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com

-- 
regards, Andrei Lepikhov
Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_upgrade-breaking release
Next
From: Christoph Berg
Date:
Subject: Re: vacuumdb --missing-stats-only and pg_upgrade from PG13