Re: MergeAppend could consider sorting cheapest child path - Mailing list pgsql-hackers
From | Alexander Pyhalov |
---|---|
Subject | Re: MergeAppend could consider sorting cheapest child path |
Date | |
Msg-id | 96e6b45ca97ac61657ca8c4d65e135f8@postgrespro.ru Whole thread Raw |
In response to | Re: MergeAppend could consider sorting cheapest child path (Andrei Lepikhov <lepihov@gmail.com>) |
Responses |
Re: MergeAppend could consider sorting cheapest child path
|
List | pgsql-hackers |
Andrei Lepikhov писал(а) 2025-04-24 16:01: > On 3/28/25 09:19, Alexander Pyhalov wrote: >> Andy Fan писал(а) 2024-10-17 03:33: >> I've updated patch. One more interesting case which we found - when >> fractional path is selected, it still can be more expensive than >> sorted cheapest total path (as we look only on indexes whith necessary >> pathkeys, not on indexes which allow efficiently fetch data). >> So far couldn't find artificial example, but we've seen inadequate >> index selection due to this issue - instead of using index suited for >> filters in where, index, suitable for sorting was selected as one >> having the cheapest fractional cost. > I think it is necessary to generalise the approach a little. > > Each MergeAppend subpath candidate that fits pathkeys should be > compared to the overall-optimal path + explicit Sort node. Let's label > this two-node composition as base_path. There are three cases exist: > startup-optimal, total-optimal and fractional-optimal. > In the startup case, base_path should use cheapest_startup_path, the > total-optimal case - cheapest_total_path and for a fractional case, we > may employ the get_cheapest_fractional_path routine to detect proper > base_path. > > It may provide a more effective plan either in full, fractional and > partial scan cases: > 1. The Sort node may be pushed down to subpaths under a parallel or > async Append. > 2. When a minor set of subpaths doesn't have a proper index, and it is > profitable to sort them instead of switching to plain Append. > > In general, analysing the regression tests changed by this code, I see > that the cost model prefers a series of small sortings instead of a > single massive one. May be it will show some profit for execution time. > > I am not afraid of any palpable planning overhead here because we just > do cheap cost estimation and comparison operations that don't need > additional memory allocations. The caller is responsible for building a > proper Sort node if this method is chosen as optimal. > > In the attachment, see the patch written according to the idea. There > are I introduced two new routines: > get_cheapest_path_for_pathkeys_ext > get_cheapest_fractional_path_for_pathkeys_ext Hi. I'm a bit confused that get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in STARTUP_COST case looks only on sorting cheapest_startup_path. Usually, sorted cheapest_total_path will be cheaper than sorted fractional/startup path at least by startup cost (as after sorting it includes total_cost of input path). But we ignore this case when selecting cheapest_startup and cheapest_fractional subpaths. As result selected cheapest_startup and cheapest_fractional can be not cheapest for startup or selecting a fraction of rows. Consider the partition with the following access paths: 1) cheapest_startup without required pathkeys: startup_cost: 0.42 total_cost: 4004 2) some index_path with required pathkeys: startup_cost: 6.6 total_cost: 2000 3) cheapest_total_path: startup_cost: 0.42 total_cost: 3.48 Here, when selecting cheapest startup subpath we'll compare costs of index path (2) and sorted cheapest_startup (1), but will ignore sorted cheapest_total_path (3), despite the fact that it really can be the cheapest startup path, providing required sorting order. -- Best regards, Alexander Pyhalov, Postgres Professional
pgsql-hackers by date: