Home > mailing lists

Re: Incremental Sort Cost Estimation Instability - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: Incremental Sort Cost Estimation Instability
Date	June 3, 2025 17:02:53
Msg-id	CAPpHfduossatgSzBzTm-ENnzhiGEj-xE=pk8MK79XR2cm9=vDg@mail.gmail.com Whole thread
In response to	Incremental Sort Cost Estimation Instability (Andrei Lepikhov <lepihov@gmail.com>)
Responses	Re: Incremental Sort Cost Estimation Instability
List	pgsql-hackers

Tree view

Hi, Andrei!

On Wed, May 14, 2025 at 1:50 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 9/12/24 16:57, Tomas Vondra wrote:
> > On 9/12/24 12:12, David Rowley wrote:
> >> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
> >>> Initial problem causes wrong cost_sort estimation. Right now I think
> >>> about providing cost_sort() the sort clauses instead of (or in addition
> >>> to) the pathkeys.
> >>
> >> I'm not quite sure why the sort clauses matter any more than the
> >> EquivalenceClass.  If the EquivalanceClass defines that all members
> >> will have the same value for any given row, then, if we had to choose
> >> any single member to drive the n_distinct estimate from, isn't the
> >> most accurate distinct estimate from the member with the smallest
> >> n_distinct estimate?  (That assumes the less distinct member has every
> >> value the more distinct member has, which might not be true)
> > I'm not sure how to fix this, but it seems estimate_num_groups() needs
> > to do things differently. And I agree looking for the minimum ndistinct
> > seems like the right approach. but doesn't estimate_num_groups()
> > supposed to already do that? The comment says:
> Hi,
>
> I have rewritten the code according to the idea of the
> estimate_num_groups responsibility to adjust estimation according to the EC.
> I haven't changed all the places of ngroups estimation - only where the
> planner can access pathkeys to avoid the overhead of passing through all
> the ECs. And added some cache in the EM.
> The most important case for me here is GROUP-BY estimation and
> create_unique_path because I frequently see them as sides of a JOIN.
> It would be also nice to adjust the cost of memoize rescan, but it may
> be a subject of the future improvement.

Thank you for your work on this subject.  I've the following notes
about the last patch.
1. Now groupExprs argument of estimate_num_groups() might contain
either Expr's or Pathkey's.  I think this should be documented.
2. A new argument relids of estimate_num_groups() also should be documented.
3. I doubt estimate_num_groups() is appropriate place to handle "varno
0".  Should we do this in cost_incremental_sort() as before?  I see
that it currently analyzes only first EquivalenceMember.  But could it
be a different answers between first EquivalenceMember and others?

------
Regards,
Alexander Korotkov
Supabase

pgsql-hackers by date:

From: Tom Lane
Date: 03 June 2025, 17:01:58
Subject: Re: C11 / VS 2019

From: Alexander Korotkov
Date: 03 June 2025, 17:05:13
Subject: Re: MergeAppend could consider sorting cheapest child path

Re: Incremental Sort Cost Estimation Instability - Mailing list pgsql-hackers

Previous

Next