Hi, Andrei!
On Wed, May 14, 2025 at 1:50 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 9/12/24 16:57, Tomas Vondra wrote:
> > On 9/12/24 12:12, David Rowley wrote:
> >> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
> >>> Initial problem causes wrong cost_sort estimation. Right now I think
> >>> about providing cost_sort() the sort clauses instead of (or in addition
> >>> to) the pathkeys.
> >>
> >> I'm not quite sure why the sort clauses matter any more than the
> >> EquivalenceClass. If the EquivalanceClass defines that all members
> >> will have the same value for any given row, then, if we had to choose
> >> any single member to drive the n_distinct estimate from, isn't the
> >> most accurate distinct estimate from the member with the smallest
> >> n_distinct estimate? (That assumes the less distinct member has every
> >> value the more distinct member has, which might not be true)
> > I'm not sure how to fix this, but it seems estimate_num_groups() needs
> > to do things differently. And I agree looking for the minimum ndistinct
> > seems like the right approach. but doesn't estimate_num_groups()
> > supposed to already do that? The comment says:
> Hi,
>
> I have rewritten the code according to the idea of the
> estimate_num_groups responsibility to adjust estimation according to the EC.
> I haven't changed all the places of ngroups estimation - only where the
> planner can access pathkeys to avoid the overhead of passing through all
> the ECs. And added some cache in the EM.
> The most important case for me here is GROUP-BY estimation and
> create_unique_path because I frequently see them as sides of a JOIN.
> It would be also nice to adjust the cost of memoize rescan, but it may
> be a subject of the future improvement.
Thank you for your work on this subject. I've the following notes
about the last patch.
1. Now groupExprs argument of estimate_num_groups() might contain
either Expr's or Pathkey's. I think this should be documented.
2. A new argument relids of estimate_num_groups() also should be documented.
3. I doubt estimate_num_groups() is appropriate place to handle "varno
0". Should we do this in cost_incremental_sort() as before? I see
that it currently analyzes only first EquivalenceMember. But could it
be a different answers between first EquivalenceMember and others?
------
Regards,
Alexander Korotkov
Supabase