Re: Eager aggregation, take 3 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Eager aggregation, take 3
Date
Msg-id CA+TgmobT156YuP5HUscbR_17nQs=KC703T=SB3VkQMcFbK4Gqw@mail.gmail.com
Whole thread Raw
In response to Re: Eager aggregation, take 3  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On Tue, Sep 9, 2025 at 6:30 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > I think it would be worth considering generating the partially grouped
> > relations in a second pass. Right now, as you progress from the bottom
> > of the join tree towards the top, you created grouped rels as you go.
> > But you could equally well finish planning everything up to the
> > scan/join target first and then go back and add grouped_rels to
> > relations where it seems worthwhile.
>
> Hmm, I don't think so.  I think the presence of eager aggregation
> could change the best join order.  For example, without eager
> aggregation, the optimizer might find that (A JOIN B) JOIN C the best
> join order.  But with eager aggregation on B, the optimizer could
> prefer A JOIN (AGG(B) JOIN C).  I'm not sure how we could find the
> best join order with eager aggregation applied without building the
> join tree from the bottom up.

Oh, that is a problem, yes. :-(

> > I haven't done a detailed comparison of generate_grouped_paths() to
> > other parts of the code, but I have an uncomfortable feeling that it
> > might be rather similar to some existing code that probably already
> > exists in multiple, slightly-different versions. Is there any
> > refactoring we could do here?
>
> Yeah, we currently have several functions that do similar, but not
> exactly the same, things.  Maybe some refactoring is possible -- maybe
> not -- I haven't looked into it closely yet.  However, I'd prefer to
> address that in a separate patch if possible, since this issue also
> exists on master, and I want to avoid introducing such changes in this
> already large patch.

Well, it's not just a matter of "this already exists" -- it gets
harder and harder to unify things the more near-copies you add.

> Good point.  I do have manually tested GEQO by setting geqo_threshold
> to 2 and running the regression tests to check for any planning
> errors, crashes, or incorrect results.  However, I'm not sure where
> test cases for GEQO should be added.  I searched the regression tests
> and found only one explicit GEQO test, added back in 2009 (commit
> a43b190e3).  It's not quite clear to me what the current policy is for
> adding GEQO test cases.
>
> Anyway, I will add some test cases in eager_aggregate.sql with
> geqo_threshold set to 2.

Sounds good. I think GEQO is mostly-unmaintained these days, but if
we're updating the code, I think it is good to add tests. Being that
the code is so old, it probably lacks adequate test coverage.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: stray references to SubscriptRef type
Next
From: Aleksander Alekseev
Date:
Subject: Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization