On Sat, Oct 5, 2024 at 6:23 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
> On Fri, Sep 27, 2024 at 11:53 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > Here is an updated version of this patch that fixes the rowcount
> > estimate issue along this routine. (see set_joinpath_size.)
>
> I have worked on inventing some heuristics to limit the planning
> effort of eager aggregation. One simple yet effective approach I'm
> thinking of is to consider a grouped path as NOT useful if its row
> reduction ratio falls below a predefined minimum threshold. Currently
> I'm using 0.5 as the threshold, but I'm open to other values.
I ran the TPC-DS benchmark at scale 10 and observed eager aggregation
applied in several queries, including q4, q8, q11, q23, q31, q33, and
q77. Notably, the regression in q19 that Tender identified with v11
has disappeared in v13.
Here’s a comparison of Execution Time and Planning Time for the seven
queries with eager aggregation disabled versus enabled (best of 3).
Execution Time:
EAGER-AGG-OFF EAGER-AGG-ON
q4 105787.963 ms 34807.938 ms
q8 1407.454 ms 1654.923 ms
q11 67899.213 ms 18670.086 ms
q23 45945.849 ms 42990.652 ms
q31 10463.536 ms 10244.175 ms
q33 2186.928 ms 2217.228 ms
q77 2360.565 ms 2416.674 ms
Planning Time:
EAGER-AGG-OFF EAGER-AGG-ON
q4 2.334 ms 2.602 ms
q8 0.685 ms 0.647 ms
q11 0.935 ms 1.094 ms
q23 2.666 ms 2.582 ms
q31 1.051 ms 1.206 ms
q33 1.248 ms 1.796 ms
q77 0.967 ms 0.962 ms
There are good performance improvements in q4 and q11 (3~4 times).
For the other queries, execution times remain largely unchanged,
falling within the margin of error, with no notable regressions
observed.
For the planning time, I do not see notable regressions for any of the
seven queries.
It seems that the new cost estimates and the new heuristic are working
pretty well.
Thanks
Richard