Add enable_presorted_aggregate GUC - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Add enable_presorted_aggregate GUC |
Date | |
Msg-id | CAApHDvqzuHerD8zN1Qu=d66e3bp1=9iFn09ZiQ3Zug_Phi6yLQ@mail.gmail.com Whole thread Raw |
Responses |
Re: Add enable_presorted_aggregate GUC
|
List | pgsql-hackers |
Over on [1] Pavel reports that PG16's performance is slower than PG14's for his ORDER BY aggregate query. This seems to be mainly down to how the costing works for Incremental Sort. Now, ever since 1349d279, the planner will make a plan that's ordered by the GROUP BY clause and add on the path key requirements for any ORDER BY aggregates. I had in mind that if someone already had tuned their schema to have a btree index on the GROUP BY clause to allow Group Aggregate to have presorted rows from the index that the planner would likely just take that index and add an Incremental Sort on top of that to obtain the rows in the order required for the aggregate functions. The main reason for Pavel's reported regression is due to a cost "pessimism factor" that was added to incremental sorts where we multiply by 1.5 to reduce the chances of an Incremental Sort plan due to uncertainties around the number of tuples per presorted group. We assume each group will have the same number of tuples. If there's some skew then there will be a larger N factor in the qsort complexity of O(N * log2(N)). Over on [1], I'm proposing that we remove that pessimism factor. If we keep teaching the planner new tricks but cost them pessimistically then we're not taking full advantage of said new tricks. If you disagree with that, then best to raise it on [1]. On [1], in addition to removing the * 1.5 factor, I also propose that we add a new GUC named "enable_presorted_aggregate", which, if turned off will make the planner not request that the plan is also ordered by the aggregate function's ORDER BY / DISTINCT clause. The reason I think that this is required is that even with removing the pessimism factor from incremental sort, it's possible the planner will choose to perform a full sort rather than an incremental sort on top of some existing index which provides presorted input for only the query's GROUP BY clause. e.g. CREATE TABLE ab (a INT, b INT); CREATE INDEX ON ab(a); EXPLAIN SELECT a,array_agg(b ORDER BY b) FROM ab GROUP BY a; Previous to 1349d279, the planner, assuming there's a good amount of rows in the table, would be very likely to use the index for the GROUP BY then the executor would be left to do a sort on "b" within nodeAgg.c. The equivalent of that post-1349d279 is Index Scan -> Incremental Sort -> Group Aggregate (with presorted input). However, the planner could choose to: Seq Scan -> Sort (a,b) -> Group Aggregate (with presorted input). So this leaves an opportunity for the planner to choose a worse plan. Normally we add some enable_* GUC to leave an escape hatch when we add some new feature like this. I likely should have done that when I added 1349d279, but I didn't and I want to now. I mainly just shifted this discussion out of [1] as we normally like to debate GUC names and I feel that the discussion over on the other thread is buried a little too deep to be visible to most people. Can anyone think of a better name? Or does anyone see error with my ambition? David [1] https://www.postgresql.org/message-id/9f61ddbf-2989-1536-b31e-6459370a6baa@postgrespro.ru
Attachment
pgsql-hackers by date: