Home > mailing lists

Add enable_presorted_aggregate GUC - Mailing list pgsql-hackers

From	David Rowley
Subject	Add enable_presorted_aggregate GUC
Date	December 15, 2022 23:47:51
Msg-id	CAApHDvqzuHerD8zN1Qu=d66e3bp1=9iFn09ZiQ3Zug_Phi6yLQ@mail.gmail.com Whole thread
Responses	Re: Add enable_presorted_aggregate GUC
List	pgsql-hackers

Tree view

Over on [1] Pavel reports that PG16's performance is slower than
PG14's for his ORDER BY aggregate query. This seems to be mainly down
to how the costing works for Incremental Sort. Now, ever since
1349d279, the planner will make a plan that's ordered by the GROUP BY
clause and add on the path key requirements for any ORDER BY
aggregates. I had in mind that if someone already had tuned their
schema to have a btree index on the GROUP BY clause to allow Group
Aggregate to have presorted rows from the index that the planner would
likely just take that index and add an Incremental Sort on top of that
to obtain the rows in the order required for the aggregate functions.

The main reason for Pavel's reported regression is due to a cost
"pessimism factor" that was added to incremental sorts where we
multiply by 1.5 to reduce the chances of an Incremental Sort plan due
to uncertainties around the number of tuples per presorted group. We
assume each group will have the same number of tuples. If there's some
skew then there will be a larger N factor in the qsort complexity of
O(N * log2(N)).

Over on [1], I'm proposing that we remove that pessimism factor. If we
keep teaching the planner new tricks but cost them pessimistically
then we're not taking full advantage of said new tricks. If you
disagree with that, then best to raise it on [1].

On [1], in addition to removing the * 1.5 factor, I also propose that
we add a new GUC named "enable_presorted_aggregate", which, if turned
off will make the planner not request that the plan is also ordered by
the aggregate function's ORDER BY / DISTINCT clause. The reason I
think that this is required is that even with removing the pessimism
factor from incremental sort, it's possible the planner will choose to
perform a full sort rather than an incremental sort on top of some
existing index which provides presorted input for only the query's
GROUP BY clause. e.g.

CREATE TABLE ab (a INT, b INT);
CREATE INDEX ON ab(a);

EXPLAIN SELECT a,array_agg(b ORDER BY b) FROM ab GROUP BY a;

Previous to 1349d279, the planner, assuming there's a good amount of
rows in the table, would be very likely to use the index for the GROUP
BY then the executor would be left to do a sort on "b" within
nodeAgg.c. The equivalent of that post-1349d279 is Index Scan ->
Incremental Sort -> Group Aggregate (with presorted input). However,
the planner could choose to: Seq Scan -> Sort (a,b) -> Group Aggregate
(with presorted input). So this leaves an opportunity for the planner
to choose a worse plan.

Normally we add some enable_* GUC to leave an escape hatch when we add
some new feature like this. I likely should have done that when I
added 1349d279, but I didn't and I want to now.

I mainly just shifted this discussion out of [1] as we normally like
to debate GUC names and I feel that the discussion over on the other
thread is buried a little too deep to be visible to most people.

Can anyone think of a better name? Or does anyone see error with my ambition?

David

[1] https://www.postgresql.org/message-id/9f61ddbf-2989-1536-b31e-6459370a6baa@postgrespro.ru

Attachment

enable_presorted_aggregate.patch

pgsql-hackers by date:

From: Joseph Koshakow
Date: 15 December 2022, 23:43:29
Subject: Re: Infinite Interval

From: Tom Lane
Date: 16 December 2022, 00:12:07
Subject: Re: Exclusion constraints on partitioned tables

Add enable_presorted_aggregate GUC - Mailing list pgsql-hackers

Attachment

Previous

Next