Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | CAApHDvowcPrNY3kgAPhbFL0nZ1fCvUBcp3y1AjLm0TUP5N3cxw@mail.gmail.com Whole thread Raw |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (Ankit Kumar Pandey <itsankitkp@gmail.com>) |
Responses |
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
|
List | pgsql-hackers |
On Wed, 11 Jan 2023 at 08:17, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote: > > > > On 10/01/23 10:53, David Rowley wrote: > > > the total cost is the same for both of these, but the execution time > > seems to vary quite a bit. > > This is really weird, I tried it different ways (to rule out any issues > due to > > caching) and execution time varied in spite of having same cost. > > > Maybe we should try and do this for DISTINCT queries if the > > distinct_pathkeys match the orderby_pathkeys. That seems a little less > > copout-ish. If the ORDER BY is the same as the DISTINCT then it seems > > likely that the ORDER BY might opt to use the Unique path for DISTINCT > > since it'll already have the correct pathkeys. > > > However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and > > then sort all over again, especially so when the DISTINCT removes a > > large proportion of the rows. > > Isn't order by pathkeys are always fewer than distinct pathkeys? Just thinking about this again, I remembered why I thought DISTINCT was uninteresting to start with. The problem is that if the query has WindowFuncs and also has a DISTINCT clause, then the WindowFunc results *must* be in the DISTINCT clause and, optionally also in the ORDER BY clause. There's no other place to write WindowFuncs IIRC. Since we cannot pushdown the sort when the more strict version of the pathkeys have WindowFuncs, then we must still perform the additional sort if the planner chooses to do a non-hashed DISTINCT. The aim of this patch is to reduce the total number of sorts, and I don't think that's possible in this case as you can't have WindowFuncs in the ORDER BY when they're not in the DISTINCT clause: postgres=# select distinct relname from pg_Class order by row_number() over (order by oid); ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list LINE 1: select distinct relname from pg_Class order by row_number() ... Another type of query which is suboptimal still is when there's a DISTINCT and WindowClause but no ORDER BY. We'll reorder the DISTINCT clause so that the leading columns of the ORDER BY come first in transformDistinctClause(), but we've nothing to do the same for WindowClauses. It can't happen around when transformDistinctClause() is called as we've yet to decide the WindowClause evaluation order, so if we were to try to make that better it would maybe have to do in the upper planner somewhere. It's possible it's too late by that time to adjust the DISTINCT clause. Here's an example of it. # set enable_hashagg=0; # explain select distinct relname,relkind,count(*) over (partition by relkind) from pg_Class; QUERY PLAN ------------------------------------------------------------------------------------ Unique (cost=61.12..65.24 rows=412 width=73) -> Sort (cost=61.12..62.15 rows=412 width=73) Sort Key: relname, relkind, (count(*) OVER (?)) -> WindowAgg (cost=36.01..43.22 rows=412 width=73) -> Sort (cost=36.01..37.04 rows=412 width=65) Sort Key: relkind -> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=65) (7 rows) We can simulate the optimisation by swapping the column order in the targetlist. Note the planner can use Incremental Sort (at least since 3c6fc5820, from about 2 hours ago) # explain select distinct relkind,relname,count(*) over (partition by relkind) from pg_Class; QUERY PLAN ------------------------------------------------------------------------------------ Unique (cost=41.26..65.32 rows=412 width=73) -> Incremental Sort (cost=41.26..62.23 rows=412 width=73) Sort Key: relkind, relname, (count(*) OVER (?)) Presorted Key: relkind -> WindowAgg (cost=36.01..43.22 rows=412 width=73) -> Sort (cost=36.01..37.04 rows=412 width=65) Sort Key: relkind -> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=65) (8 rows) Not sure if we should be trying to improve that in this patch. I just wanted to identify it as something else that perhaps could be done. I'm not really all that sure the above query shape makes much sense in the real world. Would anyone ever want to use DISTINCT on some results containing WindowFuncs? David
pgsql-hackers by date: