Thread: union vs. sort
I'm surprise with query plan that PostgreSQL planner prepare forselects with ORDER BY if all data are from sub-selectthat is alreadysorted. # explain select data from (select distinct data from addr) as x order by x.data; ------------------------------------------------- Subquery Scanx -> Unique -> Sort Sort Key: data -> Seq Scan onaddr This is right -- the main of query doesn't use "Sort" for ORDER BY,because subselect is sorted by "Unique". And almost same query, but in the subselect is union: # explain select data from (select datafrom addr union select data from addr2) as x orderby x.data; ----------------------------------------- Sort Sort Key: data -> SubqueryScan x -> Unique -> Sort Sort Key: data -> Append -> Subquery Scan "*SELECT* 1" -> Seq Scan on addr -> Subquery Scan "*SELECT* 2" -> Seq Scan on addr2 I think it's bad, because there is used extra sort for ORDER BY foralready by "Unique" sorted data. If I add ORDER BY to subselect: # explain select data from (select data from addr union select data from addr2 order by data) as x order by x.data; --------------------------------------------------- Sort Sort Key: data -> Subquery Scan x -> Sort Sort Key: data -> Unique -> Sort Sort Key: data -> Append -> Subquery Scan "*SELECT* 1" -> Seq Scan on addr -> Subquery Scan "*SELECT* 2" -> Seq Scan onaddr2 I see two unnecessary sorts for unique and already sorted data. The core of problem is probbaly UNION, because if I use simple query without subselect it still sort already sorderd data: # explain select data from addr union select data from addr2 order by data; ----------------------------------- Sort Sort Key: data -> Unique -> Sort Sort Key: data -> Append -> SubqueryScan "*SELECT* 1" -> Seq Scan on addr -> Subquery Scan"*SELECT* 2" -> Seq Scan on addr2 Or order of data which returns "unique" is for UNION diffrent that datafrom DISTINCT? (see first example). Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/
Karel Zak <zakkr@zf.jcu.cz> writes: > I'm surprise with query plan that PostgreSQL planner prepare for > selects with ORDER BY if all data are from sub-select that is already > sorted. This isn't simply a matter of "omitting the sort". Even if the inputs are sorted, their concatenation (Append result) isn't sorted: "1 2 3 4" and "1 3 7 9" are sorted, but "1 2 3 4 1 3 7 9" isn't. To do what you're thinking about, we'd have to build a variant implementation of Unique that merges two presorted inputs --- and it wouldn't work for more than two inputs (at least not without a lot of pain ... Append is a messy special case in many ways, and we'd have to duplicate most of that cruft to make an N-input version of Unique). This is possible, without doubt, but I'm not excited about expending that much time on it. You haven't shown any evidence that this would be an important optimization in practice. regards, tom lane
On Tue, Apr 06, 2004 at 10:33:25AM -0400, Tom Lane wrote: > Karel Zak <zakkr@zf.jcu.cz> writes: > > I'm surprise with query plan that PostgreSQL planner prepare for > > selects with ORDER BY if all data are from sub-select that is already > > sorted. > > This isn't simply a matter of "omitting the sort". Even if the inputs > are sorted, their concatenation (Append result) isn't sorted: "1 2 3 4" > and "1 3 7 9" are sorted, but "1 2 3 4 1 3 7 9" isn't. I didn't talk about "Append" result, but about "Unique" result. TheORDER BY in UNION query works with final concanateddata -- that'sright. My question is why a result from this ORDER BY is again sorted: # explain select data from (select data from addr union select data from addr2 order by data) as x order by x.data; ----------------------------------------------- (1) Sort Sort Key: data -> Subquery Scan x (2) -> Sort Sort Key: data -> Unique (3) -> Sort Sort Key: data -> Append -> Subquery Scan "*SELECT* 1" -> Seq Scan on addr -> Subquery Scan "*SELECT* 2" -> Seq Scan on addr2 I see three sorts with same data. > To do what you're thinking about, we'd have to build a variant > implementation of Unique that merges two presorted inputs --- and it > wouldn't work for more than two inputs (at least not without a lot of > pain ... Append is a messy special case in many ways, and we'd have to > duplicate most of that cruft to make an N-input version of Unique). I think it is not needful touch Append, but it should detect redundant sorts. Why "select data from (select data from addrorder by data) as x order by x.data" use only one sort? > This is possible, without doubt, but I'm not excited about expending > that much time on it. You haven't shown any evidence that this would be > an important optimization in practice. It's nothing important for me. It's from Czech databases mailing listwhere some PostgreSQL users was surprised with EXPLAINresult of UNIONand speed of these queries. Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/
Karel Zak <zakkr@zf.jcu.cz> writes: > On Tue, Apr 06, 2004 at 10:33:25AM -0400, Tom Lane wrote: >> This isn't simply a matter of "omitting the sort". > I didn't talk about "Append" result, but about "Unique" result. The > ORDER BY in UNION query works with final concanated data -- that's > right. My question is why a result from this ORDER BY is again sorted: Oh, okay, that's just something that never got done, per this old comment: /* * We set current_pathkeys NIL indicating we do not know sort * order. This is correct when the topset operation is UNION * ALL, since the appended-together results are unsorted even if * the subplans weresorted. For other set operations we could be * smarter --- room for future improvement! */ I've committed changes to do the right thing in CVS tip. regards, tom lane
On Wed, Apr 07, 2004 at 02:20:55PM -0400, Tom Lane wrote: > > I've committed changes to do the right thing in CVS tip. Thanks man! Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/