Thread: union vs. sort

union vs. sort

From

Karel Zak

Date:

06 April 2004, 09:03:55

I'm  surprise  with query  plan  that  PostgreSQL planner  prepare  forselects with ORDER  BY if all data are from
sub-selectthat is alreadysorted.       # explain select data from                         (select distinct data from
addr)                 as x order by x.data;       -------------------------------------------------        Subquery
Scanx          ->  Unique                ->  Sort                      Sort Key: data                      ->  Seq Scan
onaddr
 

This is  right -- the  main of query doesn't  use "Sort" for  ORDER BY,because subselect is sorted by "Unique".

And almost same query, but in the subselect is union:       # explain select data from                         (select
datafrom addr                             union                          select data from addr2)                  as x
orderby x.data;       -----------------------------------------        Sort          Sort Key: data          ->
SubqueryScan x                ->  Unique                      ->  Sort                            Sort Key: data
                   ->  Append                                  ->  Subquery Scan "*SELECT* 1"
             ->  Seq Scan on addr                                  ->  Subquery Scan "*SELECT* 2"
                ->  Seq Scan on addr2
 

  I think  it's bad, because  there is used extra  sort for ORDER  BY foralready by "Unique" sorted data.
If I add ORDER BY to subselect:
       # explain select data from                     (select data from addr                         union
       select data from addr2 order by data)                  as x order by x.data;
---------------------------------------------------       Sort          Sort Key: data          ->  Subquery Scan x
          ->  Sort                      Sort Key: data                      ->  Unique                             ->
Sort                                 Sort Key: data                                  ->  Append
               ->  Subquery Scan "*SELECT* 1"                                              ->  Seq Scan on addr
                               ->  Subquery Scan "*SELECT* 2"                                              ->  Seq Scan
onaddr2 
 

I see two unnecessary sorts for unique and already sorted data.
The core of problem is probbaly UNION, because if I use simple query without subselect it still sort already sorderd
data:
       # explain select data from addr                      union                  select data from addr2
  order by data;       -----------------------------------        Sort          Sort Key: data          ->  Unique
         ->  Sort                      Sort Key: data                      ->  Append                            ->
SubqueryScan "*SELECT* 1"                                  ->  Seq Scan on addr                            ->  Subquery
Scan"*SELECT* 2"                                  ->  Seq Scan on addr2
 

Or order of data which returns "unique" is for UNION diffrent that datafrom DISTINCT? (see first example).
   Karel

-- Karel Zak  <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/

Re: union vs. sort

From

Tom Lane

Date:

06 April 2004, 11:32:43

Karel Zak <zakkr@zf.jcu.cz> writes:
>  I'm  surprise  with query  plan  that  PostgreSQL planner  prepare  for
>  selects with ORDER  BY if all data are from  sub-select that is already
>  sorted.

This isn't simply a matter of "omitting the sort".  Even if the inputs
are sorted, their concatenation (Append result) isn't sorted: "1 2 3 4"
and "1 3 7 9" are sorted, but "1 2 3 4 1 3 7 9" isn't.

To do what you're thinking about, we'd have to build a variant
implementation of Unique that merges two presorted inputs --- and it
wouldn't work for more than two inputs (at least not without a lot of
pain ... Append is a messy special case in many ways, and we'd have to
duplicate most of that cruft to make an N-input version of Unique).
This is possible, without doubt, but I'm not excited about expending
that much time on it.  You haven't shown any evidence that this would be
an important optimization in practice.
        regards, tom lane

Re: union vs. sort

From

Karel Zak

Date:

07 April 2004, 04:33:21

On Tue, Apr 06, 2004 at 10:33:25AM -0400, Tom Lane wrote:
> Karel Zak <zakkr@zf.jcu.cz> writes:
> >  I'm  surprise  with query  plan  that  PostgreSQL planner  prepare  for
> >  selects with ORDER  BY if all data are from  sub-select that is already
> >  sorted.
> 
> This isn't simply a matter of "omitting the sort".  Even if the inputs
> are sorted, their concatenation (Append result) isn't sorted: "1 2 3 4"
> and "1 3 7 9" are sorted, but "1 2 3 4 1 3 7 9" isn't.
I didn't  talk about  "Append" result,  but about  "Unique" result. TheORDER BY  in UNION  query works  with final
concanateddata  -- that'sright. My question is why a result from this ORDER BY is again sorted:
 
       # explain select data from                     (select data from addr                          union
        select data from addr2 order by data)                  as x order by x.data;
-----------------------------------------------
(1)      Sort          Sort Key: data          ->  Subquery Scan x
(2)              ->  Sort                      Sort Key: data                      ->  Unique 
(3)                         ->  Sort                                  Sort Key: data
-> Append                                         ->  Subquery Scan "*SELECT* 1"
     ->  Seq Scan on addr                                         ->  Subquery Scan "*SELECT* 2"
                     ->  Seq Scan on addr2 
 
 I see three sorts with same data.

> To do what you're thinking about, we'd have to build a variant
> implementation of Unique that merges two presorted inputs --- and it
> wouldn't work for more than two inputs (at least not without a lot of
> pain ... Append is a messy special case in many ways, and we'd have to
> duplicate most of that cruft to make an N-input version of Unique).
 I think it is not needful touch Append, but it should detect redundant sorts. Why  "select data from (select data from
addrorder by data) as x order by x.data"
 
 use only one sort?

> This is possible, without doubt, but I'm not excited about expending
> that much time on it.  You haven't shown any evidence that this would be
> an important optimization in practice.
It's nothing important  for me. It's from Czech  databases mailing listwhere some PostgreSQL users was  surprised with
EXPLAINresult of UNIONand speed of these queries.
 
   Karel

-- Karel Zak  <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/

Re: union vs. sort

From

Tom Lane

Date:

07 April 2004, 15:21:00

Karel Zak <zakkr@zf.jcu.cz> writes:
> On Tue, Apr 06, 2004 at 10:33:25AM -0400, Tom Lane wrote:
>> This isn't simply a matter of "omitting the sort".

>  I didn't  talk about  "Append" result,  but about  "Unique" result. The
>  ORDER BY  in UNION  query works  with final  concanated data  -- that's
>  right. My question is why a result from this ORDER BY is again sorted:

Oh, okay, that's just something that never got done, per this old
comment:
       /*        * We set current_pathkeys NIL indicating we do not know sort        * order.  This is correct when the
topset operation is UNION        * ALL, since the appended-together results are unsorted even if        * the subplans
weresorted.  For other set operations we could be        * smarter --- room for future improvement!        */
 

I've committed changes to do the right thing in CVS tip.
        regards, tom lane

Re: union vs. sort

From

Karel Zak

Date:

08 April 2004, 04:00:54

On Wed, Apr 07, 2004 at 02:20:55PM -0400, Tom Lane wrote:
> 
> I've committed changes to do the right thing in CVS tip.
Thanks man!
   Karel

-- Karel Zak  <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/