Thread: Ranked Rather Than Ordered

Ranked Rather Than Ordered

From

Berkowitz Eric

Date:

14 May 2009, 14:25:17

When postgresql implements the following query:

Select * from <table> where <condition> order by <ordinal expression>
limit <X>

It appears to do a select, then a sort, then return the top X rows.

This works fine for small results but not for tables with tens of
millions of rows and queries that may return tens of thousands or even
hundreds of thousands of rows.

The sort is superfluous and incredibly expensive.

What should be done on this query is to do the select  saving X rows
in a save-bucket that is ranked by  the ordinal expression.

In fact, I currently find it far faster to do a simple select with no
limit or ordering and then in an external application do a single pass
over the resultset using an efficient save-bucket to keep the X
highest ranked rows.

Is there any way to get postgresql to perform this query efficiently
-- single-pass with ranking and an efficient save-bucket, no sorting,
or are there any plans for such an extension?

Thanks


Eric

Re: Ranked Rather Than Ordered

From

Heikki Linnakangas

Date:

14 May 2009, 14:42:23

(This doesn't belong on the pgadmin-hackers list, but here goes anyway..)

Berkowitz Eric wrote:
> When postgresql implements the following query:
>
> Select * from <table> where <condition> order by <ordinal expression>
> limit <X>
>
> It appears to do a select, then a sort, then return the top X rows.
>
> This works fine for small results but not for tables with tens of
> millions of rows and queries that may return tens of thousands or even
> hundreds of thousands of rows.
>
> The sort is superfluous and incredibly expensive.
>
> What should be done on this query is to do the select  saving X rows in
> a save-bucket that is ranked by  the ordinal expression.

Starting with version 8.3, the server can do just that. It's implemented
within the Sort node, but you can tell by looking at the EXPLAIN ANALYZE
output if that optimization has taken effect:

postgres=# explain analyze SELECT * FROM foo ORDER BY a LIMIT 10;
                                                  QUERY PLAN

-------------------------------------------------------------------------------------------------------------
  Limit  (cost=7.16..7.19 rows=10 width=2) (actual time=0.581..0.625
rows=10 loops=1)
    ->  Sort  (cost=7.16..7.41 rows=100 width=2) (actual
time=0.577..0.592 rows=10 loops=1)
          Sort Key: a
          Sort Method:  top-N heapsort  Memory: 17kB
          ->  Seq Scan on foo  (cost=0.00..5.00 rows=100 width=2)
(actual time=0.013..0.207 rows=103 loops=1)
  Total runtime: 0.694 ms
(6 rows)

The "top-N heapsort" is exactly what you're looking for.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com