Thread: Ranked Rather Than Ordered
When postgresql implements the following query: Select * from <table> where <condition> order by <ordinal expression> limit <X> It appears to do a select, then a sort, then return the top X rows. This works fine for small results but not for tables with tens of millions of rows and queries that may return tens of thousands or even hundreds of thousands of rows. The sort is superfluous and incredibly expensive. What should be done on this query is to do the select saving X rows in a save-bucket that is ranked by the ordinal expression. In fact, I currently find it far faster to do a simple select with no limit or ordering and then in an external application do a single pass over the resultset using an efficient save-bucket to keep the X highest ranked rows. Is there any way to get postgresql to perform this query efficiently -- single-pass with ranking and an efficient save-bucket, no sorting, or are there any plans for such an extension? Thanks Eric
(This doesn't belong on the pgadmin-hackers list, but here goes anyway..) Berkowitz Eric wrote: > When postgresql implements the following query: > > Select * from <table> where <condition> order by <ordinal expression> > limit <X> > > It appears to do a select, then a sort, then return the top X rows. > > This works fine for small results but not for tables with tens of > millions of rows and queries that may return tens of thousands or even > hundreds of thousands of rows. > > The sort is superfluous and incredibly expensive. > > What should be done on this query is to do the select saving X rows in > a save-bucket that is ranked by the ordinal expression. Starting with version 8.3, the server can do just that. It's implemented within the Sort node, but you can tell by looking at the EXPLAIN ANALYZE output if that optimization has taken effect: postgres=# explain analyze SELECT * FROM foo ORDER BY a LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Limit (cost=7.16..7.19 rows=10 width=2) (actual time=0.581..0.625 rows=10 loops=1) -> Sort (cost=7.16..7.41 rows=100 width=2) (actual time=0.577..0.592 rows=10 loops=1) Sort Key: a Sort Method: top-N heapsort Memory: 17kB -> Seq Scan on foo (cost=0.00..5.00 rows=100 width=2) (actual time=0.013..0.207 rows=103 loops=1) Total runtime: 0.694 ms (6 rows) The "top-N heapsort" is exactly what you're looking for. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com