Re: Slow count(*) again... - Mailing list pgsql-performance
From | Craig James |
---|---|
Subject | Re: Slow count(*) again... |
Date | |
Msg-id | 4CB34D69.6060108@emolecules.com |
In response to | Re: Slow count(*) again... (Scott Marlowe <scott.marlowe@gmail.com>) |
Responses | Re: Slow count(*) again...; Re: Slow count(*) again... |
List | pgsql-performance |
On 10/9/10 6:47 PM, Scott Marlowe wrote:
> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil.whelchel@gmail.com> wrote:
>> I know that there have been many discussions on the slowness of count(*) even
>> when an index is involved, because the visibility of the rows has to be
>> checked. In the past I have seen many suggestions about using triggers and
>> tables to keep track of counts, and while this works fine in a situation where
>> you know what the report is going to be ahead of time, it is simply not an
>> option when an unknown WHERE clause is to be used (dynamically generated).
>> I ran into a fine example of this when I was searching this mailing list:
>> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
>> PostgreSQL 8.3." Obviously at some point count(*) came into play here, because
>> the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a
>> list of pages from search results, and the biggest time killer here is the
>> count(*) portion. Even worse, I sometimes have to hit the database with two
>> SELECT statements: one with OFFSET and LIMIT to get the page of results I
>> need, and another to get the total number of rows so I can estimate how many
>> pages of results are available. The point I am driving at here is that since
>> building a list of pages of results is such a common thing to do, there need
>> to be some specific high-speed ways to do this in one query. Maybe an
>> estimate(*) that works like count but gives an answer from the index without
>> checking visibility? I am sure that this would be good enough to make a page
>> list; it is really no big deal if it errs on the positive side, maybe the
>> list of pages has an extra page off the end. I can live with that. What I
>> can't live with is taking 13 seconds to get a page of results from 850,000
>> rows in a table.
>
> 99% of the time, in these situations, you don't need an exact measure, and
> assuming ANALYZE has run recently, selecting reltuples from pg_class for
> a given table is more than close enough. I'm sure wrapping that in a
> simple estimated_rows() function would be easy enough to do.

First of all, it's not true. There are plenty of applications that need an exact answer. Second, even if it is only 1%, that means it's 1% of the queries, not 1% of the people. Sooner or later a large fraction of developers will run into this.

It's probably been the most-asked question I've seen on this forum in the four years I've been here. It's a real problem, and it needs a real solution. I know it's a hard problem to solve, but can we stop hinting that those of us who have this problem are somehow being dense?

Thanks,
Craig
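For reference, the estimated_rows() wrapper Scott suggests could look something like the minimal sketch below. The function name is his; the signature and body here are one possible reading, and 'my_table' is a placeholder. Only pg_class.reltuples comes from the thread itself.

```sql
-- Minimal sketch of an estimated_rows() helper: return the planner's
-- row estimate for a table, which is accurate if ANALYZE ran recently.
-- Signature and body are an assumption, not from the original thread.
CREATE OR REPLACE FUNCTION estimated_rows(regclass)
RETURNS bigint AS $$
    SELECT reltuples::bigint FROM pg_class WHERE oid = $1;
$$ LANGUAGE sql STABLE;

-- Usage (hypothetical table name): fast, approximate count for a page list.
SELECT estimated_rows('my_table');
```

Note that this only estimates the row count of the whole table; it cannot help with an arbitrary WHERE clause, which is exactly the case Neil raises.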