Home > mailing lists

Re: Slow count(*) again... - Mailing list pgsql-performance

From	david@lang.hm
Subject	Re: Slow count(*) again...
Date	October 12, 2010 05:22:50
Msg-id	alpine.DEB.2.00.1010120118030.21889@asgard.lang.hm Whole thread Raw
In response to	Re: Slow count(*) again... (Craig Ringer <craig@postnewspapers.com.au>)
Responses	Re: Slow count() again... Re: Slow count() again...
List	pgsql-performance

Tree view

On Tue, 12 Oct 2010, Craig Ringer wrote:

>
>> BTW: There is a lot of talk about MVCC, but is next solution possible:
>> 1) Create a page information map that for each page in the table will
>> tell you how may rows are within and if any write (either successful or
>> not) were done to this page. This even can be two maps to make second
>> one really small (a bit per page) - so that it could be most time
>> in-memory.
>> 2) When you need to to count(*) or index check - first check if there
>> were no writes to the page. If not - you can use count information from
>> page info/index data without going to the page itself
>> 3) Let vacuum clear the bit after frozing all the tuples in the page (am
>> I using terminology correctly?).
>
> Part of this already exists. It's called the visibility map, and is present
> in 8.4 and above. It's not currently used for queries, but can potentially be
> used to aid some kinds of query.
>
> http://www.postgresql.org/docs/8.4/static/storage-vm.html
>
>> In this case all read-only (archive) data will be this bit off and
>> index/count(*) will be really fast.
>
> A count with any joins or filter criteria would still have to scan all pages
> with visible tuples in them. So the visibility map helps speed up scanning of
> bloated tables, but doesn't provide a magical "fast count" except in the
> utterly trivial "select count(*) from tablename;" case, and can probably only
> be used for accurate results when there are no read/write transactions
> currently open. Even if you kept a count of tuples in each page along with
> the mvcc transaction ID information required to determine for which
> transactions that count is valid, it'd only be useful if you didn't have to
> do any condition checks, and it'd be yet another thing to update with every
> insert/delete/update.
>
> Perhaps for some users that'd be worth having, but it seems to me like it'd
> have pretty narrow utility. I'm not sure that's the answer.

from a PR point of view, speeding up the trivil count(*) case could be
worth it, just to avoid people complaining about it not being fast.

in the case where you are doing a count(*) where query and the where is on
an indexed column, could the search just look at the index + the
visibility mapping rather than doing an sequential search through the
table?

as for your worries about the accuracy of a visibility based count in the
face of other transactions, wouldn't you run into the same issues if you
are doing a sequential scan with the same transactions in process?

David Lang

pgsql-performance by date:

From: Craig Ringer
Date: 12 October 2010, 05:15:21
Subject: Re: Slow count(*) again...

From: Vitalii Tymchyshyn
Date: 12 October 2010, 05:34:30
Subject: Re: Slow count(*) again...

Re: Slow count(*) again... - Mailing list pgsql-performance

Previous

Next