Re: Slow count(*) again... - Mailing list pgsql-performance
From | david@lang.hm |
---|---|
Subject | Re: Slow count(*) again... |
Date | |
Msg-id | alpine.DEB.2.00.1010120118030.21889@asgard.lang.hm Whole thread Raw |
In response to | Re: Slow count(*) again... (Craig Ringer <craig@postnewspapers.com.au>) |
Responses |
Re: Slow count(*) again...
Re: Slow count(*) again... |
List | pgsql-performance |
On Tue, 12 Oct 2010, Craig Ringer wrote: > >> BTW: There is a lot of talk about MVCC, but is next solution possible: >> 1) Create a page information map that for each page in the table will >> tell you how may rows are within and if any write (either successful or >> not) were done to this page. This even can be two maps to make second >> one really small (a bit per page) - so that it could be most time >> in-memory. >> 2) When you need to to count(*) or index check - first check if there >> were no writes to the page. If not - you can use count information from >> page info/index data without going to the page itself >> 3) Let vacuum clear the bit after frozing all the tuples in the page (am >> I using terminology correctly?). > > Part of this already exists. It's called the visibility map, and is present > in 8.4 and above. It's not currently used for queries, but can potentially be > used to aid some kinds of query. > > http://www.postgresql.org/docs/8.4/static/storage-vm.html > >> In this case all read-only (archive) data will be this bit off and >> index/count(*) will be really fast. > > A count with any joins or filter criteria would still have to scan all pages > with visible tuples in them. So the visibility map helps speed up scanning of > bloated tables, but doesn't provide a magical "fast count" except in the > utterly trivial "select count(*) from tablename;" case, and can probably only > be used for accurate results when there are no read/write transactions > currently open. Even if you kept a count of tuples in each page along with > the mvcc transaction ID information required to determine for which > transactions that count is valid, it'd only be useful if you didn't have to > do any condition checks, and it'd be yet another thing to update with every > insert/delete/update. > > Perhaps for some users that'd be worth having, but it seems to me like it'd > have pretty narrow utility. I'm not sure that's the answer. from a PR point of view, speeding up the trivil count(*) case could be worth it, just to avoid people complaining about it not being fast. in the case where you are doing a count(*) where query and the where is on an indexed column, could the search just look at the index + the visibility mapping rather than doing an sequential search through the table? as for your worries about the accuracy of a visibility based count in the face of other transactions, wouldn't you run into the same issues if you are doing a sequential scan with the same transactions in process? David Lang
pgsql-performance by date: