Re: CLUSTER and indisclustered - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: CLUSTER and indisclustered |
Date | |
Msg-id | 12593.1028695307@sss.pgh.pa.us Whole thread Raw |
In response to | Re: CLUSTER and indisclustered (Curt Sampson <cjs@cynic.net>) |
Responses |
Re: CLUSTER and indisclustered
Re: CLUSTER and indisclustered |
List | pgsql-hackers |
Curt Sampson <cjs@cynic.net> writes: > But after doing some benchmarking of various sorts of random reads > and writes, it occurred to me that there might be optimizations > that could help a lot with this sort of thing. What if, when we've > got an index block with a bunch of entries, instead of doing the > reads in the order of the entries, we do them in the order of the > blocks the entries point to? I thought to myself "didn't I just post something about that?" and then realized it was on a different mailing list. Here ya go (and no, this is not the first time around on this list either...) I am currently thinking that bitmap indexes per se are not all that interesting. What does interest me is bitmapped index lookup, which came back into mind after hearing Ann Harrison describe how FireBird/ InterBase does it. The idea is that you don't scan the index and base table concurrently as we presently do it. Instead, you scan the index and make a list of the TIDs of the table tuples you need to visit. This list can be conveniently represented as a sparse bitmap. After you've finished looking at the index, you visit all the required table tuples *in physical order* using the bitmap. This eliminates multiple fetches of the same heap page, and can possibly let you get some win from sequential access. Once you have built this mechanism, you can then move on to using multiple indexes in interesting ways: you can do several indexscans in one query and then AND or OR their bitmaps before doing the heap scan. This would allow, for example, "WHERE a = foo and b = bar" to be handled by ANDing results from separate indexes on the a and b columns, rather than having to choose only one index to use as we do now. Some thoughts about implementation: FireBird's implementation seems to depend on an assumption about a fixed number of tuple pointers per page. We don't have that, but we could probably get away with just allocating BLCKSZ/sizeof(HeapTupleHeaderData) bits per page. Also, the main downside of this approach is that the bitmap could get large --- but you could have some logic that causes you to fall back to plain sequential scan if you get too many index hits. (It's interesting to think of this as lossy compression of the bitmap... which leads to the idea of only being fuzzy in limited areas of the bitmap, rather than losing all the information you have.) A possibly nasty issue is that lazy VACUUM has some assumptions in it about indexscans holding pins on index pages --- that's what prevents it from removing heap tuples that a concurrent indexscan is just about to visit. It might be that there is no problem: even if lazy VACUUM removes a heap tuple and someone else then installs a new tuple in that same TID slot, you should be okay because the new tuple is too new to pass your visibility test. But I'm not convinced this is safe. regards, tom lane
pgsql-hackers by date: