Home > mailing lists

Re: CLUSTER and indisclustered - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: CLUSTER and indisclustered
Date	August 7, 2002 00:42:05
Msg-id	12593.1028695307@sss.pgh.pa.us Whole thread Raw
In response to	Re: CLUSTER and indisclustered (Curt Sampson <cjs@cynic.net>)
Responses	Re: CLUSTER and indisclustered Re: CLUSTER and indisclustered
List	pgsql-hackers

Tree view

Curt Sampson <cjs@cynic.net> writes:
> But after doing some benchmarking of various sorts of random reads
> and writes, it occurred to me that there might be optimizations
> that could help a lot with this sort of thing. What if, when we've
> got an index block with a bunch of entries, instead of doing the
> reads in the order of the entries, we do them in the order of the
> blocks the entries point to?

I thought to myself "didn't I just post something about that?"
and then realized it was on a different mailing list.  Here ya go
(and no, this is not the first time around on this list either...)

I am currently thinking that bitmap indexes per se are not all that
interesting.  What does interest me is bitmapped index lookup, which
came back into mind after hearing Ann Harrison describe how FireBird/
InterBase does it.

The idea is that you don't scan the index and base table concurrently
as we presently do it.  Instead, you scan the index and make a list
of the TIDs of the table tuples you need to visit.  This list can
be conveniently represented as a sparse bitmap.  After you've finished
looking at the index, you visit all the required table tuples *in
physical order* using the bitmap.  This eliminates multiple fetches
of the same heap page, and can possibly let you get some win from
sequential access.

Once you have built this mechanism, you can then move on to using
multiple indexes in interesting ways: you can do several indexscans
in one query and then AND or OR their bitmaps before doing the heap
scan.  This would allow, for example, "WHERE a = foo and b = bar"
to be handled by ANDing results from separate indexes on the a and b
columns, rather than having to choose only one index to use as we do
now.

Some thoughts about implementation: FireBird's implementation seems
to depend on an assumption about a fixed number of tuple pointers
per page.  We don't have that, but we could probably get away with
just allocating BLCKSZ/sizeof(HeapTupleHeaderData) bits per page.
Also, the main downside of this approach is that the bitmap could
get large --- but you could have some logic that causes you to fall
back to plain sequential scan if you get too many index hits.  (It's
interesting to think of this as lossy compression of the bitmap...
which leads to the idea of only being fuzzy in limited areas of the
bitmap, rather than losing all the information you have.)

A possibly nasty issue is that lazy VACUUM has some assumptions in it
about indexscans holding pins on index pages --- that's what prevents
it from removing heap tuples that a concurrent indexscan is just about
to visit.  It might be that there is no problem: even if lazy VACUUM
removes a heap tuple and someone else then installs a new tuple in that
same TID slot, you should be okay because the new tuple is too new to
pass your visibility test.  But I'm not convinced this is safe.
        regards, tom lane

pgsql-hackers by date:

From: "Marc G. Fournier"
Date: 07 August 2002, 00:27:09
Subject: Re: Open 7.3 items

From: Tom Lane
Date: 07 August 2002, 00:43:51
Subject: Re: Open 7.3 items

Re: CLUSTER and indisclustered - Mailing list pgsql-hackers

Previous

Next