Re: Bitmap index thoughts - Mailing list pgsql-hackers
From: Gavin Sherry
Subject: Re: Bitmap index thoughts
Msg-id: Pine.LNX.4.58.0612271300520.21742@linuxworld.com.au
In response to: Bitmap index thoughts (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses: Re: Bitmap index thoughts; Re: Bitmap index thoughts
List: pgsql-hackers
Hey Heikki,

On Tue, 26 Dec 2006, Heikki Linnakangas wrote:

> I've been skimming through the bitmap index patch...
>
> A scan needs to access at least five pages:
>
> 1. B-tree index (root+others, depending on depth)
> 2. The auxiliary heap page
> 3. bitmap index meta page
> 4. LOV page
> 5. bitmap page
>
> That seems like a lot of indirection. A high startup cost is probably ok

You're right, this is excessive and it was something I'd hoped to have ripped out, but...

> for typical bitmap index use cases and most of the needed pages should
> stay in memory, but could we simplify this? Why do we need the auxiliary
> heap, couldn't we just store the blk+offset of the LOV item directly in
> the b-tree index item?

The problem is, the b-tree code is very much tied to the heap. I don't want to modify the b-tree code to make bitmap indexes work (better). What's really tempting is to just manage our own balanced tree within the bitmap index file(s) itself. It would start from the metapage and simply spill to other 'special' index pages when necessary. The problem is, we do not have b-tree code generic enough to let us do this trivially -- consider concurrency and WAL in particular, which we currently get for free. I guess this is why I've been ignoring this issue :-).

> And instead of having separate LOV pages that store a number of LOV
> items, how about storing each LOV item on a page of its own, and using
> the rest of the page to store the last chunk of the bitmap. That would
> eliminate one page access, but more importantly, maybe we could then get
> rid of all the bm_last_* attributes in BMLOVItemData that complicate the
> patch quite a bit, while preserving the performance.

That's an interesting approach. We would still need a concept of last_word, at the very least, and probably last_comp_word for convenience. Your idea doesn't require any extra space, either, which is good.

Something I've been working through is the idea of a 'bitmap data segment'.
Currently, we store the HRL-compressed bitmap data to the extent of the page. That is, sizeof(BMBitmap) is as close to BLCKSZ as we can make it. The problem is that we may have some bitmaps where a few values occur only a small number of times and are well clustered at the beginning of the heap. In that circumstance, we use a whole page to store a small number of words, and the free space cannot be used by any other vector. Now, if a segment were some fraction of BLCKSZ, we would use less space for those vectors with few tuples in the heap. Just an idea...

Thanks,

Gavin

PS: Another version of the patch shall be forthcoming shortly. I've been working on compressing the data in memory during CREATE INDEX instead of just managing arrays of TIDs in memory as we did previously. The array of TIDs works great for well-clustered data, but it stinks for poorly clustered data as we approach maintenance_work_mem and have to swap a lot.