bitmap AM design - Mailing list pgsql-hackers
From | Victor Y. Yegorov |
---|---|
Subject | bitmap AM design |
Date | |
Msg-id | 20050228194948.GB5772@mits.lv Whole thread Raw |
Responses |
Re: bitmap AM design
Re: bitmap AM design |
List | pgsql-hackers |
Here's the design of bitmap AM I'm planning to implement. I've discussed it with Neil, but I'm willing to get more feedback on it. There are going to be 1 metapage, 1 list of CTIDs (LOC), one list of attribute values (LOV, including attributes for multi-column indexes) and a bitmap for each entry in the LOV. Metapage will contain pointers to the LOC, LOV will always start at page-1 and is organized as a 1-way chained list. Neil suggested a very good way how to handle updates. Of course, it's not necessary to strictly follow tuple layout in the table's heap, as I wanted to do initially. All that's needed, is that bit positions in bitmaps would be tied with CTID positions in LOC. So, we can do simple insert (generally, that's how MVCC works for tuple updates): when some tuple is updated, new CTID is added to the LOC and each bitmap is extended by 1 bit. On the other side (as Neil pointed), after VACUUM, we need to do some maintenance of bitmap index to avoid situations when index contains duplicate entries (in LOC and bitmaps) for the same CTID (value before marking tuple for reuse and after actually reusing it). So far, the fastest way would be recreating index, because the whole index designed for faster search/insert operations and lacks effective reverse mapping (CTID->bit position->bit value) functionality. List of CTIDs is organized into an array of extents. Each extent has 2**i pages ('i' is extent number in array). All pages for extent are allocated at once, ensuring their IDs are sequential. So, we need to store only BufferNumber of the first page in extent. LOC pages contain array of ItemPointerData, it's size is detected at compile time. So, CTID for given bit position can be obtained via only one ReadBuffer() call. LOV pages store arrays of value-descriptors, each descriptor has a pointer to the head of value's bitmap. Bitmaps are organized as 1-way chained list, bitmap contents is compressed using Word-Aligned Hybryd method (similar to RLE). Neil suggested the other way of organizing bitmap storage: all bits for given position are stored in one page (if it lacks space, new page is added "at the bootom"), so we'll have a 2-dimensional bitmap storage. This reduces amount of page reads during index scan, but for low-cardinality indexes, we'll waste a lot of space per page, if each CTIDs slice is stored in one page. On the other hand, it'll be hard to extend bitmap if we'll store several CTID slices per page and some new value will be inserted (i.e. CTID slice needs to be increased). At the moment, I would implement the easiest method -- equality encoding (as it's called in papers, bitmap per value). Neil's suggested techniques are called range encoding. Josh Berkus on the irc suggested implementing several encoding schemes as an option to the "create index" sql command. What do you think? I haven't looked at planner/executor yet. If you have some questions, please, ask. Also, I'd like to tell that this is my first time coding for PostgreSQL, thus I may use incorrect terminology. Also, it takes much time to write anything, for the same reason. And yes, I would really appreciate all kind of criticism on this design. I've started to implement AM, I need to register am* functions, what OIDs should use to register them in include/catalog/pg_proc.h? Waiting for your feedback. -- Victor Y. Yegorov
pgsql-hackers by date: