Re: CLUSTER and indisclustered - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: CLUSTER and indisclustered |
Date | |
Msg-id | 1028733234.13418.113.camel@taru.tm.ee Whole thread Raw |
In response to | Re: CLUSTER and indisclustered (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: CLUSTER and indisclustered
|
List | pgsql-hackers |
On Wed, 2002-08-07 at 15:26, Tom Lane wrote: > Hannu Krosing <hannu@tm.ee> writes: > > Now I remembered my original preference for page bitmaps (vs. tuple > > bitmaps): one can't actually make good use of a bitmap of tuples because > > there is no fixed tuples/page ratio and thus no way to quickly go from > > bit position to actual tuple. You mention the same problem but propose a > > different solution. > > > Using page bitmap, we will at least avoid fetching any unneeded pages - > > essentially we will have a sequential scan over possibly interesting > > pages. > > Right. One form of the "lossy compression" idea I suggested is to > switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets > too large to work with. If it is a real bitmap, should it not be easyeast to allocate at the start ? a page bitmap for a 100 000 000 tuple table with 10 tuples/page will be sized 10000000/8 = 1.25 MB, which does not look too big for me for that amount of data (the data table itself would occupy 80 GB). Even having the bitmap of 16 bits/page (with the bits 0-14 meaning tuples 0-14 and bit 15 meaning "seq scan the rest of page") would consume just 20 MB of _local_ memory, and would be quite justifyiable for a query on a table that large. For a real bitmap index the tuples-per-page should be a user-supplied tuning parameter. > Again, one could imagine doing that only in denser areas of the bitmap. I would hardly call the resulting structure "a bitmap" ;) And I'm not sure the overhead for a more complex structure would win us any additional performance for most cases. > > But I guess that CLUSTER support for INSERT will not be touched for 7.3 > > as will real bitmap indexes ;) > > All of this is far-future work I think. After we do that we will probably be able claim support for "datawarehousing" ;) > Adding a new scan type to the > executor would probably be pretty localized, but the ramifications in > the planner could be extensive --- especially if you want to do plans > involving ANDed or ORed bitmaps. Also going to "smart inserter" which can do local clustering on sets of real bitmap indexes for INSERTS (and INSERT side of UPDATE) would probably be a major change from our current "stupid inserter" ;) This will not be needed for bitmap resolution higher than 1bit/page but default local clustering on bitmap indexes will probably buy us some extra performance. by avoiding data page fetches when such indexes are used. AN anyway the support for INSERT being aware of clustering will probably come up sometime. ------------ Hannu
pgsql-hackers by date: