Thread: Table clustering idea
There is a well known command called CLUSTER which organizes table<br />in specified index's order. It has a drawback, thatnew tuples added are<br />not in this order. Last night I had idea which could be interesting, I hope. <br /><br />Theidea is to make use of 'histogram_bounds' collected statistical data.<br />Instead of inserting row into first suitablespot in a table, a table would<br />be "divided" into sections, one for each of histogram_bounds ranges. <br />Wheninserting, the database would try to find most suitable section<br />to insert (using the histogram_bounds), and ifthere were free spots<br />there, would insert there. If not, it would either look for a tuple in nearby <br />sections,or first suitable place.<br /><br />What would it do? It would try to keep table somewhat organized,<br />keepingrows of similar values close together (within SET STATISTICS<br />resolution, so a common scenario would be 50 or100 "sections"). <br />It would make it a bit hard for a table to shrink (since new rows would<br />be added throughoutthe table, not at the beginning).<br /><br />Other idea than using histogram_bounds would be using the position<br/>of key inside the index to determine the "ideal" place of row inside <br />the table and find the closest freespot there. This would be of course<br />much more precise and wouldn't rely on statistic.<br /><br /> Regards,<br/> Dawid<br />
Dawid, > Other idea than using histogram_bounds would be using the > position of key inside the index to determine the "ideal" > place of row inside the table and find the closest free spot > there. This would be of course much more precise and wouldn't > rely on statistic. This is generally known as "index organized tables" in other databases. It implements a clustered storage of the table, which dramatically speeds access on the chosen indexed column and makes inserts fast. This also eases some of the visibility/MVCC implementation issues being discussed on hackers, but does not help with the maintenance of non-storage key indices. Other DBMS have index organized tables that can use either hash or btree organizations, both of which have their uses. We are planning to implement btree organized tables sometime - anyone else interested in this idea? - Luke
Luke, > Other DBMS have index organized tables that can use either hash or btree > organizations, both of which have their uses. We are planning to > implement btree organized tables sometime - anyone else interested in > this idea? Search the archives. It's been dicussed on this list at least twice before, that I know of. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote: > Other DBMS have index organized tables that can use either hash or btree > organizations, both of which have their uses. We are planning to > implement btree organized tables sometime - anyone else interested in > this idea? I'm curious how you'll do it, as I was once told that actually trying to store heap data in a btree structure would be a non-starter (don't remember why). On a somewhat related note, I think that it would be advantageous if the FSM had a means to prefer certain pages for a given tuple over other pages. This would allow for a better way to keep heap and possibly index data more compacted, and it would also be a means of keeping tables loosely clustered. It would also make it far easier to shrink heaps that have become bloated, because the FSM could be told to favor pages at the beginning of the relation. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim, On 6/26/06 8:15 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote: > On a somewhat related note, I think that it would be advantageous if the > FSM had a means to prefer certain pages for a given tuple over other > pages. This would allow for a better way to keep heap and possibly index > data more compacted, and it would also be a means of keeping tables > loosely clustered. It would also make it far easier to shrink heaps that > have become bloated, because the FSM could be told to favor pages at the > beginning of the relation. Interesting idea - page affinity implemented using the FSM. WRT feasibility of BTREE organized tables, I'm not sure I see the problem. Teradata implemented a hashing filesystem for their heap storage and I've always wondered about how they handle collision and chaining efficiently, but it's a solved problem for sure - knowing that makes the challenge that much easier :-) - Luke
Jim C. Nasby wrote: > On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote: > >> Other DBMS have index organized tables that can use either hash or btree >> organizations, both of which have their uses. We are planning to >> implement btree organized tables sometime - anyone else interested in >> this idea? >> > > I'm curious how you'll do it, as I was once told that actually trying to > store heap data in a btree structure would be a non-starter (don't > remember why). > Ingres is now open source - they have clustering on btree/isam/hash (it's called "modify table xx to btree on col1,col2") Regards, Kim
On Mon, Jun 26, 2006 at 11:31:24PM -0700, Luke Lonergan wrote: > Jim, > > On 6/26/06 8:15 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote: > > > On a somewhat related note, I think that it would be advantageous if the > > FSM had a means to prefer certain pages for a given tuple over other > > pages. This would allow for a better way to keep heap and possibly index > > data more compacted, and it would also be a means of keeping tables > > loosely clustered. It would also make it far easier to shrink heaps that > > have become bloated, because the FSM could be told to favor pages at the > > beginning of the relation. > > Interesting idea - page affinity implemented using the FSM. > > WRT feasibility of BTREE organized tables, I'm not sure I see the problem. > Teradata implemented a hashing filesystem for their heap storage and I've > always wondered about how they handle collision and chaining efficiently, > but it's a solved problem for sure - knowing that makes the challenge that > much easier :-) I know there were discussions in the past, though as per usual I can't find them in the archives. At one point I had suggested clustering not on a row level, but on a page level, since it doesn't really matter terribly if the tuples in a page are clustered (worst case you can scan the entire page). I think one of the issues might have been: how will you handle other indexes on the table when you can no longer point them at an item (since items will need to move to maintain an IOT). -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
> I think one of the issues might have been: how will you handle other > indexes on the table when you can no longer point them at an item (since > items will need to move to maintain an IOT). I guess you shouldn't allow any other indexes. That's a perfectly acceptable compromise I think... it would be still very useful for big and narrow tables which would benefit from being clustered. The other concern is how would you do sequential scans on the table if items are allowed to move ? I think some other DBs have a facility to make a "fast index scan" which is essentially a sequential scan of the index file, something like that would be needed here too. Cheers, Csaba.
On Jun 27, 2006, at 9:39 AM, Jim C. Nasby wrote: > I think one of the issues might have been: how will you handle other > indexes on the table when you can no longer point them at an item > (since > items will need to move to maintain an IOT). There are clean ways to handle this. The table is organized on the primary key, a typical requirement for IOTs. Any indexes you add to IOT reference the primary key of the heap tuple. Since the heap and PK index are the same thing, external indexes use the PK as the tuple identifier. The only caveat is that this creates performance asymmetries. IOTs have significantly faster access through their primary keys but slower external index access since two B-Trees have to be traversed. An IOT is typically only used for tables that are only accessed through their primary key. Not supporting external indexes on IOTs is a functional implementation (and probably recommended in practice), though most real implementations allow external indexes if not always in their first version. J. Andrew Rogers jrogers@neopolitan.com
Jim, > I know there were discussions in the past, though as per usual I can't > find them in the archives. Search on "B-Tree Organized Tables". From what I can find, this feature isn't prohibitively useless. It's just a singnificant amount of effort for a result which is a tradeoff. That is, you'd *only* want to use it on tables which are *always* accessed by their primary key. What stopped the features AFAICT is that the interested parties weren't up to doing the code. -- Josh Berkus PostgreSQL @ Sun San Francisco