Thread: Need theory/comprehension help on Multi-Column indexes
Folks, I've been poking around the indexing code, and I really don't understand the page structure and splittng/branching for multi-column BTree indexes. I've looked in a couple DB textbooks to get a theoretically underpinning of the structure of multi-column indexes, but none of the ones I've seen cover them. Can someone help me out? -- --Josh Josh Berkus Aglio Database Solutions San Francisco
> Folks, > > I've been poking around the indexing code, and I really don't understand > the > page structure and splittng/branching for multi-column BTree indexes. > I've > looked in a couple DB textbooks to get a theoretically underpinning of the > structure of multi-column indexes, but none of the ones I've seen cover > them. > Can someone help me out? Heh. You haven't done much programming in COBOL. The basic idea is to combine the multiple fields in a sequence of bytes (reversible into the original fields) and do a straight strcmp() int c(6) n(2) So you have key k on t(f1, f2, f3) And do an insert to t(1, 'abc', 44) The datum******* ** "00000001 abc44" gets applied to the index. The values below the stars are the lowest values supported by that particular type. The requirement being that for a type to be indexible it must have able to be mutated into a fixed length string. At least, that is the simple way to do it. It is also possible to create an index using discreet fields and the type's built in Boolean comparison. This is more complicated, for example to find out if t(a,b,c) > t(a1,b1,c1) You have to check a >= a1 and (a > a1 or b >= b1) and (a > a1 or b > b1 or c > c1) or the Boolean reverse of the above: a > a1 or (a >= a1 and b > b1) or (a >= a1 or b >= b1 or c > c1) The above expression would have to be applied to generate a comparison between an input value and a stored key value. Merlin
Josh Berkus <josh@agliodbs.com> writes: > I've been poking around the indexing code, and I really don't understand the > page structure and splittng/branching for multi-column BTree indexes. It's not fundamentally different from single-column indexes. The only aspect of a btree index that requires any knowledge about the content of index entries is the "compare two index entries for lesser, equal, or greater" operation. For that, we just compare the first columns, then compare the second columns if the first are equal, etc. Plain lexicographic sort semantics. Everything else in the btree code just considers an index entry to be an undifferentiated tuple. regards, tom lane
Tom, Merlin, > It's not fundamentally different from single-column indexes. The only > aspect of a btree index that requires any knowledge about the content of > index entries is the "compare two index entries for lesser, equal, or > greater" operation. For that, we just compare the first columns, then > compare the second columns if the first are equal, etc. Plain > lexicographic sort semantics. So the different columns of the index don't have seperate data pages? It's just a partitioned index node? Wow, no wonder I couldn't figure it out, I was looking for something more complicated ... BTW, while we're on the optimizer, what is random_page_cost supposed to represent, exactly? I used to think it was the ratio of index page retreivals to direct page retrievals, but I see that that's already being calculated for. I'm wondering if it might be possible to calculate RPC and eliminate it as a GUC. -- --Josh Josh Berkus Aglio Database Solutions San Francisco