Thread: Need theory/comprehension help on Multi-Column indexes

Need theory/comprehension help on Multi-Column indexes

From: Josh Berkus
Date: 04 January 2005, 19:18:16

Folks,

I've been poking around the indexing code, and I really don't understand the
page structure and splitting/branching for multi-column BTree indexes. I've
looked in a couple of DB textbooks to get a theoretical underpinning of the
structure of multi-column indexes, but none of the ones I've seen cover them.
Can someone help me out?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Need theory/comprehension help on Multi-Column indexes

From: "Merlin Moncure"
Date: 04 January 2005, 20:10:54

> Folks,
>
> I've been poking around the indexing code, and I really don't understand
> the page structure and splitting/branching for multi-column BTree indexes.
> I've looked in a couple of DB textbooks to get a theoretical underpinning
> of the structure of multi-column indexes, but none of the ones I've seen
> cover them. Can someone help me out?

Heh. You haven't done much programming in COBOL. The basic idea is to
combine the multiple fields into a sequence of bytes (reversible into the
original fields) and do a straight strcmp().
So you have key k on t(f1, f2, f3), with column types int, c(6), n(2),
and do an insert to t(1, 'abc', 44). The datum

     ******* **
    "00000001  abc44"

gets applied to the index.  The values below the stars are the lowest
values supported by that particular type.  The requirement is that, for a
type to be indexable, it must be possible to mutate it into a fixed-length
string.
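A minimal Python sketch of this concatenate-and-compare idea, for the key t(f1, f2, f3) above. It assumes (none of this is from the thread) that f1 is a non-negative int zero-padded to 8 digits, f2 is printable ASCII without spaces padded to 6 characters, and f3 is a non-negative int zero-padded to 2 digits; encode_key and decode_key are hypothetical names. The sketch pads the character field on the right, because left padding would sort 'b' before 'ab'.

```python
# Concatenated-key sketch: encode each column as a fixed-width,
# order-preserving string, then compare whole keys bytewise.
# Hypothetical layout: f1 -> 8 digits, f2 -> 6 chars, f3 -> 2 digits.
F1_WIDTH, F2_WIDTH, F3_WIDTH = 8, 6, 2


def encode_key(f1: int, f2: str, f3: int) -> bytes:
    """Concatenate the three columns into one fixed-length byte string."""
    if not (0 <= f1 < 10**F1_WIDTH and 0 <= f3 < 10**F3_WIDTH):
        raise ValueError("integers must be non-negative and fit their field")
    if len(f2) > F2_WIDTH or any(ord(ch) <= 0x20 or ord(ch) > 0x7E for ch in f2):
        raise ValueError("text must be printable ASCII without spaces and fit its field")
    # Zero-padding keeps numeric order equal to byte order (non-negative ints);
    # right-padding with spaces, the lowest allowed character, keeps string order.
    text = str(f1).zfill(F1_WIDTH) + f2.ljust(F2_WIDTH) + str(f3).zfill(F3_WIDTH)
    return text.encode("ascii")


def decode_key(key: bytes) -> tuple:
    """Reverse the encoding, recovering the original fields."""
    s = key.decode("ascii")
    f1 = int(s[:F1_WIDTH])
    f2 = s[F1_WIDTH:F1_WIDTH + F2_WIDTH].rstrip(" ")
    f3 = int(s[F1_WIDTH + F2_WIDTH:])
    return (f1, f2, f3)


k = encode_key(1, "abc", 44)
print(k)              # b'00000001abc   44'
print(decode_key(k))  # (1, 'abc', 44)
# Python's bytes '<' compares byte by byte (memcmp-style), and here it
# agrees with column-by-column order:
print(encode_key(1, "abc", 44) < encode_key(1, "abd", 3))  # True
```

The price of this approach is visible in the validation: every type needs an order-preserving fixed-width encoding, which is easy for these cases and awkward for things like negative numbers or arbitrary text.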

At least, that is the simple way to do it.  It is also possible to
create an index using discrete fields and each type's built-in Boolean
comparison.  This is more complicated; for example, to find out whether
t(a,b,c) > t(a1,b1,c1)
you have to check
a >= a1 and
(a > a1 or b >= b1) and
(a > a1 or b > b1 or c > c1)
or the equivalent disjunctive form:
a > a1 or
(a >= a1 and b > b1) or
(a >= a1 and b >= b1 and c > c1)

The above expression would have to be applied to generate a comparison
between an input value and a stored key value.
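The two formulas can be checked mechanically. This small Python sketch assumes each key is a 3-tuple of mutually comparable values; the function names are illustrative only. It evaluates both forms for every pair of keys drawn from {0, 1, 2} and compares them with Python's built-in tuple ordering, which is lexicographic.

```python
from itertools import product


def gt_conjunctive(x, y):
    """a >= a1 and (a > a1 or b >= b1) and (a > a1 or b > b1 or c > c1)"""
    (a, b, c), (a1, b1, c1) = x, y
    return a >= a1 and (a > a1 or b >= b1) and (a > a1 or b > b1 or c > c1)


def gt_disjunctive(x, y):
    """a > a1 or (a >= a1 and b > b1) or (a >= a1 and b >= b1 and c > c1)"""
    (a, b, c), (a1, b1, c1) = x, y
    return a > a1 or (a >= a1 and b > b1) or (a >= a1 and b >= b1 and c > c1)


# Exhaustively compare every pair of 3-column keys over {0, 1, 2}.
keys = list(product(range(3), repeat=3))
mismatches = [
    (x, y) for x in keys for y in keys
    if not (gt_conjunctive(x, y) == gt_disjunctive(x, y) == (x > y))
]
print(len(mismatches))  # 0
```

Note that the last group of the disjunctive form must be a conjunction: with `or` there, (0, 0, 1) would compare greater than (1, 0, 0).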
Merlin

Re: Need theory/comprehension help on Multi-Column indexes

From: Tom Lane
Date: 04 January 2005, 21:33:29

Josh Berkus <josh@agliodbs.com> writes:
> I've been poking around the indexing code, and I really don't understand the
> page structure and splitting/branching for multi-column BTree indexes.

It's not fundamentally different from single-column indexes.  The only
aspect of a btree index that requires any knowledge about the content of
index entries is the "compare two index entries for lesser, equal, or
greater" operation.  For that, we just compare the first columns, then
compare the second columns if the first are equal, etc.  Plain
lexicographic sort semantics.
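The "compare the first columns, then the second if the first are equal" rule can be sketched as a three-way comparison. This is a minimal Python illustration, not PostgreSQL's C code: it assumes each column has its own comparator returning negative, zero or positive (compare_keys and cmp_default are hypothetical names), and it ignores NULLs.

```python
from functools import cmp_to_key
from typing import Any, Callable, Sequence

# A per-column comparator returns negative, zero or positive, like strcmp().
Comparator = Callable[[Any, Any], int]


def cmp_default(x: Any, y: Any) -> int:
    """Three-way comparison built from the type's ordinary < and > operators."""
    return (x > y) - (x < y)


def compare_keys(left: Sequence[Any], right: Sequence[Any],
                 comparators: Sequence[Comparator]) -> int:
    """Compare multi-column keys lexicographically.

    The first column decides unless it is equal; only then is the next
    column consulted, and so on. NULLs are not handled in this sketch.
    """
    for lval, rval, cmp in zip(left, right, comparators):
        result = cmp(lval, rval)
        if result != 0:
            return result
    return 0  # all columns equal


cmps = [cmp_default, cmp_default, cmp_default]
print(compare_keys((1, "abc", 44), (1, "abd", 3), cmps))  # -1
print(compare_keys((2, "a", 0), (1, "zzz", 99), cmps))    # 1
print(compare_keys((1, "x", 5), (1, "x", 5), cmps))       # 0

# The same comparison is all that is needed to order index entries.
entries = [(2, "a", 0), (1, "b", 7), (1, "abc", 44), (1, "abc", 3)]
print(sorted(entries, key=cmp_to_key(lambda a, b: compare_keys(a, b, cmps))))
# [(1, 'abc', 3), (1, 'abc', 44), (1, 'b', 7), (2, 'a', 0)]
```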

Everything else in the btree code just considers an index entry to be an
undifferentiated tuple.
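To make the "undifferentiated tuple" point concrete, here is a toy leaf page in Python. It is a hypothetical simplification (LeafPage and PAGE_CAPACITY are invented names, and a real btree also maintains parent downlinks and chooses split points more carefully): entries are whole key tuples, and insertion and splitting rely only on the ordering of entire entries, never on individual columns.

```python
from bisect import insort
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PAGE_CAPACITY = 4  # hypothetical, tiny so that a split happens quickly


@dataclass
class LeafPage:
    entries: List[Tuple] = field(default_factory=list)  # whole keys, sorted
    right: Optional["LeafPage"] = None                  # right-sibling link

    def insert(self, key: Tuple) -> Optional["LeafPage"]:
        """Insert key; if the page overflows, split it and return the new right page."""
        insort(self.entries, key)  # Python tuples compare lexicographically
        if len(self.entries) <= PAGE_CAPACITY:
            return None
        mid = len(self.entries) // 2
        new_right = LeafPage(self.entries[mid:], self.right)
        self.entries = self.entries[:mid]
        self.right = new_right
        return new_right


page = LeafPage()
right_page = None
for key in [(1, "abc", 44), (1, "abc", 3), (2, "a", 0), (1, "b", 7), (0, "zz", 9)]:
    # Only the fifth insert overflows, so no routing to the new page is
    # needed here; a real tree descends from the root to the right leaf.
    new_page = page.insert(key)
    if new_page is not None:
        right_page = new_page

print(page.entries)        # [(0, 'zz', 9), (1, 'abc', 3)]
print(right_page.entries)  # [(1, 'abc', 44), (1, 'b', 7), (2, 'a', 0)]
```

Both halves of the split hold complete multi-column keys; nothing is stored per column.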
        regards, tom lane

Re: Need theory/comprehension help on Multi-Column indexes

From: Josh Berkus
Date: 04 January 2005, 21:41:25

Tom, Merlin,

> It's not fundamentally different from single-column indexes.  The only
> aspect of a btree index that requires any knowledge about the content of
> index entries is the "compare two index entries for lesser, equal, or
> greater" operation.  For that, we just compare the first columns, then
> compare the second columns if the first are equal, etc.  Plain
> lexicographic sort semantics.

So the different columns of the index don't have separate data pages?  It's
just a partitioned index node?

Wow, no wonder I couldn't figure it out, I was looking for something more 
complicated ...

BTW, while we're on the optimizer, what is random_page_cost supposed to
represent, exactly?  I used to think it was the ratio of index page
retrievals to direct page retrievals, but I see that that's already being
accounted for.  I'm wondering if it might be possible to calculate RPC and
eliminate it as a GUC.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco