Home > mailing lists

Re: Index AM change proposals, redux - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Index AM change proposals, redux
Date	April 24, 2008 12:43:57
Msg-id	200804241543.m3OFhZZ04664@momjian.us Whole thread Raw
In response to	Re: Index AM change proposals, redux (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Index AM change proposals, redux Re: Index AM change proposals, redux
List	pgsql-hackers

Tree view

Simon Riggs wrote:
> > > Many users would be very interested if we could significantly reduce the
> > > size of the main index on their largest tables.
> > 
> > Yes, basically GIT allows index compression for clustered tables, and
> > stated that way it is clear it would be a big feature if we had it for
> > 8.4.
> 
> > > I would at least like to see clustered indexes acknowledged as a TODO
> > > item, so we keep the door open for a future implementation based around
> > > the basic concept of GIT.
> > 
> > Yes, it is already a TODO:
> > 
> > * Consider compressing indexes by storing key values duplicated in
> >   several rows as a single index entry
> 
> That sounds similar, but definitely does not describe what GIT does.
> 
> GIT holds one index pointer for a *range* of values held on one block.
> e.g. if we have a table with values 1-200 in it then GIT would store
> *one* index pointer for all 200 rows, even though the values are all
> different.
> 
> That property makes it very different from the TODO item stated, since
> GIT can work very well on Unique indexes, which clearly do not have
> duplicated keys.

Agreed, TODO updated:
* Consider smaller indexes that record a range of values per heap page,  rather than having one index entry for every
heaprow  This is useful if the heap is clustered by the indexed values.
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00341.php
http://archives.postgresql.org/pgsql-hackers/2007-02/msg01264.php
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00465.php
http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00014.php
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00487.php
http://archives.postgresql.org/pgsql-hackers/2008-04/msg01589.php

> I'd prefer the term "Clustered Indexes" rather than the name GIT, which
> is also a slang insult.

I try to avoid technical terms which might not be familiar, but instead
describe exactly what is requested.

> Index compression is possible in many ways, depending upon the
> situation. All of the following sound similar at a high level, but each
> covers a different use case.
> 
> * For Long, Similar data e.g. Text we can use Prefix Compression
> We still store one pointer per row, but we reduce the size of the index
> by reducing the size of the key values. This requires us to reach inside
> datatypes, so isn't a very general solution but is probably an important
> one in the future for Text. 
> 
> * For Unique/nearly-Unique indexes we can use Range Compression
> We reduce the size of the index by holding one index pointer per range
> of values, thus removing both keys and pointers. It's more efficient
> than prefix compression and isn't datatype-dependant. 
> 
> * For Highly Non-Unique Data we can use Duplicate Compression
> The latter is the technique used by Bitmap Indexes. Efficient, but not
> useful for unique/nearly-unique data
> 
> * Multi-Column Leading Value Compression - if you have a multi-column
> index, then leading columns are usually duplicated between rows inserted
> at the same time. Using an on-block dictionary we can remove duplicates.
> Only useful for multi-column indexes, possibly overlapping/contained
> subset of the GIT use case.

Should any of these others be TODOs?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

pgsql-hackers by date:

From: Bruce Momjian
Date: 24 April 2008, 12:38:54
Subject: Is this TODO item done?

From: Bruce Momjian
Date: 24 April 2008, 12:50:49
Subject: Re: Proposed patch - psql wraps at window width

Re: Index AM change proposals, redux - Mailing list pgsql-hackers

Previous

Next