Re: Index AM change proposals, redux - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Index AM change proposals, redux |
Date | |
Msg-id | 200804241543.m3OFhZZ04664@momjian.us Whole thread Raw |
In response to | Re: Index AM change proposals, redux (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Index AM change proposals, redux
Re: Index AM change proposals, redux |
List | pgsql-hackers |
Simon Riggs wrote: > > > Many users would be very interested if we could significantly reduce the > > > size of the main index on their largest tables. > > > > Yes, basically GIT allows index compression for clustered tables, and > > stated that way it is clear it would be a big feature if we had it for > > 8.4. > > > > I would at least like to see clustered indexes acknowledged as a TODO > > > item, so we keep the door open for a future implementation based around > > > the basic concept of GIT. > > > > Yes, it is already a TODO: > > > > * Consider compressing indexes by storing key values duplicated in > > several rows as a single index entry > > That sounds similar, but definitely does not describe what GIT does. > > GIT holds one index pointer for a *range* of values held on one block. > e.g. if we have a table with values 1-200 in it then GIT would store > *one* index pointer for all 200 rows, even though the values are all > different. > > That property makes it very different from the TODO item stated, since > GIT can work very well on Unique indexes, which clearly do not have > duplicated keys. Agreed, TODO updated: * Consider smaller indexes that record a range of values per heap page, rather than having one index entry for every heaprow This is useful if the heap is clustered by the indexed values. http://archives.postgresql.org/pgsql-hackers/2006-12/msg00341.php http://archives.postgresql.org/pgsql-hackers/2007-02/msg01264.php http://archives.postgresql.org/pgsql-hackers/2007-03/msg00465.php http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php http://archives.postgresql.org/pgsql-hackers/2007-08/msg00014.php http://archives.postgresql.org/pgsql-hackers/2007-08/msg00487.php http://archives.postgresql.org/pgsql-hackers/2008-04/msg01589.php > I'd prefer the term "Clustered Indexes" rather than the name GIT, which > is also a slang insult. I try to avoid technical terms which might not be familiar, but instead describe exactly what is requested. > Index compression is possible in many ways, depending upon the > situation. All of the following sound similar at a high level, but each > covers a different use case. > > * For Long, Similar data e.g. Text we can use Prefix Compression > We still store one pointer per row, but we reduce the size of the index > by reducing the size of the key values. This requires us to reach inside > datatypes, so isn't a very general solution but is probably an important > one in the future for Text. > > * For Unique/nearly-Unique indexes we can use Range Compression > We reduce the size of the index by holding one index pointer per range > of values, thus removing both keys and pointers. It's more efficient > than prefix compression and isn't datatype-dependant. > > * For Highly Non-Unique Data we can use Duplicate Compression > The latter is the technique used by Bitmap Indexes. Efficient, but not > useful for unique/nearly-unique data > > * Multi-Column Leading Value Compression - if you have a multi-column > index, then leading columns are usually duplicated between rows inserted > at the same time. Using an on-block dictionary we can remove duplicates. > Only useful for multi-column indexes, possibly overlapping/contained > subset of the GIT use case. Should any of these others be TODOs? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-hackers by date: