Re: Minmax indexes - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Minmax indexes |
Date | |
Msg-id | 20140615023404.GY18688@eldon.alvh.no-ip.org |
In response to | Re: Minmax indexes (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Minmax indexes, Re: Minmax indexes |
List | pgsql-hackers |
Robert Haas wrote:
> On Wed, Sep 25, 2013 at 4:34 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Here's an updated version of this patch, with fixes to all the bugs
> > reported so far.  Thanks to Thom Brown, Jaime Casanova, Erik Rijkers and
> > Amit Kapila for the reports.
>
> I'm not very happy with the use of a separate relation fork for
> storing this data.

Here's a new version of this patch.  Now the revmap is not stored in a
separate fork, but together with all the regular data, as explained
elsewhere in the thread.

I added a few pageinspect functions that let one explore the data in the
index.  With these you can start by reading the metapage, and from there
obtain the block numbers for the revmap array pages; then explore the
revmap array pages to find the regular revmap pages, which contain the
TIDs of the index entries.  None of these pageinspect functions is
documented yet, but using them is as easy as

    with idxname as (select 'ti'::text as idxname)
    select *
      from idxname,
           generate_series(0, pg_relation_size(idxname) / 8192 - 1) i,
           minmax_page_type(get_raw_page(idxname, i::int));

    select *        -- data in metapage
      from minmax_metapage_info(get_raw_page('ti', 0));

    select *        -- data in revmap array pages
      from minmax_revmap_array_data(get_raw_page('ti', 6));

    select logblk, unnest(pages)        -- data in regular revmap pages
      from minmax_revmap_data(get_raw_page('ti', 15));

    select *        -- data in regular index pages
      from minmax_page_items(get_raw_page('ti', 2), 'ti'::regclass);

Note that in this last case you need to give it the OID of the index as
the second parameter, so that it can construct a tupledesc for decoding
the min/max data.

I have followed the suggestion by Amit to overwrite the index tuple when
a new heap tuple is inserted, instead of creating a separate index
tuple.  This saves a lot of index bloat.  This required a new entry
point in bufpage.c, PageOverwriteItemData().
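
As a rough illustration of the overwrite approach described above, here
is a toy model in Python. It is not code from the patch: the class and
field names (ToyMinmaxIndex, RangeSummary, summaries) are hypothetical,
a plain dict stands in for the revmap, and values are assumed to be
integers. It only shows that a heap insert widens the existing summary
for its page range in place instead of adding a second entry.

```python
from dataclasses import dataclass

PAGES_PER_RANGE = 128  # default range size stated in this mail


@dataclass
class RangeSummary:
    """Hypothetical stand-in for one minmax index tuple."""
    minval: int
    maxval: int


class ToyMinmaxIndex:
    """Toy model: one summary per page range, updated in place."""

    def __init__(self, pages_per_range=PAGES_PER_RANGE):
        self.pages_per_range = pages_per_range
        # Stand-in for the revmap: page range number -> summary.
        self.summaries = {}

    def insert(self, heap_block, value):
        # Every heap block belongs to exactly one page range.
        range_no = heap_block // self.pages_per_range
        summary = self.summaries.get(range_no)
        if summary is None:
            # First value seen in this range: create its summary.
            self.summaries[range_no] = RangeSummary(value, value)
        else:
            # Overwrite the existing summary rather than adding another.
            summary.minval = min(summary.minval, value)
            summary.maxval = max(summary.maxval, value)


# Assumed sample data: (heap block, inserted value) pairs.
idx = ToyMinmaxIndex(pages_per_range=2)
for block, value in [(0, 10), (1, 3), (1, 42), (2, 7)]:
    idx.insert(block, value)
```

With two pages per range, blocks 0 and 1 share one summary and block 2
starts another, so four inserts leave only two entries in the model.
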
bufpage.c also has a new function PageIndexDeleteNoCompact, which is
similar in spirit to PageIndexMultiDelete except that item pointers do
not change.  This is necessary because the revmap stores item pointers,
and such references would break if we were to renumber items in index
pages.

I have also added a reloption for the size of each page range, so you
can do

    create index ti on t using minmax (a) with (pages_per_range = 2);

The default is 128 pages per range, and I have an arbitrary maximum of
131072 (the number of pages in a 1GB segment at the default block
size).  There doesn't seem to be much point in having larger page
ranges; intuitively I think page ranges should be more or less the size
of kernel readahead, but I haven't tested this.

I didn't want to rebase past 0ef0b6784 in a hurry.  I only know this
applies cleanly on top of fe7337f2dc, so please use that if you want to
play with it.  I will post a rebased version shortly.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services