Re: Rewriting Free Space Map - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Rewriting Free Space Map
Date	March 17, 2008 10:52:31
Msg-id	47DE76E9.8060009@enterprisedb.com
In response to	Re: Rewriting Free Space Map (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Rewriting Free Space Map
List	pgsql-hackers

Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> I've started working on revamping Free Space Map, using the approach 
>> where we store a map of heap pages on every nth heap page. What we need 
>> now is discussion on the details of how exactly it should work.
> 
> You're cavalierly waving away a whole boatload of problems that will
> arise as soon as you start trying to make the index AMs play along
> with this :-(.  

It doesn't seem very hard. An indexam wanting to use FSM needs a little 
bit of code where the relation is extended, to let the FSM initialize 
FSM pages. And then there's the B-tree metapage issue I mentioned. But 
that's all, AFAICS.

> Hash for instance has very narrow-minded ideas about
> page allocation within its indexes.

Hash doesn't use FSM at all.

> Also, I don't think that "use the special space" will scale to handle
> other kinds of maps such as the proposed dead space map.  (This is
> exactly why I said the other day that we need a design roadmap for all
> these ideas.)

It works for anything that scales linearly with the relation itself. The 
proposed FSM and visibility map both fall into that category.
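
To make the "scales linearly" point concrete, here is a minimal sketch of the block arithmetic for a map interleaved on every nth page. The layout is an assumption for illustration only: block 0 and every SKETCH_MAP_STRIDE-th block after it is a map page, and each map page covers the heap pages that follow it up to the next map page. The stride value, the type alias, and all function names are hypothetical and not taken from the PostgreSQL source.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's block number type (assumption for this sketch). */
typedef uint32_t BlockNumber;

/* Hypothetical stride: one map page, then STRIDE - 1 heap pages. */
#define SKETCH_MAP_STRIDE 4000u

/* Is this physical block a map page under the assumed layout? */
int sketch_is_map_page(BlockNumber blk)
{
    return blk % SKETCH_MAP_STRIDE == 0;
}

/* Which map page covers the given heap page? */
BlockNumber sketch_map_page_for(BlockNumber blk)
{
    assert(!sketch_is_map_page(blk));
    return blk - blk % SKETCH_MAP_STRIDE;
}

/* Index of the heap page's entry within its map page (0-based). */
uint32_t sketch_map_slot(BlockNumber blk)
{
    assert(!sketch_is_map_page(blk));
    return blk % SKETCH_MAP_STRIDE - 1;
}

/* Map pages needed for a relation of nblocks physical blocks.
 * Grows linearly with relation size: ceil(nblocks / stride). */
BlockNumber sketch_map_page_count(BlockNumber nblocks)
{
    return nblocks / SKETCH_MAP_STRIDE + (nblocks % SKETCH_MAP_STRIDE != 0);
}
```

Under this assumed layout, the relation-extension code is the only place that has to notice when the new block number is a multiple of the stride and initialize it as a map page instead of a data page.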

A separate file is certainly more flexible. I was leaning towards that 
option originally 
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00142.php) for 
that reason.

> The idea that's becoming attractive to me while contemplating the
> multiple-maps problem is that we should adopt something similar to
> the old Mac OS idea of multiple "forks" in a relation.  In addition
> to the main data fork which contains the same info as now, there could
> be one or more map forks which are separate files in the filesystem.
> They are named by relfilenode plus an extension, for instance a relation
> with relfilenode NNN would have a data fork in file NNN (plus perhaps
> NNN.1, NNN.2, etc) and a map fork named something like NNN.map (plus
> NNN.map.1 etc as needed).  We'd have to add one more field to buffer
> lookup keys (BufferTag) to disambiguate which fork the referenced page
> is in.  Having bitten that bullet, though, the idea trivially scales to
> any number of map forks with potentially different space requirements
> and different locking and WAL-logging requirements.

Hmm. You also need to teach at least xlog.c and xlogutils.c about the 
map forks, for full page images and the invalid page tracking. I also 
wonder what the performance impact of extending BufferTag is.
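
For reference, a rough sketch of what the proposal implies: a buffer lookup key with one extra fork field, and file names built from relfilenode plus an extension as in the quoted text (NNN, NNN.1, NNN.map, NNN.map.1). The struct layouts are simplified stand-ins, not the real PostgreSQL definitions, and the fork enum, field names, and function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for PostgreSQL types (assumptions for this sketch). */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;

typedef struct {
    Oid spcNode;   /* tablespace */
    Oid dbNode;    /* database */
    Oid relNode;   /* relfilenode */
} SketchRelFileNode;

/* Hypothetical fork identifiers: main data fork plus one map fork. */
typedef enum { SKETCH_FORK_MAIN = 0, SKETCH_FORK_MAP = 1 } SketchFork;

/* Buffer lookup key with the one additional field the proposal calls for. */
typedef struct {
    SketchRelFileNode rnode;
    SketchFork        fork;
    BlockNumber       blockNum;
} SketchBufferTag;

/* Fill in a tag, zeroing it first so any padding bytes are deterministic. */
void sketch_tag_init(SketchBufferTag *tag, SketchRelFileNode rnode,
                     SketchFork fork, BlockNumber blockNum)
{
    memset(tag, 0, sizeof(*tag));
    tag->rnode = rnode;
    tag->fork = fork;
    tag->blockNum = blockNum;
}

/* Two tags match only if the fork matches as well. */
int sketch_tag_equal(const SketchBufferTag *a, const SketchBufferTag *b)
{
    return a->rnode.spcNode == b->rnode.spcNode &&
           a->rnode.dbNode == b->rnode.dbNode &&
           a->rnode.relNode == b->rnode.relNode &&
           a->fork == b->fork &&
           a->blockNum == b->blockNum;
}

/* Build the file name for a fork segment: NNN, NNN.1, NNN.map, NNN.map.1.
 * Returns the value of snprintf(). */
int sketch_fork_path(char *buf, size_t len, Oid relfilenode,
                     SketchFork fork, unsigned segno)
{
    const char *ext = (fork == SKETCH_FORK_MAP) ? ".map" : "";

    if (segno == 0)
        return snprintf(buf, len, "%u%s", (unsigned) relfilenode, ext);
    return snprintf(buf, len, "%u%s.%u", (unsigned) relfilenode, ext, segno);
}
```

The comparison shows where the extra cost would come from: every buffer lookup hashes and compares one more field, so block 0 of the main fork and block 0 of the map fork no longer collide.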

My original thought was to have a separate RelFileNode for each of the 
maps. That would require no smgr or xlog changes, and not very many 
changes in the buffer manager, though I guess you'd need more catalog 
changes. You had doubts about that on the previous thread 
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but 
the "map forks" idea certainly seems much more invasive than that.

I like the "map forks" idea; it groups the maps nicely at the filesystem 
level, and I can see it being useful for all kinds of things in the 
future. The question is, is it really worth the extra code churn? If you 
think it is, I can try that approach.

> Another possible advantage is that a new map fork could be added to an
> existing table without much trouble.  Which is certainly something we'd
> need if we ever hope to get update-in-place working.

Yep.

> The main disadvantage I can see is that for very small tables, the
> percentage overhead from multiple map forks of one page apiece is
> annoyingly high.  However, most of the point of a map disappears if
> the table is small, so we might finesse that by not creating any maps
> until the table has reached some minimum size.

Yeah, the map fork idea is actually better than the "every nth heap 
page" approach from that point of view.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
