Re: Rewriting Free Space Map - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Rewriting Free Space Map |
Date | |
Msg-id | 47DE76E9.8060009@enterprisedb.com |
In response to | Re: Rewriting Free Space Map (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Rewriting Free Space Map |
List | pgsql-hackers |
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> I've started working on revamping Free Space Map, using the approach
>> where we store a map of heap pages on every nth heap page. What we need
>> now is discussion on the details of how exactly it should work.
>
> You're cavalierly waving away a whole boatload of problems that will
> arise as soon as you start trying to make the index AMs play along
> with this :-(.

It doesn't seem very hard. An indexam wanting to use FSM needs a little bit of code where the relation is extended, to let the FSM initialize FSM pages. And then there's the B-tree metapage issue I mentioned. But that's all, AFAICS.

> Hash for instance has very narrow-minded ideas about
> page allocation within its indexes.

Hash doesn't use FSM at all.

> Also, I don't think that "use the special space" will scale to handle
> other kinds of maps such as the proposed dead space map. (This is
> exactly why I said the other day that we need a design roadmap for all
> these ideas.)

It works for anything that scales linearly with the relation itself. The proposed FSM and visibility map both fall into that category.

A separate file is certainly more flexible. I was leaning towards that option originally (http://archives.postgresql.org/pgsql-hackers/2007-11/msg00142.php) for that reason.

> The idea that's becoming attractive to me while contemplating the
> multiple-maps problem is that we should adopt something similar to
> the old Mac OS idea of multiple "forks" in a relation. In addition
> to the main data fork which contains the same info as now, there could
> be one or more map forks which are separate files in the filesystem.
> They are named by relfilenode plus an extension, for instance a relation
> with relfilenode NNN would have a data fork in file NNN (plus perhaps
> NNN.1, NNN.2, etc) and a map fork named something like NNN.map (plus
> NNN.map.1 etc as needed).
> We'd have to add one more field to buffer
> lookup keys (BufferTag) to disambiguate which fork the referenced page
> is in. Having bitten that bullet, though, the idea trivially scales to
> any number of map forks with potentially different space requirements
> and different locking and WAL-logging requirements.

Hmm. You also need to teach at least xlog.c and xlogutils.c about the map forks, for full page images and the invalid page tracking. I also wonder what the performance impact of extending BufferTag is.

My original thought was to have a separate RelFileNode for each of the maps. That would require no smgr or xlog changes, and not very many changes in the buffer manager, though I guess you'd need more catalog changes. You had doubts about that on the previous thread (http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but the "map forks" idea certainly seems much more invasive than that.

I like the "map forks" idea; it groups the maps nicely at the filesystem level, and I can see it being useful for all kinds of things in the future. The question is, is it really worth the extra code churn? If you think it is, I can try that approach.

> Another possible advantage is that a new map fork could be added to an
> existing table without much trouble. Which is certainly something we'd
> need if we ever hope to get update-in-place working.

Yep.

> The main disadvantage I can see is that for very small tables, the
> percentage overhead from multiple map forks of one page apiece is
> annoyingly high. However, most of the point of a map disappears if
> the table is small, so we might finesse that by not creating any maps
> until the table has reached some minimum size.

Yeah, the map fork idea is actually better than the "every nth heap page" approach from that point of view.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
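
For readers following the thread, here is a rough sketch in C of the "map forks" proposal as quoted above: a buffer lookup key that carries one extra fork field, and file names built from the relfilenode plus an extension. Everything in it is an assumption made for illustration. The names `ForkId`, `SketchBufferTag`, `tag_equal` and `fork_file_name` are hypothetical, the struct is a simplified stand-in rather than the actual BufferTag, and the `.map` extension is only the "something like NNN.map" example from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fork identifiers; the names are illustrative only. */
typedef enum { FORK_MAIN = 0, FORK_MAP = 1 } ForkId;

/* Simplified stand-in for a buffer lookup key with the extra fork field. */
typedef struct {
    uint32_t relfilenode;   /* identifies the relation's files on disk */
    ForkId   fork;          /* which fork the page lives in (the new field) */
    uint32_t blocknum;      /* block number within that fork */
} SketchBufferTag;

/* Two tags refer to the same page only if all three fields match. */
static bool tag_equal(SketchBufferTag a, SketchBufferTag b)
{
    return a.relfilenode == b.relfilenode
        && a.fork == b.fork
        && a.blocknum == b.blocknum;
}

/*
 * Build a file name following the scheme in the message:
 * "NNN", "NNN.1", ... for the data fork and
 * "NNN.map", "NNN.map.1", ... for the map fork.
 * Returns what snprintf returns (the length the full name needs).
 */
static int fork_file_name(char *buf, size_t buflen, uint32_t relfilenode,
                          ForkId fork, unsigned segno)
{
    const char *ext = (fork == FORK_MAP) ? ".map" : "";

    assert(buf != NULL && buflen > 0);
    if (segno == 0)
        return snprintf(buf, buflen, "%u%s", (unsigned) relfilenode, ext);
    return snprintf(buf, buflen, "%u%s.%u", (unsigned) relfilenode, ext, segno);
}
```

With this layout, block 0 of the data fork and block 0 of the map fork of the same relation produce different lookup keys, which is the disambiguation the extra field is meant to provide. The sketch says nothing about the xlog.c and xlogutils.c changes or the performance cost of a wider key, both of which remain open questions in the message.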