Re: Dead Space Map version 2 - Mailing list pgsql-hackers
From | Jim C. Nasby
---|---
Subject | Re: Dead Space Map version 2
Date |
Msg-id | 20070227051144.GK29041@nasby.net
In response to | Dead Space Map version 2 (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses | Re: Dead Space Map version 2; Re: Dead Space Map version 2
List | pgsql-hackers
On Tue, Feb 27, 2007 at 12:05:57PM +0900, ITAGAKI Takahiro wrote:
> Each heap pages have 4 states for dead space map; HIGH, LOW, UNFROZEN and
> FROZEN. VACUUM uses the states to reduce the number of target pages.
>
> - HIGH : High priority to vacuum. Maybe many dead tuples in the page.
> - LOW : Low priority to vacuum Maybe few dead tuples in the page.
> - UNFROZEN : No dead tuples, but some unfrozen tuples in the page.
> - FROZEN : No dead nor unfrozen tuples in the page.
>
> If we do UPDATE a tuple, the original page containing the tuple is marked
> as HIGH and the new page where the updated tuple is placed is marked as LOW.

Don't you mean UNFROZEN?

> When we commit the transaction, the updated tuples needs only FREEZE.
> That's why the after-page is marked as LOW. However, If we rollback, the
> after-page should be vacuumed, so we should mark the page LOW, not UNFROZEN.
> We don't know the transaction will commit or rollback at the UPDATE.

What makes it more important to mark the original page as HIGH instead of
LOW, like the page with the new tuple? The description of the states
indicates that there would likely be a lot more dead tuples in a HIGH page
than in a LOW page.

Perhaps it would be better to have the bgwriter take a look at how many
dead tuples (or how much space the dead tuples account for) when it writes
a page out and adjust the DSM at that time.

> * Agressive freezing
> We will freeze tuples in dirty pages using OldestXmin but FreezeLimit.
> This is for making FROZEN pages but not UNFROZEN pages as far as possible
> in order to reduce works in XID wraparound vacuums.

Do you mean using OldestXmin instead of FreezeLimit? Perhaps it might be
better to save that optimization for later...

> In current implementation, DSM allocates a bunch of memory at start up and
> we cannot modify it in running. It's maybe enough because DSM consumes very
> little memory -- 32MB memory per 1TB database.
>
> There are 3 parameters for FSM and DSM.
>
> - max_fsm_pages = 204800
> - max_fsm_relations = 1000 (= max_dsm_relations)
> - max_dsm_pages = 4096000
>
> I'm thinking to change them into 2 new paramaters. We will allocates memory
> for DSM that can hold all of estimated_database_size, and for FSM 50% or
> something of the size. Is this reasonable?

I don't think so, at least not until we get data from the field about
what's typical. If the DSM is tracking every page in the cluster then I'd
expect the FSM to be closer to 10% or 20% of that, anyway.

> I've already have a recovery extension. However, it can recover DSM
> but not FSM. Do we also need to restore FSM? If we don't, unreusable
> pages might be left in heaps. Of cource it could be reused if another
> tuple in the page are updated, but VACUUM will not find those pages.

Yes, DSM would make FSM recovery more important, but I thought it was
recoverable now? Or is that only on a clean shutdown?

I suspect we don't need perfect recoverability... theoretically we could
just commit the FSM after vacuum frees pages and leave it at that; if we
revert to that after a crash, backends will grab pages from the FSM only to
find there's no more free space, at which point they could pull the page
from the FSM and find another one. This would lead to degraded performance
for a while after a crash, but that might be a good trade-off.

--
Jim Nasby                                           jim@nasby.net
EnterpriseDB        http://enterprisedb.com         512.569.9461 (cell)
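The four page states discussed above fit in two bits per heap page, which is where the quoted 32MB-per-1TB figure comes from: a 1TB heap is 2^27 8KB pages, and 2^27 pages times 2 bits is 32MB. The sketch below illustrates such a packed map; it is not code from the actual DSM patch, and the type names, state encoding, and function names are illustrative assumptions only.

```c
/*
 * Illustrative sketch only, not code from the DSM patch.  It assumes the
 * four per-page states are packed two bits per heap page; the real patch
 * may use a different layout or encoding.
 */
#include <stdint.h>
#include <stdlib.h>

typedef enum DSMState
{
    DSM_FROZEN   = 0,           /* no dead and no unfrozen tuples */
    DSM_UNFROZEN = 1,           /* no dead tuples, some unfrozen ones */
    DSM_LOW      = 2,           /* probably few dead tuples */
    DSM_HIGH     = 3            /* probably many dead tuples */
} DSMState;

typedef struct DeadSpaceMap
{
    uint32_t    npages;         /* number of heap pages tracked */
    uint8_t    *bits;           /* 2 bits per page, 4 pages per byte */
} DeadSpaceMap;

static DeadSpaceMap *
dsm_create(uint32_t npages)
{
    DeadSpaceMap *dsm = malloc(sizeof(DeadSpaceMap));

    dsm->npages = npages;
    dsm->bits = calloc((npages + 3) / 4, 1);    /* zero bits = FROZEN here */
    return dsm;
}

static void
dsm_set_state(DeadSpaceMap *dsm, uint32_t page, DSMState state)
{
    int         shift = (page % 4) * 2;

    dsm->bits[page / 4] &= ~(3 << shift);       /* clear the page's 2 bits */
    dsm->bits[page / 4] |= (uint8_t) (state << shift);
}

static DSMState
dsm_get_state(const DeadSpaceMap *dsm, uint32_t page)
{
    return (DSMState) ((dsm->bits[page / 4] >> ((page % 4) * 2)) & 3);
}
```

With a layout like this, the UPDATE behavior described in the quoted text would amount to dsm_set_state(dsm, old_page, DSM_HIGH) for the page holding the old tuple version and dsm_set_state(dsm, new_page, DSM_LOW) for the page receiving the new one.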