Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: vacuum, performance, and MVCC |
Date | |
Msg-id | 200606242336.k5ONa5b04286@momjian.us Whole thread Raw |
In response to | Re: vacuum, performance, and MVCC (Hannu Krosing <hannu@skype.net>) |
Responses |
Re: vacuum, performance, and MVCC
|
List | pgsql-hackers |
Hannu Krosing wrote: > ?hel kenal p?eval, R, 2006-06-23 kell 13:08, kirjutas Tom Lane: > > Csaba Nagy <nagy@ecircle-ag.com> writes: > > >> Surprisingly its mostly WAL traffic, the heap/index pages themselves are > > >> often not yet synced to disk by time of vacuum, so no additional traffic > > >> there. If you had made 5 updates per page and then vacuum it, then you > > >> make effectively 1 extra WAL write meaning 20% increase in WAL traffic. > > > > > Is this also holding about read traffic ? I thought vacuum will make a > > > full table scan... for big tables a full table scan is always badly > > > influencing the performance of the box. If the full table scan would be > > > avoided, then I wouldn't mind running vacuum in a loop... > > > > If you're doing heavy updates of a big table then it's likely to end up > > visiting most of the table anyway, no? There is talk of keeping a map > > of dirty pages, but I think it'd be a win for infrequently-updated > > tables, not ones that need constant vacuuming. > > > > I think a lot of our problems in this area could be solved with fairly > > straightforward tuning efforts on the existing autovacuum > > infrastructure. In particular, someone should be looking into > > recommendable default vacuum-cost-delay settings so that a background > > vacuum doesn't affect performance too much. > > One thing that would help updates quite a lot in some scenarios is > keeping the pages only partially-filled, so that most updates could keep > the new version in the same page. I think that has also been discussed > as an option to vacuum and maybe as part of initial inserts. Maybe some > of it even ended up as a todo item. We have a patch in the queue for index fillfactor which will be in 8.2. I am also hoping the frequently updated rows will migrate out to the empty pages. > > Another problem with the > > current autovac infrastructure is that it doesn't respond very well to > > the case where there are individual tables that need constant attention > > as well as many that don't. If you have N databases then you can visit > > a particular table at most once every N*autovacuum_naptime seconds, and > > *every* table in the entire cluster gets reconsidered at that same rate. > > I'm not sure if we need the ability to have multiple autovac daemons > > running at the same time, > > My patch enabling effective continuous vacuum of fast-update tables, > while still being able to vacuum huge slowly changing ones is still not > applied. Without that patch there is no reason to vacuum the small and > fast changingg tables while vacuum on bigger tables is running, as it > won't clean out dead tuples anyway. I think it will be applied, but I am looking for someone else to eyeball it since Tom has come concerns. > > but we definitely could use something with a > > more flexible table-visiting pattern. Perhaps it would be enough to > > look through the per-table stats for each database before selecting the > > database to autovacuum in each cycle, instead of going by "least > > recently autovacuumed". > > > > Bottom line: there's still lots of low-hanging fruit. Why are people > > feeling that we need to abandon or massively complicate our basic > > architecture to make progress? > > Maybe we could start from reusing the index tuples which point to > invisible tuples ? The index is not MVCC anyway, so maybe it is easier > to do in-place replacement there ? > > This probably has the same obstacles which have prevented us from > removing those in the first place (removing instead of marking as > invisible). Does it cause some locking issues ? Or does it go against > some other constraints of our index lookups ? > > I think that just setting the invisible bit in an index leaf node causes > nearly as much disk io as removing the node. > > If we could delete/reuse old index tuples, it would solve a sizable > chunk of index-growth problem, especially for cases where referenced key > value does not change. I think heap _and_ index reuse is the only useful direction. Index or heap reuse alone seems too marginal for the added complexity. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-hackers by date: