Re: autovacuum not prioritising for-wraparound tables - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: autovacuum not prioritising for-wraparound tables |
Date | |
Msg-id | 20130202132511.GA8956@awork2.anarazel.de Whole thread Raw |
In response to | Re: autovacuum not prioritising for-wraparound tables (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: autovacuum not prioritising for-wraparound tables
Re: autovacuum not prioritising for-wraparound tables |
List | pgsql-hackers |
On 2013-02-01 15:09:34 -0800, Jeff Janes wrote: > On Fri, Feb 1, 2013 at 2:34 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2013-02-01 14:05:46 -0800, Jeff Janes wrote: > > >> As far as I can tell this bug kicks in when your cluster gets to be > >> older than freeze_min_age, and then lasts forever after. After that > >> point pretty much every auto-vacuum inspired by update/deletion > >> activity will get promoted to a full table scan. (Which makes me > >> wonder how much field-testing the vm-only vacuum has received, if it > >> was rarely happening in practice due to this bug.) > > > > I think you're misreading the code. freezeTableLimit is calculated by > > >> > limit = ReadNewTransactionId() - freezetable; > > > which is always relative to the current xid. The bug was that > > freezetable had the wrong value in autovac due to freeze_min_age being > > used instead of freeze_table_age. > > Right. Since freeze_min_age was mistakenly being used, the limit > would be 50 million in the past (rather than 150 million) under > defaults. But since the last full-table vacuum, whenever that was, > used freeze_min_age for its intended purpose, that means the 50 > million in the past *at the time of that last vacuum* is the highest > that relfrozenxid can be. And that is going to be further back than > 50 million from right now, so the vacuum will always be promoted to a > full scan. Oh, wow. Youre right. I shouldn't answer emails after sport with cramped fingers on a friday night... And I should have thought about this scenario, because I essentially already explained it upthread, just with a different set of variables. This is rather scary. How come nobody noticed that this major performance improvement was effectively disabled for that long? I wonder if Kevin's observations about the price of autovac during OLTPish workloads isn't at least partially caused by this. It will cause lots of io prematurely because it scans far more than it should and a VACUUM FREEZE will push it off. > As an aside, it does seem like log_autovacuum_min_duration=0 should > log whether a scan_all was done, and if so what relfrozenxid got set > to. But looking at where the log message is generated, I don't know > where to retrieve that info. Yes, I agree, I already had been thinking about that because its really hard to get that information right now. It seems easy enough to include it in the ereport() at the bottom of lazy_vacuum_rel, we determine scan_all in that function, so that seems ok? For head I would actually vote for two data points, full_table_scan: yes/no, skipped_percentage..., both are already available, so it seems like it should be an easy thing to do. I'd like to do this for 9.3, agreed? I would even like to add it to the back branches, but I guess I cannot convince people of that... > [1] I don't know why it is that a scan_all vacuum with a > freeze_min_age of 50m (or a freezeLimit of 50 million ago) will not > set relfrozenxid to a higher value than that if it discovers that it > can, but it doesn't seem to. There currently is no code to track whats the oldest observed xid, so a simple implementation limitiation. Making that code better might be rather worthwile if youre loading your table in a batch and don't touch it later anymore... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: