What is "wraparound failure", really? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | What is "wraparound failure", really? |
Date | |
Msg-id | CAH2-Wzk_FxfJvs4TnUtj=DCsokbiK0CxfjZ9jjrfSx8sTWkeUg@mail.gmail.com |
List | pgsql-hackers |
The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal documentation -- just a basic description of how the GUCs work. I think that it certainly merits some discussion under "25.1. Routine Vacuuming" -- more specifically under "25.1.5. Preventing Transaction ID Wraparound Failures". One reason why this didn't happen in the original commit was that I just didn't know where to start with it.

The docs in question have said this since 2006's commit 48188e16 first added autovacuum_freeze_max_age:

"The sole disadvantage of increasing autovacuum_freeze_max_age (and vacuum_freeze_table_age along with it) is that the pg_xact and pg_commit_ts subdirectories of the database cluster will take more space..."

This sentence seems completely unreasonable to me. It simply ignores the huge disadvantage of increasing autovacuum_freeze_max_age: the *risk* that the system will stop being able to allocate new XIDs because GetNewTransactionId() errors out with "database is not accepting commands to avoid wraparound data loss...". Sure, it's possible to take a lot of risk here without it ever blowing up in your face. And if it doesn't blow up then the downside really is zero. But that is hardly a sensible way to talk about this important risk. Or any risk at all.

At first I thought that the sentence was not just misguided -- it seemed downright bizarre. I thought that it was directly at odds with the title "Preventing Transaction ID Wraparound Failures". I thought that the whole point of this section was how not to have a wraparound failure (as I understand the term), and yet we seem to deliberately ignore the single most important practical aspect of making sure that that doesn't happen.

But I now suspect that the basic definitions have been mixed up in a subtle but important way. What the documentation calls a "wraparound failure" seems to be rather different from what I thought the term meant.
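To make the distinction concrete, here is a minimal Python sketch of the two conditions that the term "wraparound failure" could refer to. It is only an illustration, not PostgreSQL's code: it assumes 32-bit XIDs compared modulo 2^32, ignores the reserved special XIDs, and the names xid_precedes, xid_age, may_allocate_xid and the safety_margin value are all hypothetical.

```python
XID_SPACE = 2**32   # assumption: XIDs are 32-bit counters that wrap around
HALF_SPACE = 2**31  # XIDs up to ~2 billion behind count as "past", the rest as "future"


def xid_precedes(a: int, b: int) -> bool:
    """Circular comparison: True if XID a is logically older than XID b.

    Same result as treating (a - b) as a signed 32-bit value and testing < 0.
    Reserved special XIDs are ignored in this sketch.
    """
    return (a - b) % XID_SPACE >= HALF_SPACE


def xid_age(next_xid: int, oldest_unfrozen_xid: int) -> int:
    """Number of XIDs consumed since the oldest unfrozen XID."""
    return (next_xid - oldest_unfrozen_xid) % XID_SPACE


def may_allocate_xid(next_xid: int, oldest_unfrozen_xid: int,
                     safety_margin: int = 1_000_000) -> bool:
    """Hypothetical guard: refuse new XIDs shortly before the oldest one would wrap.

    safety_margin is an illustrative value, not a claim about the real limit.
    """
    return xid_age(next_xid, oldest_unfrozen_xid) < HALF_SPACE - safety_margin


old_row_xid = 100
near_xid = old_row_xid + 1_000
wrapped_xid = (old_row_xid + HALF_SPACE + 5) % XID_SPACE

# Reading 1: wrong answers. Once more than 2^31 XIDs have passed, an
# unfrozen old row stops looking like the past and looks like the future.
print(xid_precedes(old_row_xid, near_xid))
print(xid_precedes(old_row_xid, wrapped_xid))

# Reading 2: XID exhaustion. The guard stops handing out XIDs before
# reading 1 can ever happen.
print(may_allocate_xid(near_xid, old_row_xid))
print(may_allocate_xid(old_row_xid + HALF_SPACE - 10, old_row_xid))
```

In this sketch the guard in may_allocate_xid is what keeps the first condition from ever occurring, at the price of refusing new transactions. That second condition is the risk that grows as autovacuum_freeze_max_age is raised.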
As I said, I thought that "wraparound failure" meant the condition of being unable to get new transaction IDs (at least until the DBA runs VACUUM in single user mode). But the documentation in question seems to actually define it as "the condition of an old MVCC snapshot failing to see a version from the distant past, because somehow an XID wraparound suddenly makes it look as if it's in the distant future rather than in the past". It's actually talking about a subtly different thing, so the "sole disadvantage" sentence is not actually bizarre. It does still seem impractical and confusing, though.

I strongly suspect that my interpretation of what "wraparound failure" means is actually the common one. Of course the system is never under any circumstances allowed to give totally wrong answers to queries, no matter what -- users should be able to take that much for granted. What users care about here is sensibly managing XIDs as a resource -- preventing "XID exhaustion" while being conservative, but not ridiculously conservative. Could the documentation be completely misleading users here?

I have two questions:

1. Do I have this right? Is there really confusion about what a "wraparound failure" means, or is the confusion mine alone?

2. How do I go about integrating discussion of the failsafe here? Anybody have thoughts on that?

--
Peter Geoghegan