Re: Transaction ID wraparound: problem and proposed solution - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Transaction ID wraparound: problem and proposed solution |
Date | |
Msg-id | 200101200500.AAA05265@candle.pha.pa.us |
In response to | Transaction ID wraparound: problem and proposed solution (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Transaction ID wraparound: problem and proposed solution; status of 64bit ints? was: Re: Transaction ID wraparound: problem and proposed solution |
List | pgsql-hackers |
I have added this email thread to TODO.detail.

> We've expended a lot of worry and discussion in the past about what
> happens if the OID generator wraps around. However, there is another
> 4-byte counter in the system: the transaction ID (XID) generator.
> While OID wraparound is survivable, if XIDs wrap around then we really
> do have a Ragnarok scenario. The tuple validity checks do ordered
> comparisons on XIDs, and will consider tuples with xmin > current xact
> to be invalid. Result: after wraparound, your whole database would
> instantly vanish from view.
>
> The first thought that comes to mind is that XIDs should be promoted to
> eight bytes. However there are several practical problems with this:
> * portability --- I don't believe long long int exists on all the
> platforms we support.
> * performance --- except on true 64-bit platforms, widening Datum to
> eight bytes would be a system-wide performance hit, which is a tad
> unpleasant to fix a scenario that's not yet been reported from the
> field.
> * disk space --- letting pg_log grow without bound isn't a pleasant
> prospect either.
>
> I believe it is possible to fix these problems without widening XID,
> by redefining XIDs in a way that allows for wraparound. Here's my
> plan:
>
> 1. Allow XIDs to range from 0 to WRAPLIMIT-1 (WRAPLIMIT is not
> necessarily 4G, see discussion below). Ordered comparisons on XIDs
> are no longer simply "x < y", but need to be expressed as a macro.
> We consider x < y if (y - x) % WRAPLIMIT < WRAPLIMIT/2.
> This comparison will work as long as the range of interesting XIDs
> never exceeds WRAPLIMIT/2. Essentially, we envision the actual value
> of XID as being the low-order bits of a logical XID that always
> increases, and we assume that no extant XID is more than WRAPLIMIT/2
> transactions old, so we needn't keep track of the high-order bits.
>
> 2. To keep the system from having to deal with XIDs that are more than
> WRAPLIMIT/2 transactions old, VACUUM should "freeze" known-good old
> tuples. To do this, we'll reserve a special XID, say 1, that is always
> considered committed and is always less than any ordinary XID. (So the
> ordered-comparison macro is really a little more complicated than I said
> above. Note that there is already a reserved XID just like this in the
> system, the "bootstrap" XID. We could simply use the bootstrap XID, but
> it seems better to make another one.) When VACUUM finds a tuple that
> is committed good and has xmin < XmaxRecent (the oldest XID that might
> be considered uncommitted by any open transaction), it will replace that
> tuple's xmin by the special always-good XID. Therefore, as long as
> VACUUM is run on all tables in the installation more often than once per
> WRAPLIMIT/2 transactions, there will be no tuples with ordinary XIDs
> older than WRAPLIMIT/2.
>
> 3. At wraparound, the XID counter has to be advanced to skip over the
> InvalidXID value (zero) and the reserved XIDs, so that no real transaction
> is generated with those XIDs. No biggie here.
>
> 4. With the wraparound behavior, pg_log will have a bounded size: it
> will never exceed WRAPLIMIT*2 bits = WRAPLIMIT/4 bytes. Since we will
> recycle pg_log entries every WRAPLIMIT xacts, during transaction start
> the xact manager will have to take care to actively clear its pg_log
> entry to zeroes (I'm not sure if it does that already, or just assumes
> that new pg_log entries will start out zero). As long as that happens
> before the xact makes any data changes, it's OK to recycle the entry.
> Note we are assuming that no tuples will remain in the database with
> xmin or xmax equal to that XID from a prior cycle of the universe.
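The four points above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not PostgreSQL code: it assumes WRAPLIMIT = 1G (2^30), as the message proposes, and everything else (the reserved values BOOTSTRAP_XID = 1 and FROZEN_XID = 2, the TupleHeader struct, the pg_log status codes, and all helper function names) is hypothetical. Compared with the formula in point 1, xid_precedes adds an explicit x != y test so the relation is strict, and it lets reserved XIDs sort below every ordinary XID, as point 2 requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: WRAPLIMIT = 1G as proposed; the reserved values below are
 * hypothetical, apart from InvalidXID = 0, which the proposal fixes. */
#define WRAPLIMIT        (UINT32_C(1) << 30)  /* 1G transactions */
#define INVALID_XID      UINT32_C(0)
#define BOOTSTRAP_XID    UINT32_C(1)
#define FROZEN_XID       UINT32_C(2)          /* always committed, always oldest */
#define FIRST_NORMAL_XID UINT32_C(3)

/* pg_log keeps 2 status bits per XID, so 4 XIDs share one byte. */
#define PGLOG_BYTES      (WRAPLIMIT / 4)

/* Hypothetical status codes; zero must mean "in progress" so that
 * clearing a recycled entry to zeroes resets it. */
enum { XACT_IN_PROGRESS = 0, XACT_COMMITTED = 1, XACT_ABORTED = 2 };

typedef uint32_t Xid;

/* Hypothetical, simplified tuple header with only what VACUUM needs here. */
typedef struct {
    Xid  xmin;       /* inserting transaction */
    bool committed;  /* stands in for a pg_log lookup of xmin */
} TupleHeader;

static bool xid_is_normal(Xid x)
{
    return x >= FIRST_NORMAL_XID;
}

/* Point 1: "x < y" modulo WRAPLIMIT. Ordinary XIDs are assumed to lie in
 * [FIRST_NORMAL_XID, WRAPLIMIT) and to be less than WRAPLIMIT/2 apart.
 * Point 2: reserved XIDs (including FROZEN_XID) sort before ordinary ones. */
static bool xid_precedes(Xid x, Xid y)
{
    if (!xid_is_normal(x) || !xid_is_normal(y))
        return x < y;  /* plain numeric order covers every reserved case */

    uint32_t diff = (y + WRAPLIMIT - x) % WRAPLIMIT;  /* sum < 2^31: no overflow */
    return diff != 0 && diff < WRAPLIMIT / 2;
}

/* Point 3: advance the counter, skipping InvalidXID and the reserved XIDs. */
static Xid xid_next(Xid x)
{
    x = (x + 1) % WRAPLIMIT;
    if (x < FIRST_NORMAL_XID)
        x = FIRST_NORMAL_XID;
    return x;
}

/* Point 2: freeze a committed tuple whose xmin precedes xmax_recent
 * (the oldest XID any open transaction might consider uncommitted). */
static void vacuum_freeze_tuple(TupleHeader *tup, Xid xmax_recent)
{
    if (tup->committed && xid_is_normal(tup->xmin) &&
        xid_precedes(tup->xmin, xmax_recent))
        tup->xmin = FROZEN_XID;
}

/* Point 4: read and write the 2-bit status of an XID in a pg_log buffer. */
static unsigned pglog_get(const uint8_t *buf, Xid x)
{
    return (buf[x / 4] >> ((x % 4) * 2)) & 0x3u;
}

static void pglog_set(uint8_t *buf, Xid x, unsigned status)
{
    assert(status <= 0x3u);
    unsigned shift = (x % 4) * 2;
    buf[x / 4] = (uint8_t)((buf[x / 4] & ~(0x3u << shift)) | (status << shift));
}

/* At transaction start, clear the recycled entry before any data changes. */
static void xact_start(uint8_t *buf, Xid x)
{
    pglog_set(buf, x, XACT_IN_PROGRESS);
}
```

With these assumptions, xid_precedes(WRAPLIMIT - 10, 5) is true because the counter wrapped between the two XIDs, while xid_precedes(5, WRAPLIMIT - 10) is false. At WRAPLIMIT = 1G, PGLOG_BYTES works out to 2^28 bytes (256 MiB), and VACUUM must reach every table at least once per WRAPLIMIT/2 = 2^29 (536,870,912) transactions.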
>
> This scheme allows us to survive XID wraparound at the cost of slight
> additional complexity in ordered comparisons of XIDs (which is not a
> really performance-critical task AFAIK), and at the cost that the
> original insertion XIDs of all but recent tuples will be lost by
> VACUUM. The system doesn't particularly care about that, but old XIDs
> do sometimes come in handy for debugging purposes. A possible
> compromise is to overwrite only XIDs that are older than, say,
> WRAPLIMIT/4 instead of doing so as soon as possible. This would mean
> the required VACUUM frequency is every WRAPLIMIT/4 xacts instead of
> every WRAPLIMIT/2 xacts.
>
> We have a straightforward tradeoff between the maximum size of pg_log
> (WRAPLIMIT/4 bytes) and the required frequency of VACUUM (at least
> every WRAPLIMIT/2 or WRAPLIMIT/4 transactions). This could be made
> configurable in config.h for those who're intent on customization,
> but I'd be inclined to set the default value at WRAPLIMIT = 1G.
>
> Comments? Vadim, is any of this about to be superseded by WAL?
> If not, I'd like to fix it for 7.1.
>
> regards, tom lane
>

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026