multixacts woes

From: Robert Haas
Subject: multixacts woes
Msg-id: CA+Tgmob=5wTscr0_1+MTdy4kRB6YSo6jxwTwnP9hM3f4m_xyzA@mail.gmail.com
List: pgsql-hackers
My colleague Thomas Munro and I have been working with Alvaro, and also with Kevin and Amit, to fix bug #12990, a multixact-related data corruption bug. I somehow did not realize until very recently that we actually use two SLRUs to keep track of multixacts: one for the multixacts themselves (pg_multixact/offsets) and one for the members (pg_multixact/members). Confusingly, members are sometimes called offsets, and offsets are sometimes called IDs, or multixacts. If either of these SLRUs wraps around, we get data loss. This comment in multixact.c explains it well:

    /*
     * Since multixacts wrap differently from transaction IDs, this logic is
     * not entirely correct: in some scenarios we could go for longer than 2
     * billion multixacts without seeing any data loss, and in some others we
     * could get in trouble before that if the new pg_multixact/members data
     * stomps on the previous cycle's data.  For lack of a better mechanism we
     * use the same logic as for transaction IDs, that is, start taking action
     * halfway around the oldest potentially-existing multixact.
     */
    multiWrapLimit = oldest_datminmxid + (MaxMultiXactId >> 1);
    if (multiWrapLimit < FirstMultiXactId)
        multiWrapLimit += FirstMultiXactId;

Apparently, we have been hanging our hat since the release of 9.3.0 on the theory that the average multixact won't ever have more than two members, and therefore the members SLRU won't overwrite itself and corrupt data. This is not good enough: we need to prevent multixact IDs from wrapping around, and we separately need to prevent multixact members from wrapping around, and the current code was conflating those things in a way that simply didn't work. Recent commits by Alvaro and by me have mostly fixed this, but there are a few loose ends:

1. I believe that there is still a narrow race condition that can cause the multixact code to go crazy and delete all of its data when operating very near the threshold for member space exhaustion. See http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com for the scenario and proposed fix.

2. We have some logic that causes autovacuum to run in spite of autovacuum=off when wraparound threatens. My commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the anti-wraparound protections for multixact members that exist for multixact IDs and for regular XIDs, but this remains an outstanding issue. I believe I know how to fix this, and will work up an appropriate patch based on some of Thomas's earlier work.

3. It seems to me that there is a danger that some users could see extremely frequent anti-mxid-member-wraparound vacuums as a result of this work. Granted, that beats data corruption or errors, but it could still be pretty bad. The default value of autovacuum_multixact_freeze_max_age is 400000000. Anti-mxid-member-wraparound vacuums kick in when you exceed 25% of the addressable member space, or 1073741824 total members. So, if your typical multixact has more than 1073741824/400000000 = ~2.68 members, you're going to see more autovacuum activity as a result of this change. We're effectively capping autovacuum_multixact_freeze_max_age at 1073741824/(average size of your multixacts). If your multixacts have just a couple of members (like 3 or 4), this is probably not such a big deal. If your multixacts typically run to 50 or so members, your effective freeze age is going to drop from 400m to ~21.4m.
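For illustration, here is a minimal sketch of that arithmetic in C. This is not PostgreSQL source; the constant and function names are invented for the example, and the 1073741824 figure is just the 25% member-space threshold described above:

#include <stdio.h>
#include <stdint.h>

/*
 * Illustrative only: the 25% member-space threshold described above,
 * i.e. 2^32 / 4 = 1073741824 members.  The name is invented for this
 * sketch, not taken from multixact.c.
 */
#define MEMBER_VACUUM_THRESHOLD (UINT32_C(1) << 30)

/* Effective freeze age once the member threshold is taken into account. */
static uint32_t
effective_freeze_max_age(uint32_t freeze_max_age, double avg_members_per_mxact)
{
    double cap = MEMBER_VACUUM_THRESHOLD / avg_members_per_mxact;

    return (cap < freeze_max_age) ? (uint32_t) cap : freeze_max_age;
}

int
main(void)
{
    uint32_t freeze_max_age = 400000000; /* default autovacuum_multixact_freeze_max_age */

    printf("%u\n", effective_freeze_max_age(freeze_max_age, 2.0));  /* 400000000: unaffected */
    printf("%u\n", effective_freeze_max_age(freeze_max_age, 50.0)); /* 21474836, i.e. ~21.4m */
    return 0;
}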
At that point, I think it's possible that relminmxid advancement might start to force full-table scans more often than would be required for relfrozenxid advancement. If so, that may be a problem for some users. What can we do about this?

Alvaro proposed back-porting his fix for bug #8470, which avoids locking a row if a parent subtransaction already has the same lock. Alvaro tells me (via chat) that on some workloads this can dramatically reduce multixact size, which is certainly appealing. But the fix looks fairly invasive - it changes the return value of HeapTupleSatisfiesUpdate in certain cases, for example - and I'm not sure it's been thoroughly code-reviewed by anyone, so I'm a little nervous about the idea of back-porting it at this point. I am inclined to think it would be better to release the fixes we have - after handling items 1 and 2 - and then come back to this issue. Another thing to consider here is that if the high rate of multixact consumption is organic rather than induced by lots of subtransactions of the same parent locking the same tuple, this fix won't help.

Another thought that occurs to me is that if we had a freeze map, it would radically decrease the severity of this problem, because freezing would become vastly cheaper. I wonder if we ought to try to get that into 9.5, even if it means holding up 9.5. Quite aside from multixacts, repeated wraparound autovacuuming of static data is a progressively more serious problem as data set sizes and transaction volumes increase. The possibility that multixact freezing may in some scenarios exacerbate that problem is just icing on the cake.

The fundamental problem is that a 32-bit address space just isn't that big on modern hardware, and the problem is worse for multixact members than it is for multixact IDs, because a given multixact consumes only one multixact ID, but as many slots in the multixact member space as it has members.

Thoughts, advice, etc. are most welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company