Re: Heads-Up: multixact freezing bug - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Heads-Up: multixact freezing bug |
Date | |
Msg-id | 20131128171043.GG5513@eldon.alvh.no-ip.org |
In response to | Re: Heads-Up: multixact freezing bug (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: Heads-Up: multixact freezing bug |
List | pgsql-hackers |
Andres Freund wrote:

> Instead of calculating the multixact cutoff xid by using the global
> minimum of OldestMemberMXactId[] and OldestVisibleMXactId[] and then
> subtracting vacuum_freeze_min_age compute it solely as the minimum of
> OldestMemberMXactId[]. If we do that computation *after* doing the
> GetOldestXmin() in vacuum_set_xid_limits() we can be sure no mxact above
> the new mxact cutoff will contain a xid below the xid cutoff. This is so
> since it would otherwise have been reported as running by
> GetOldestXmin().
> With that change we can leave heap_tuple_needs_freeze() and
> heap_freeze_tuple() unchanged since using the mxact cutoff is
> sufficient.

Some thoughts here:

1. Using vacuum_freeze_min_age was clearly a poor choice. Normally
(XIDs are normally consumed much faster than multis), it's far too
large. In your reported case (per IM discussion), the customer is
approaching 4 billion Xids but is still at 15 million multixids; so the
relminmxid is still 1, because the default freeze_min_age is 50 million
... so at their current rate, they will wrap around the Xid counter 3-4
times before seeing this minmxid value advance at all.

2. Freezing too much has the disadvantage that you lose info possibly
useful for forensics. And I believe that freezing just after a multi
has gone below the immediate visibility horizon will make them live far
too little. Now the performance guys are always saying how they would
like tuples to even start life frozen, let alone delay any number of
transactions before them being frozen; but to help the case for those
who investigate and fix corrupted databases, we need a higher freeze
horizon. Heck, maybe even 100k multis would be enough to keep enough
evidence to track bugs down. I propose we keep at least a million.
This is an even more important argument currently, given how buggy the
current multixact code has proven to be.

2a. Freezing less also means less thrashing ...

3. I'm not sure I understand how the proposal above fixes things during
recovery. If we keep the multi values above the freeze horizon you
propose above, are we certain no old Xid values will remain?

4. Maybe it would be useful to emit a more verbose freezing record in
HEAD, even if we introduce some dirty ugly hack in 9.3 to avoid having
to change WAL format.

4a. Maybe we can introduce a new WAL record in 9.3 anyway and tell
people to always upgrade the replicas before the masters. (I think we
did this in the past once.)

3 and 4 in combination: maybe we can change 9.3 to not have any
breathing room for freezing, to fix the current swarm of bugs without
having to change WAL format, and do something more invasive in HEAD to
keep more multis around for forensics.

5. the new multixact stuff seems way too buggy. Should we rip it all
out and return to the old tuple locking scheme? We spent a huge amount
of time writing it and reviewing it and now maintaining, but I haven't
seen a *single* performance report saying how awesome 9.3 is compared to
older releases due to this change; the 9.3 request for testing, at the
start of the beta period, didn't even mention to try it out *at all*.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
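For anyone wanting to check the arithmetic behind point 1, here is a rough sketch. It assumes the round figures quoted there (about 4 billion Xids consumed, 15 million multixids, a 50 million threshold), a constant ratio of Xids to multis, and a 2^32 Xid space; the constant and function names are made up for illustration and are not PostgreSQL identifiers.

```python
# Round figures quoted in point 1; approximations, not measurements.
XIDS_CONSUMED = 4_000_000_000    # "approaching 4 billion Xids"
MULTIS_CONSUMED = 15_000_000     # "still at 15 million multixids"
FREEZE_MIN_AGE = 50_000_000      # default freeze_min_age applied to multis
XID_SPACE = 2**32                # size of the 32-bit Xid counter


def xids_when_multis_reach(threshold, xids=XIDS_CONSUMED, multis=MULTIS_CONSUMED):
    """Total Xids consumed by the time `threshold` multis exist.

    Assumes the observed Xids-per-multi ratio stays constant.
    """
    xids_per_multi = xids / multis
    return threshold * xids_per_multi


total_xids = xids_when_multis_reach(FREEZE_MIN_AGE)
wraps = total_xids / XID_SPACE
print(f"{total_xids:.3g} Xids consumed, about {wraps:.1f} trips around the Xid counter")
```

Under those assumptions the multi counter only reaches the 50 million threshold after a little over three full trips around the Xid counter, which is in line with the "3-4 times" estimate above.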