Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-general
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |
Date | |
Msg-id | CA+Tgmoa0MgHn0=nJFPTtd7M9eJ6Q5eb5hFSx1OJXYTR9b-3f9w@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access
status of transaction 1
Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |
List | pgsql-general |
On Fri, May 29, 2015 at 12:43 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Working on that now. OK, here's a patch. Actually two patches, differing only in whitespace, for 9.3 and for master (ha!). I now think that the root of the problem here is that DetermineSafeOldestOffset() and SetMultiXactIdLimit() were largely ignorant of the possibility that they might be called at points in time when the cluster was inconsistent. SetMultiXactIdLimit() bracketed certain parts of its logic with if (!InRecovery), but those guards were ineffective because it gets called before InRecovery is set in the first place. It seems pretty clear that we can't effectively determine anything about member wraparound until the cluster is consistent. Before then, there might be files missing from the offsets or members SLRUs which get put back during replay. There could even be gaps in the sequence of files, with some things having made it to disk before the crash (or having made it into the backup) and others not. So all the work of determining what the safe stop points and vacuum thresholds for members are needs to be postponed until TrimMultiXact() time. And that's fine, because we don't need this information in recovery anyway - it only affects behavior in normal running. So this patch does the following: 1. Moves the call to DetermineSafeOldestOffset() that appears in StartupMultiXact() to TrimMultiXact(), so that we don't try to do this until we're consistent. Also, instead of passing MultiXactState->oldestMultiXactId, pass the newer of that value and the earliest offset that exists on disk. That way, it won't try to read data that's not there. Note that the second call to DetermineSafeOldestOffset() in TruncateMultiXact() doesn't need a similar guard, because we already bail out of that function early if the multixacts we're going to truncate away don't exist. 2. Adds a new flag MultiXactState->didTrimMultiXact indicate whether we've finished TrimMultiXact(), and arranges for SetMultiXactIdLimit() to use that rather than InRecovery to test whether it's safe to do complicated things that might require that the cluster is consistent. This is a slight behavior change, since formerly we would have tried to do that stuff very early in the startup process, and now it won't happen until somebody completes a vacuum operation. If that's a problem, we could consider doing it in TrimMultiXact(), but I don't think it's safe the way it was. The new flag also prevents oldestOffset from being set while in recovery; I think it would be safe to do that in recovery once we've reached consistency, but I don't believe it's necessary. 3. Arranges for TrimMultiXact() to set oldestOffset. This is necessary because, previously, we relied on SetMultiXactIdLimit doing that during early startup or during recovery, and that's no longer true. Here too we set oldestOffset keeping in mind that our notion of the oldest multixact may point to something that doesn't exist; if so, we use the oldest MXID that does. 4. Modifies TruncateMultiXact() so that it doesn't re-scan the SLRU directory on every call to find the oldest file that exists. Instead, it arranges to remember the value from the first scan and then updates it thereafter to reflect its own truncation activity. This isn't absolutely necessary, but because this oldest-file logic is used in multiple places (TrimMultiXact, SetMultiXactIdLimit, and TruncateMultiXact all need it directly or indirectly) caching the value seems like a better idea than recomputing it frequently. I have tested that this patch fixes Steve Kehlet's problem, or at least what I believe to be Steve Kehlet's problem based on the reproduction scenario I described upthread. I believe it will also fix the problems with starting up from a base backup with Alvaro mentioned upthread. It won't fix the fact that pg_upgrade is putting a wrong value into everybody's datminmxid field, which should really be addressed too, but I've been working on this for about three days virtually non-stop and I don't have the energy to tackle it right now. If anyone feels the urge to step into that breech, I think what it needs to do is: when upgrading from a 9.3-or-later instance, copy over each database's datminmxid into the corresponding database in the new cluster. Aside from that, it's very possible that despite my best efforts this has serious bugs. Review and testing would be very much appreciated. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-general by date: