Re: Rework the way multixact truncations work - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Rework the way multixact truncations work |
Date | |
Msg-id | 20151203093845.GA2032888@tornado.leadboat.com Whole thread Raw |
In response to | Re: Rework the way multixact truncations work (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Rework the way multixact truncations work
|
List | pgsql-hackers |
On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote: > On 2015-12-02 09:57:19 -0500, Noah Misch wrote: > > Hackers have been too reticent to revert and redo defect-laden > > commits. If doing that is weird today, let it be normal. > > Why? See my paragraph ending with the two sentences you quoted. > Especially if reverting and redoing includes conflicts that mainly > increases the chance of accidental bugs. True. (That doesn't apply to these patches.) > > git remote add community git://git.postgresql.org/git/postgresql.git > > git remote add nmisch_github https://github.com/nmisch/postgresql.git > > git fetch --multiple community nmisch_github > > git diff community/master...nmisch_github/mxt4-rm-legacy > > That's a nearly useless diff, because it includes a lot of other changes > (218 files changed, 2828 insertions(+), 8742 deletions(-)) made since > you published the changes. Perhaps you used "git diff a..b", not "git diff a...b". If not, please send the outputs of "git rev-parse community/master nmisch_github/mxt4-rm-legacy" and "git --version". > > @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti, > > - /* > > - * Computing the actual limits is only possible once the data directory is > > - * in a consistent state. There's no need to compute the limits while > > - * still replaying WAL - no decisions about new multis are made even > > - * though multixact creations might be replayed. So we'll only do further > > - * checks after TrimMultiXact() has been called. > > - */ > > + /* Before the TrimMultiXact() call at end of recovery, skip the rest. */ > > if (!MultiXactState->finishedStartup) > > return; > > - > > Assert(!InRecovery); > > > > - /* Set limits for offset vacuum. */ > > + /* > > + * Setting MultiXactState->oldestOffset entails a find_multixact_start() > > + * call, which is only possible once the data directory is in a consistent > > + * state. There's no need for an offset limit while still replaying WAL; > > + * no decisions about new multis are made even though multixact creations > > + * might be replayed. > > + */ > > needs_offset_vacuum = SetOffsetVacuumLimit(); > > I don't really see the benefit of this change. The previous location of > the comment is where we return early, so it seems appropriate to > document the reason there? I made that low-importance change for two reasons. First, returning at that point skips more than just the setting a limit; it also skips autovacuum signalling and wraparound warnings. Second, the function has just computed mxid "actual limits", so branch mxt2-cosmetic made the comment specify that we defer an offset limit, not any and all limits. > > static bool > > SetOffsetVacuumLimit(void) > > { > > - MultiXactId oldestMultiXactId; > > + MultiXactId oldestMultiXactId; > > MultiXactId nextMXact; > > - MultiXactOffset oldestOffset = 0; /* placate compiler */ > > - MultiXactOffset prevOldestOffset; > > + MultiXactOffset oldestOffset = 0; /* placate compiler */ > > MultiXactOffset nextOffset; > > bool oldestOffsetKnown = false; > > + MultiXactOffset prevOldestOffset; > > bool prevOldestOffsetKnown; > > - MultiXactOffset offsetStopLimit = 0; > > I don't see the benefit of the order changes here. I reacted the same way. Commit 4f627f8 reordered some declarations, which I reverted when I refinished that commit as branch mxt3-main. > > - if (oldestOffsetKnown) > > - ereport(DEBUG1, > > - (errmsg("oldest MultiXactId member is at offset %u", > > - oldestOffset))); > > That's imo a rather useful debug message. The branches emit that message at the same times 4f627f8^ and earlier emit it. > > - else > > + if (!oldestOffsetKnown) > > + { > > + /* XXX This message is incorrect if prevOldestOffsetKnown. */ > > ereport(LOG, > > (errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact%u does not exist on disk", > > oldestMultiXactId))); > > + } > > } > > Hm, the XXX is a "problem" in all 9.3+ - should we just fix it everywhere? I welcome a project to fix it. > > LWLockRelease(MultiXactTruncationLock); > > > > /* > > - * If we can, compute limits (and install them MultiXactState) to prevent > > - * overrun of old data in the members SLRU area. We can only do so if the > > - * oldest offset is known though. > > + * There's no need to update anything if we don't know the oldest offset > > + * or if it hasn't changed. > > */ > > Is that really a worthwhile optimization? I would neither remove that longstanding optimization nor add it from scratch today. Branch commit 06c9979 restored it as part of a larger restoration to the pre-4f627f8 structure of SetOffsetVacuumLimit(). > > -typedef struct mxtruncinfo > > -{ > > - int earliestExistingPage; > > -} mxtruncinfo; > > - > > -/* > > - * SlruScanDirectory callback > > - * This callback determines the earliest existing page number. > > - */ > > -static bool > > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data) > > -{ > > - mxtruncinfo *trunc = (mxtruncinfo *) data; > > - > > - if (trunc->earliestExistingPage == -1 || > > - ctl->PagePrecedes(segpage, trunc->earliestExistingPage)) > > - { > > - trunc->earliestExistingPage = segpage; > > - } > > - > > - return false; /* keep going */ > > -} > > - > > That really seems like an independent change, deserving its own commit + > explanation. Indeed. I explained that change at length in http://www.postgresql.org/message-id/20151108085407.GA1097830@tornado.leadboat.com, including that it's alone on a branch (mxt1-disk-independent), to become its own PostgreSQL commit. > > [branch commit 89a7232] > > I don't think that's really a good idea - this way we restart after > every single page write, whereas currently we only restart after passing > through all pages once. In nearly all cases we'll only ever have to > retry once in total, be because such old pages aren't usually going to > be reread/redirtied. Your improvement sounds fine, then. Would both SimpleLruTruncate() and SlruDeleteSegment() benefit from it? > > @@ -9216,10 +9212,8 @@ xlog_redo(XLogReaderState *record) > > LWLockRelease(OidGenLock); > > MultiXactSetNextMXact(checkPoint.nextMulti, > > checkPoint.nextMultiOffset); > > - > > - MultiXactAdvanceOldest(checkPoint.oldestMulti, > > - checkPoint.oldestMultiDB); > > SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB); > > + SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB); > > > > /* > > * If we see a shutdown checkpoint while waiting for an end-of-backup > > @@ -9309,17 +9303,13 @@ xlog_redo(XLogReaderState *record) > > LWLockRelease(OidGenLock); > > MultiXactAdvanceNextMXact(checkPoint.nextMulti, > > checkPoint.nextMultiOffset); > > - > > - /* > > - * NB: This may perform multixact truncation when replaying WAL > > - * generated by an older primary. > > - */ > > - MultiXactAdvanceOldest(checkPoint.oldestMulti, > > - checkPoint.oldestMultiDB); > > if (TransactionIdPrecedes(ShmemVariableCache->oldestXid, > > checkPoint.oldestXid)) > > SetTransactionIdLimit(checkPoint.oldestXid, > > checkPoint.oldestXidDB); > > + MultiXactAdvanceOldest(checkPoint.oldestMulti, > > + checkPoint.oldestMultiDB); > > + > > /* ControlFile->checkPointCopy always tracks the latest ckpt XID */ > > ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch; > > ControlFile->checkPointCopy.nextXid = > > checkPoint.nextXid; > > Why? master does not and will not have legacy truncation, so the deleted comment does not belong in master. Regarding the SetMultiXactIdLimit() call: commit 611a2ec Author: Noah Misch <noah@leadboat.com> AuthorDate: Sat Nov 7 15:06:28 2015 -0500 Commit: Noah Misch <noah@leadboat.com> CommitDate: Sat Nov 7 15:06:28 2015 -0500 In xlog_redo(), believe a SHUTDOWN checkPoint.oldestMulti exactly. It was so before this branch. This restores consistencywith the handling of nextXid, nextMulti and oldestMulti: we treat them as exact for XLOG_CHECKPOINT_SHUTDOWNand as minima for XLOG_CHECKPOINT_ONLINE. I do not know of a case where this definitely mattersfor any of these counters. It might matter if a bug causes oldestXid to move forward wrongly, causing it to thenmove backward later. (I don't know if VACUUM does ever move oldestXid backward, but it's a plausible thing to doif on-disk state fully agrees with an older value.) That example has no counterpart for oldestMultiXactId, because anyupdate first arrives in an XLOG_MULTIXACT_TRUNCATE_ID record. Therefore, this commit is probably cosmetic. > > - /* > > - * Truncate CLOG, multixact and CommitTs to the oldest computed value. > > - */ > > TruncateCLOG(frozenXID); > > TruncateCommitTs(frozenXID); > > TruncateMultiXact(minMulti, minmulti_datoid); > > Why? Sure, it's not a super important comment, but ...? Yeah, it scarcely matters either way. Commit 4f627f8 reduced this comment to merely restating the code, so I removed it instead. > > @@ -2204,8 +2202,8 @@ GetOldestSafeDecodingTransactionId(void) > > * the result is somewhat indeterminate, but we don't really care. Even in > > * a multiprocessor with delayed writes to shared memory, it should be certain > > * that setting of delayChkpt will propagate to shared memory when the backend > > - * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if > > - * it's already inserted its commit record. Whether it takes a little while > > + * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if it's > > + * already inserted its critical xlog record. Whether it takes a little while > > * for clearing of delayChkpt to propagate is unimportant for correctness. > > */ > > Seems unrelated, given that this is already used in > MarkBufferDirtyHint(). Don't get me wrong, I think the changes are a > good idea, but it's not really tied to the truncation changes. Quite so; its branch (one branch = one proposed PostgreSQL commit), mxt2-cosmetic, contains no truncation changes. Likewise for the other independent comment improvements you noted. nm
pgsql-hackers by date: