Home > mailing lists
Re: Rework the way multixact truncations work - Mailing list pgsql-hackers

From	Noah Misch
Subject	Re: Rework the way multixact truncations work
Date	December 3, 2015 09:38:56
Msg-id	20151203093845.GA2032888@tornado.leadboat.com Whole thread Raw
In response to	Re: Rework the way multixact truncations work (Andres Freund <andres@anarazel.de>)
Responses	Re: Rework the way multixact truncations work
List	pgsql-hackers
Tree view
On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> On 2015-12-02 09:57:19 -0500, Noah Misch wrote:
> > Hackers have been too reticent to revert and redo defect-laden
> > commits.  If doing that is weird today, let it be normal.
> 
> Why?

See my paragraph ending with the two sentences you quoted.

> Especially if reverting and redoing includes conflicts that mainly
> increases the chance of accidental bugs.

True.  (That doesn't apply to these patches.)

> > git remote add community git://git.postgresql.org/git/postgresql.git
> > git remote add nmisch_github https://github.com/nmisch/postgresql.git
> > git fetch --multiple community nmisch_github
> > git diff community/master...nmisch_github/mxt4-rm-legacy
> 
> That's a nearly useless diff, because it includes a lot of other changes
> (218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
> you published the changes.

Perhaps you used "git diff a..b", not "git diff a...b".  If not, please send
the outputs of "git rev-parse community/master nmisch_github/mxt4-rm-legacy"
and "git --version".

> > @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> > -    /*
> > -     * Computing the actual limits is only possible once the data directory is
> > -     * in a consistent state. There's no need to compute the limits while
> > -     * still replaying WAL - no decisions about new multis are made even
> > -     * though multixact creations might be replayed. So we'll only do further
> > -     * checks after TrimMultiXact() has been called.
> > -     */
> > +    /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
> >      if (!MultiXactState->finishedStartup)
> >          return;
> > -
> >      Assert(!InRecovery);
> > 
> > -    /* Set limits for offset vacuum. */
> > +    /*
> > +     * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> > +     * call, which is only possible once the data directory is in a consistent
> > +     * state. There's no need for an offset limit while still replaying WAL;
> > +     * no decisions about new multis are made even though multixact creations
> > +     * might be replayed.
> > +     */
> >      needs_offset_vacuum = SetOffsetVacuumLimit();
> 
> I don't really see the benefit of this change. The previous location of
> the comment is where we return early, so it seems appropriate to
> document the reason there?

I made that low-importance change for two reasons.  First, returning at that
point skips more than just the setting a limit; it also skips autovacuum
signalling and wraparound warnings.  Second, the function has just computed
mxid "actual limits", so branch mxt2-cosmetic made the comment specify that we
defer an offset limit, not any and all limits.

> >  static bool
> >  SetOffsetVacuumLimit(void)
> >  {
> > -    MultiXactId oldestMultiXactId;
> > +    MultiXactId    oldestMultiXactId;
> >      MultiXactId nextMXact;
> > -    MultiXactOffset oldestOffset = 0;    /* placate compiler */
> > -    MultiXactOffset prevOldestOffset;
> > +    MultiXactOffset oldestOffset = 0;        /* placate compiler */
> >      MultiXactOffset nextOffset;
> >      bool        oldestOffsetKnown = false;
> > +    MultiXactOffset prevOldestOffset;
> >      bool        prevOldestOffsetKnown;
> > -    MultiXactOffset offsetStopLimit = 0;
> 
> I don't see the benefit of the order changes here.

I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
reverted when I refinished that commit as branch mxt3-main.

> > -        if (oldestOffsetKnown)
> > -            ereport(DEBUG1,
> > -                    (errmsg("oldest MultiXactId member is at offset %u",
> > -                            oldestOffset)));
> 
> That's imo a rather useful debug message.

The branches emit that message at the same times 4f627f8^ and earlier emit it.

> > -        else
> > +        if (!oldestOffsetKnown)
> > +        {
> > +            /* XXX This message is incorrect if prevOldestOffsetKnown. */
> >              ereport(LOG,
> >                      (errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed
MultiXact%u does not exist on disk",
 
> >                              oldestMultiXactId)));
> > +        }
> >      }
> 
> Hm, the XXX is a "problem" in all 9.3+ - should we just fix it everywhere?

I welcome a project to fix it.

> >      LWLockRelease(MultiXactTruncationLock);
> > 
> >      /*
> > -     * If we can, compute limits (and install them MultiXactState) to prevent
> > -     * overrun of old data in the members SLRU area. We can only do so if the
> > -     * oldest offset is known though.
> > +     * There's no need to update anything if we don't know the oldest offset
> > +     * or if it hasn't changed.
> >       */
> 
> Is that really a worthwhile optimization?

I would neither remove that longstanding optimization nor add it from scratch
today.  Branch commit 06c9979 restored it as part of a larger restoration to
the pre-4f627f8 structure of SetOffsetVacuumLimit().

> > -typedef struct mxtruncinfo
> > -{
> > -    int            earliestExistingPage;
> > -} mxtruncinfo;
> > -
> > -/*
> > - * SlruScanDirectory callback
> > - *        This callback determines the earliest existing page number.
> > - */
> > -static bool
> > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
> > -{
> > -    mxtruncinfo *trunc = (mxtruncinfo *) data;
> > -
> > -    if (trunc->earliestExistingPage == -1 ||
> > -        ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
> > -    {
> > -        trunc->earliestExistingPage = segpage;
> > -    }
> > -
> > -    return false;                /* keep going */
> > -}
> > -
> 
> That really seems like an independent change, deserving its own commit +
> explanation.

Indeed.  I explained that change at length in
http://www.postgresql.org/message-id/20151108085407.GA1097830@tornado.leadboat.com,
including that it's alone on a branch (mxt1-disk-independent), to become its
own PostgreSQL commit.

> > [branch commit 89a7232]
> 
> I don't think that's really a good idea - this way we restart after
> every single page write, whereas currently we only restart after passing
> through all pages once. In nearly all cases we'll only ever have to
> retry once in total, be because such old pages aren't usually going to
> be reread/redirtied.

Your improvement sounds fine, then.  Would both SimpleLruTruncate() and
SlruDeleteSegment() benefit from it?

> > @@ -9216,10 +9212,8 @@ xlog_redo(XLogReaderState *record)
> >          LWLockRelease(OidGenLock);
> >          MultiXactSetNextMXact(checkPoint.nextMulti,
> >                                checkPoint.nextMultiOffset);
> > -
> > -        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > -                               checkPoint.oldestMultiDB);
> >          SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
> > +        SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
> > 
> >          /*
> >           * If we see a shutdown checkpoint while waiting for an end-of-backup
> > @@ -9309,17 +9303,13 @@ xlog_redo(XLogReaderState *record)
> >          LWLockRelease(OidGenLock);
> >          MultiXactAdvanceNextMXact(checkPoint.nextMulti,
> >                                    checkPoint.nextMultiOffset);
> > -
> > -        /*
> > -         * NB: This may perform multixact truncation when replaying WAL
> > -         * generated by an older primary.
> > -         */
> > -        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > -                               checkPoint.oldestMultiDB);
> >          if (TransactionIdPrecedes(ShmemVariableCache->oldestXid,
> >                                    checkPoint.oldestXid))
> >              SetTransactionIdLimit(checkPoint.oldestXid,
> >                                    checkPoint.oldestXidDB);
> > +        MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > +                               checkPoint.oldestMultiDB);
> > +
> >          /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
> >          ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
> >          ControlFile->checkPointCopy.nextXid =
> >          checkPoint.nextXid;
> 
> Why?

master does not and will not have legacy truncation, so the deleted comment
does not belong in master.  Regarding the SetMultiXactIdLimit() call:

commit 611a2ec
Author:     Noah Misch <noah@leadboat.com>
AuthorDate: Sat Nov 7 15:06:28 2015 -0500
Commit:     Noah Misch <noah@leadboat.com>
CommitDate: Sat Nov 7 15:06:28 2015 -0500
   In xlog_redo(), believe a SHUTDOWN checkPoint.oldestMulti exactly.      It was so before this branch.  This restores
consistencywith the   handling of nextXid, nextMulti and oldestMulti: we treat them as exact   for
XLOG_CHECKPOINT_SHUTDOWNand as minima for XLOG_CHECKPOINT_ONLINE.   I do not know of a case where this definitely
mattersfor any of these   counters.  It might matter if a bug causes oldestXid to move forward   wrongly, causing it to
thenmove backward later.  (I don't know if   VACUUM does ever move oldestXid backward, but it's a plausible thing to
doif on-disk state fully agrees with an older value.)  That example has   no counterpart for oldestMultiXactId, because
anyupdate first arrives   in an XLOG_MULTIXACT_TRUNCATE_ID record.  Therefore, this commit is   probably cosmetic.
 

> > -    /*
> > -     * Truncate CLOG, multixact and CommitTs to the oldest computed value.
> > -     */
> >      TruncateCLOG(frozenXID);
> >      TruncateCommitTs(frozenXID);
> >      TruncateMultiXact(minMulti, minmulti_datoid);
> 
> Why? Sure, it's not a super important comment, but ...?

Yeah, it scarcely matters either way.  Commit 4f627f8 reduced this comment to
merely restating the code, so I removed it instead.

> > @@ -2204,8 +2202,8 @@ GetOldestSafeDecodingTransactionId(void)
> >   * the result is somewhat indeterminate, but we don't really care.  Even in
> >   * a multiprocessor with delayed writes to shared memory, it should be certain
> >   * that setting of delayChkpt will propagate to shared memory when the backend
> > - * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if
> > - * it's already inserted its commit record.  Whether it takes a little while
> > + * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if it's
> > + * already inserted its critical xlog record.  Whether it takes a little while
> >   * for clearing of delayChkpt to propagate is unimportant for correctness.
> >   */
> 
> Seems unrelated, given that this is already used in
> MarkBufferDirtyHint(). Don't get me wrong, I think the changes are a
> good idea, but it's not really tied to the truncation changes.

Quite so; its branch (one branch = one proposed PostgreSQL commit),
mxt2-cosmetic, contains no truncation changes.  Likewise for the other
independent comment improvements you noted.

nm
pgsql-hackers by date:
From: amul sul
Date: 03 December 2015, 08:53:21
Subject: Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()
From: Kyotaro HORIGUCHI
Date: 03 December 2015, 10:06:05
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Re: Rework the way multixact truncations work - Mailing list pgsql-hackers

Previous

Next