Re: [BUG] Archive recovery failure on 9.3+. - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: [BUG] Archive recovery failure on 9.3+. |
Date | |
Msg-id | 20140214.173857.65272356.horiguchi.kyotaro@lab.ntt.co.jp |
In response to | Re: [BUG] Archive recovery failure on 9.3+. (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: [BUG] Archive recovery failure on 9.3+. |
List | pgsql-hackers |
Hello,

Before taking up the topic..

At Thu, 13 Feb 2014 19:45:38 +0200, Heikki Linnakangas wrote
> On 02/13/2014 06:47 PM, Heikki Linnakangas wrote:
> > On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
> >> The behavior where we prefer a segment from archive with lower TLI over
> >> a file with higher TLI in pg_xlog actually changed in commit
> >> a068c391ab0. Arguably changing it wasn't a good idea, but the problem
> >> your test script demonstrates can be fixed by not archiving the partial
> >> segment, with no change to the preference of archive/pg_xlog. As
> >> discussed, archiving a partial segment seems like a bad idea anyway, so
> >> let's just stop doing that.

It surely makes things simple and I rather like the idea, but as long
as the final and possibly partial segment of the lower TLI is actually
created, and the recovery mechanism lets users request a recovery
operation that requires such segments (recovery_target_timeline does
this), a "perfect archive" - meaning an archive which can cover all
sorts of restore operations - may necessarily have such duplicate
segments, I believe.

Besides, I suppose that policy makes operations around archive/restore
much more difficult. DBAs would be stuck with the intensive work of
picking only the segments actually needed for the recovery at hand out
of the haystack. It sounds somewhat gloomy..

# However, I also doubt the appropriateness of stockpiling archive
# segments spanning so many timelines; two generations are
# enough to cause this issue.

Anyway, returning to the topic,

> > After some further thought, while not archiving the partial segment
> > fixes your test script, it's not enough to fix all variants of the
> > problem. Even if archive recovery doesn't archive the last, partial,
> > segment, if the original master server is still running, it's entirely
> > possible that it fills the segment and archives it.
> > In that case, archive recovery will again prefer the archived segment
> > with lower TLI over the segment with newer TLI in pg_xlog.

Yes, it is the generalized description of the case I've
mentioned. (Though I had not reached that thought :)

> > So I agree we should commit the patch you posted (or something to that
> > effect). The change to not archive the last segment still seems like a
> > good idea, but perhaps we should only do that in master.

My opinion on duplicate segments on older timelines is as
described above.

> To draw this to conclusion, barring any further insights to this, I'm
> going to commit the attached patch to master and REL9_3_STABLE. Please
> have a look at the patch, to see if I'm missing something. I modified
> the state machine to skip over XLOG_FROM_XLOG state, if reading in
> XLOG_FROM_ARCHIVE failed; otherwise you first scan the archive and
> pg_xlog together, and then pg_xlog alone, which is pointless.
>
> In master, I'm also going to remove the "archive last segment on old
> timeline" code.

Thank you for finishing the patch. I didn't think of the behavior
after an XLOG_FROM_ARCHIVE failure. It seems that with this change the
state machine goes around without the extra round.

The recovery process becomes able to grab the segment on the highest
(expected) TLI among those with the same segment id, regardless of
their locations. I think the recovery process will cope with the
"perfect" archives described above for all types of recovery
operation. The state machine loop that considers fallback from archive
to pg_xlog now seems somewhat more complicated than needed, but it
also does no harm.

Though, this part, which was in my original patch,

> readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
>              currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY : currentSource);

sticks far out past the line-wrapping boundary and seems somewhat
dirty :( And the conditional operator seems to make the meaning of
XLOG_FROM_ARCHIVE and _ANY a bit confusing.
But I failed to unify them either way, so it is left as is..

Finally, the patch you will find attached differs from your last
patch only in the styling mentioned above. This patch applies to
current HEAD and I confirmed that it fixes this issue, but I have not
checked the lastSourceFailed section. Simple file removal could not
lead there.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 508970a..85a0ce9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11006,17 +11006,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	/*-------
 	 * Standby mode is implemented by a state machine:
 	 *
-	 * 1. Read from archive (XLOG_FROM_ARCHIVE)
-	 * 2. Read from pg_xlog (XLOG_FROM_PG_XLOG)
-	 * 3. Check trigger file
-	 * 4. Read from primary server via walreceiver (XLOG_FROM_STREAM)
-	 * 5. Rescan timelines
-	 * 6. Sleep 5 seconds, and loop back to 1.
+	 * 1. Read from either archive or pg_xlog (XLOG_FROM_ARCHIVE), or just
+	 *    pg_xlog (XLOG_FROM_XLOG)
+	 * 2. Check trigger file
+	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+	 * 4. Rescan timelines
+	 * 5. Sleep 5 seconds, and loop back to 1.
 	 *
 	 * Failure to read from the current source advances the state machine to
-	 * the next state. In addition, successfully reading a file from pg_xlog
-	 * moves the state machine from state 2 back to state 1 (we always prefer
-	 * files in the archive over files in pg_xlog).
+	 * the next state.
 	 *
 	 * 'currentSource' indicates the current state. There are no currentSource
 	 * values for "check trigger", "rescan timelines", and "sleep" states,
@@ -11044,9 +11042,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 				switch (currentSource)
 				{
 					case XLOG_FROM_ARCHIVE:
-						currentSource = XLOG_FROM_PG_XLOG;
-						break;
-
 					case XLOG_FROM_PG_XLOG:
 
 						/*
@@ -11212,7 +11207,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 				 * Try to restore the file from archive, or read an existing
 				 * file from pg_xlog.
 				 */
-				readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource);
+				readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
+						currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
+										 currentSource);
 				if (readFile >= 0)
 					return true;	/* success! */
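
For readers following the thread, here is a rough standalone sketch of what the patch seems intended to achieve. It is not code from xlog.c. The names WalSource, SegmentCopy, effective_read_source, next_source_on_failure and pick_segment are all hypothetical, and the trigger-file check, timeline rescan and sleep steps are left out. It only assumes what the message and diff state: a read in the "archive" state considers both archive and pg_xlog, a failure there skips straight past the pg_xlog-only state, and among copies of the same segment the one on the highest TLI is taken wherever it lives.

```c
/* Hypothetical stand-ins for the XLOG_FROM_* values named in the patch. */
typedef enum
{
	SRC_ANY,		/* archive or pg_xlog, whichever holds the segment */
	SRC_ARCHIVE,
	SRC_PG_XLOG,
	SRC_STREAM
} WalSource;

/* One copy of a given segment: the timeline it is on and where it is stored. */
typedef struct
{
	unsigned	tli;
	WalSource	where;
} SegmentCopy;

/*
 * Source actually handed to the segment reader. Mirrors the conditional
 * expression in the last hunk: the "archive" state really means "anywhere".
 */
static WalSource
effective_read_source(WalSource current)
{
	return current == SRC_ARCHIVE ? SRC_ANY : current;
}

/*
 * State entered when reading from 'current' fails (simplified). After the
 * patch, a failure in the archive state no longer falls back to a separate
 * pg_xlog-only pass, because pg_xlog was already searched.
 */
static WalSource
next_source_on_failure(WalSource current)
{
	switch (current)
	{
		case SRC_ARCHIVE:
		case SRC_PG_XLOG:
			return SRC_STREAM;
		default:
			return SRC_ARCHIVE;	/* loop back to the first state */
	}
}

/*
 * Return the index of the copy on the highest TLI that 'source' allows,
 * or -1 if no copy qualifies. SRC_ANY allows every location.
 */
static int
pick_segment(const SegmentCopy *copies, int ncopies, WalSource source)
{
	int			best = -1;
	int			i;

	for (i = 0; i < ncopies; i++)
	{
		if (source != SRC_ANY && copies[i].where != source)
			continue;
		if (best < 0 || copies[i].tli > copies[best].tli)
			best = i;
	}
	return best;
}
```

With a TLI 1 copy in the archive and a TLI 2 copy in pg_xlog, pick_segment restricted to SRC_ARCHIVE returns the older copy, which is the failure reported in this thread, whereas passing effective_read_source(SRC_ARCHIVE) selects the TLI 2 copy.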