Re: time-delayed standbys - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: time-delayed standbys |
Date | |
Msg-id | BANLkTinfrgsVK_8o9+mw6kWRcC4BxiR4jw@mail.gmail.com |
In response to | Re: time-delayed standbys (Jaime Casanova <jaime@2ndquadrant.com>) |
Responses | Re: time-delayed standbys; Re: time-delayed standbys |
List | pgsql-hackers |
On Sat, Apr 23, 2011 at 9:46 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> On Tue, Apr 19, 2011 at 9:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> That is, a standby configured such that replay lags a prescribed
>> amount of time behind the master.
>>
>> This seemed easy to implement, so I did. Patch (for 9.2, obviously) attached.
>>
>
> This crashes when stopping recovery at a target (I tried with a named
> restore point and with a point in time) after executing
> pg_xlog_replay_resume(). Here is the backtrace. I will try to check
> later but I wanted to report it before...
>
> #0 0xb7777537 in raise () from /lib/libc.so.6
> #1 0xb777a922 in abort () from /lib/libc.so.6
> #2 0x08393a19 in errfinish (dummy=0) at elog.c:513
> #3 0x083944ba in elog_finish (elevel=22, fmt=0x83d5221 "wal receiver still active") at elog.c:1156
> #4 0x080f04cb in StartupXLOG () at xlog.c:6691
> #5 0x080f2825 in StartupProcessMain () at xlog.c:10050
> #6 0x0811468f in AuxiliaryProcessMain (argc=2, argv=0xbfa326a8) at bootstrap.c:417
> #7 0x0827c2ea in StartChildProcess (type=StartupProcess) at postmaster.c:4488
> #8 0x08280b85 in PostmasterMain (argc=3, argv=0xa4c17e8) at postmaster.c:1106
> #9 0x0821730f in main (argc=3, argv=0xa4c17e8) at main.c:199

Sorry for the slow response on this - I was on vacation for a week and
my schedule got a big hole in it.

I was able to reproduce something very like this in unpatched master,
just by letting recovery pause at a named restore point, and then
resuming it.

LOG: recovery stopping at restore point "stop", time 2011-05-07 09:28:01.652958-04
LOG: recovery has paused
HINT: Execute pg_xlog_replay_resume() to continue.
(at this point I did pg_xlog_replay_resume())
LOG: redo done at 0/5000020
PANIC: wal receiver still active
LOG: startup process (PID 38762) was terminated by signal 6: Abort trap
LOG: terminating any other active server processes

I'm thinking that this code is wrong:

    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
    {
        SetRecoveryPause(true);
        recoveryPausesHere();
    }
    reachedStopPoint = true; /* see below */
    recoveryContinue = false;

I think that the recoveryContinue = false assignment should not happen if
we decide to pause. That is, we should say:

    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
    {
        /* same as now */
    }
    else
        recoveryContinue = false;

I haven't tested that, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
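The control-flow change suggested in the message can be modeled in isolation. What follows is a minimal sketch, not PostgreSQL source: `RecoveryModel`, `StandbyStateModel`, `stop_point_current()`, and `stop_point_proposed()` are hypothetical names invented for illustration, and the `paused` flag merely stands in for the `SetRecoveryPause(true); recoveryPausesHere();` calls. The message does not say whether `reachedStopPoint = true` should also move, so the sketch assumes it stays unconditional. Like the proposal itself, this is untested against the real server.

```c
#include <stdbool.h>

/* Simplified stand-ins for the recovery state named in the message.
 * These are NOT the PostgreSQL definitions; they only model control flow. */
typedef enum { STANDBY_DISABLED_MODEL, STANDBY_SNAPSHOT_READY } StandbyStateModel;

typedef struct {
    bool recoveryPauseAtTarget;
    StandbyStateModel standbyState;
    bool reachedStopPoint;
    bool recoveryContinue;
    bool paused; /* hypothetical: records that the pause branch was taken */
} RecoveryModel;

/* Behaviour as quoted in the message: recoveryContinue is cleared
 * even when recovery has just paused and been resumed. */
void stop_point_current(RecoveryModel *r)
{
    if (r->recoveryPauseAtTarget && r->standbyState == STANDBY_SNAPSHOT_READY)
    {
        r->paused = true; /* stands in for SetRecoveryPause + recoveryPausesHere */
    }
    r->reachedStopPoint = true;
    r->recoveryContinue = false;
}

/* Proposed behaviour: only stop replay when we did not pause. */
void stop_point_proposed(RecoveryModel *r)
{
    if (r->recoveryPauseAtTarget && r->standbyState == STANDBY_SNAPSHOT_READY)
    {
        r->paused = true; /* same as now */
    }
    else
        r->recoveryContinue = false;

    /* Assumption: left unconditional; the message does not address this. */
    r->reachedStopPoint = true;
}
```

In this model, a standby that pauses at the target leaves `recoveryContinue` set under the proposed version, so replay would carry on after the resume instead of falling through to end-of-recovery while the WAL receiver is still running.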