Should we remove "not fast" promotion at all? - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Should we remove "not fast" promotion at all? |
Date | |
Msg-id | CAHGQGwGYkF+CvpOMdxaO=+aNAzc1Oo9O4LqWo50MxpvFj+0VOw@mail.gmail.com |
Responses |
Re: Should we remove "not fast" promotion at all?
Re: Should we remove "not fast" promotion at all? |
List | pgsql-hackers |
Hi all,

We discussed the $SUBJECT in the following threads:
http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

Our consensus seems to be to remove "not fast" promotion entirely, because
there is no use case for that promotion. The attached patch removes
"not fast" promotion. Barring any objections, I will commit this patch.

Regards,

On Sat, Aug 3, 2013 at 4:31 PM, Tomonari Katsumata
<t.katsumata1122@gmail.com> wrote:
> Hi,
>
> I made a patch for REL9_3_STABLE which gets rid of
> old promote processing. Please check it.
> This patch makes PostgreSQL do fast promoting(*) always.
> (*) which means skipping the long checkpoint before increasing
> the timeline.
>
> And after this, I'll make another patch for unlinking files which are
> created by the user as a trigger_file or by the "pg_ctl promote" command.
>
> ---------------
> Tomonari Katsumata
>
> 2013/7/30 Fujii Masao <masao.fujii@gmail.com>
>>
>> On Sat, Jul 27, 2013 at 6:57 PM, Tomonari Katsumata
>> <t.katsumata1122@gmail.com> wrote:
>> > Hi,
>> >
>> >
>> >>>> Yes, it prevents PROMOTE_SIGNAL_FILE from remaining even if
>> >>>> both promote files exist.
>> >>>>
>> >>> The command ("unlink(PROMOTE_SIGNAL_FILE)") here is for an
>> >>> unusual case.
>> >>> The case arises only when both procedures below are done.
>> >>> - user creates "promote" file in PGDATA
>> >>> - user issues "pg_ctl promote"
>> >>>
>> >>> I understand the reason.
>> >>> But I think it's better to unlink(PROMOTE_SIGNAL_FILE) before
>> >>> unlink(FAST_PROMOTE_SIGNAL_FILE).
>> >>> Because FAST_PROMOTE_SIGNAL_FILE is definitely there but
>> >>> PROMOTE_SIGNAL_FILE is sometimes there and sometimes not.
>> >>
>> >> I could not understand why that's better. Could you elaborate on that?
>> >>
>> > I'm sorry for the insufficient explanation.
>> >
>> > I thought that errno would be set to ENOENT and
>> > this might lead to something wrong.
>> > I checked this and I know it's not a problem.
>> >
>> > Sorry for confusing you.
>> >
>> >
>> >
>> >>> And I have another question linked to this behavior.
>> >>> I think TriggerFile should be removed too.
>> >>> This is a corner case but it will happen.
>> >>> What do you think of it?
>> >>
>> >> I don't have a strong opinion about that. I've never heard a complaint
>> >> about the current behavior so far.
>> >>
>> > For example, please imagine a cascading replication environment and
>> > using the old master as a standby without copying the timeline history
>> > file to the new standby.
>> >
>> > -------
>> > 1. replicating 3 servers (A,B,C)
>> > A->B->C
>> > ("trigger_file = /tmp/trig" is set in recovery.conf on B and C.)
>> >
>> > 2. stop server A and promote server B with "touch /tmp/trig;pg_ctl
>> > promote"
>>
>> Why do you need to both create the trigger file and run pg_ctl promote?
>>
>> Anyway, if the patch is useful as a fail-safe and it doesn't break the
>> current behavior, I'd be happy to apply it. You are suggesting that we
>> should remove the trigger file in CheckForStandbyTrigger() even if
>> pg_ctl promote is executed. But there can be some cases where we get out
>> of the WAL replay loop without going through it, for example, when we
>> reach the recovery_target_xxx. So ISTM we should try to remove both the
>> trigger file and the "promote" file at the end of recovery instead.
>> Thoughts?
>>
>> > B->C
>> > (/tmp/trig file remains on server B)
>> >
>> > 4. stop server B and promote server C with "pg_ctl promote"
>> > C
>> >
>> > 5. make server B connect as a standby of server C
>> > C->B
>> > ---------
>> >
>> > In step 5 server B will promote as soon as it starts,
>> > because "/tmp/trig" is still there.
>> >
>> >
>> >
>> >>>> One question is: do we really still need to support normal promote?
>> >>>> pg_ctl promote provides only a way to do fast promotion. If we want to
>> >>>> do normal promotion, we need to create PROMOTE_SIGNAL_FILE
>> >>>> and send the SIGUSR1 signal to postmaster by hand. This seems messy.
>> >>>>
>> >>>> I think that we should remove normal promotion entirely, or change
>> >>>> pg_ctl promote so that it also provides a way to do normal promotion.
>> >>>>
>> >>> I think the merit of "fast promote" is
>> >>> - allowing quick connection by skipping the checkpoint
>> >>> and its demerit is
>> >>> - taking a little bit longer in crash recovery
>> >>>
>> >>> If a crash soon after promoting seldom happens
>> >>> and "fast promote" never breaks consistency of the database cluster,
>> >>> I think we don't need normal promotion.
>> >>
>> >> You can execute a checkpoint after fast promotion for that.
>> >>
>> > OK.
>> > Then I think we should do the things below.
>> > - removing normal promotion entirely from the source
>> > - adding the know-how you suggest to the documentation
>>
>> IMO either is necessary.
>>
>> Regards,
>>
>> --
>> Fujii Masao
>
>

--
Fujii Masao
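
For readers following the unlink()/ENOENT point and the stale "/tmp/trig" scenario quoted above, here is a minimal, illustrative sketch in C. It is not the attached patch and not PostgreSQL source code. The helper names trigger_present, remove_if_exists and cleanup_promotion_triggers are hypothetical, and the file paths are whatever the caller passes in. The sketch only assumes POSIX stat() and unlink(): a trigger is "seen" when the file exists, and removing a file that is already absent (errno == ENOENT) is treated as success, so the order in which the files are unlinked does not matter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true if a trigger file currently exists at path. */
static bool
trigger_present(const char *path)
{
    struct stat st;

    assert(path != NULL);
    return stat(path, &st) == 0;
}

/*
 * Hypothetical helper: remove path if it exists.
 * Returns 0 if the file was removed or was already absent (ENOENT),
 * and -1 on any other error.
 */
static int
remove_if_exists(const char *path)
{
    assert(path != NULL);

    if (unlink(path) == 0)
        return 0;

    if (errno == ENOENT)
        return 0;               /* nothing to remove; not an error here */

    fprintf(stderr, "could not remove \"%s\": %s\n", path, strerror(errno));
    return -1;
}

/*
 * Hypothetical helper: try to remove every trigger file in paths[],
 * regardless of which one actually caused the promotion.
 * Returns the number of files that could not be removed.
 */
static int
cleanup_promotion_triggers(const char *paths[], int npaths)
{
    int failures = 0;

    for (int i = 0; i < npaths; i++)
    {
        if (remove_if_exists(paths[i]) != 0)
            failures++;
    }
    return failures;
}
```

Under these assumptions, calling cleanup_promotion_triggers() once at the end of recovery with every possible trigger path (for example the "promote" file and the configured trigger_file) would leave nothing behind for a later restart to pick up, which is the situation described in step 5 of the quoted scenario.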
Attachment