Re: Is Recovery actually paused? - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Is Recovery actually paused? |
Date | |
Msg-id | CALj2ACWGDfba54ANjWUU=uA7w_HbCh0LOYh5z7-Tz00UxH4OyA@mail.gmail.com Whole thread Raw |
In response to | Re: Is Recovery actually paused? (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Is Recovery actually paused?
|
List | pgsql-hackers |
On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST) > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > > > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > > > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > > > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in > > > > > > > > > > I think the right fix should be that the state should never go from > > > > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take > > > > > > > > > > care of that. > > > > > > > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > > > > > > > > it can not handle the case that a user resume and pause again while a sleep. > > > > > > > > > > > > > > > > Right, we will have to check and set in the loop. But we should not > > > > > > > > allow the state to go from paused to pause requested irrespective of > > > > > > > > this. > > > > > > > > > > > > > > I agree with you. > > > > > > > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > > > > > > immediately change the state to PAUSE always we see REQUESTED in the > > > > > > waiting loop, despite that we allow change the state from PAUSE to > > > > > > REQUESTED via NOT_PAUSED between two successive loop condition checks? > > > > > > > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > > > > observe 'pause requested' during a sleep alghough the time window is short. > > > > > It seems a bit odd that pg_wal_replay_pause changes the state like this > > > > > because This state meeans that recovery may not be 'paused'. > > > > > > > > Yeah, this appears wrong that after 'paused' we go back to 'pause > > > > requested'. the logical state transition should always be as below > > > > > > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > > > > request and then paused but there is nothing wrong with going to > > > > paused) > > > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get paused) > > > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > > > > PAUSE_REQUESTED without going to NOT PAUSED) > > > > > > I didn't asked about the internal logical correctness, but asked about > > > *actual harm* revealed to users. I don't see any actual harm in the > > > "wrong" transition because: > > > > Actually, the incorrect state transition is not so harmful except that > > users can observe unnecessary state changes. However, I don't think any > > actual harm in prohibit the incorrect state transition. So, I think we > > can do it. > > > > > If we are going to introduce that complexity, I'd like to re-propose > > > to introduce interlocking between the recovery side and the > > > pause-requestor side instead of introducing the intermediate state, > > > which is the cause of the complexity. > > > > > > The attached PoC patch adds: > > > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a > > > info_lck since the check is done in the existing lock section. > > > > > > - Interlocking between the above and SetRecoveryPause without adding a > > > shared variable. > > > (This is what I called "synchronous" before.) > > > > I think waiting in pg_wal_replay_pasue is a possible option, but this will > > also introduce other complexity to codes such as possibility of waiting for > > long or for ever. For example, waiting in SetRecoveryPause as in your POC > > patch appears to make recovery stuck in RecoveryRequiresIntParameter. > > > > I agree with this, I think we previously discussed these approaches > where we can wait in pg_wal_replay_pasue() or > pg_is_wal_replay_pasued(). In fact, we had an older version where we > put the wait in pg_is_wal_replay_pasued(). But it appeared that doing > so will add extra complexity as well as instead of waiting in these > APIs the wait logic can be implemented in the application code which > is actually using these APIs and IMHO that will give better control to > the users. And also, having waiting logic in pg_wal_replay_pasue() or pg_is_wal_replay_pasued() required changes to the existing API such as a timeout to not allow them infinitely waiting. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: