Re: Incorrect logic in XLogNeedsFlush() - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Incorrect logic in XLogNeedsFlush() |
Date | |
Msg-id | CAFiTN-sw1aBxUjKk74m0LG_GGHNat8ZJSyYRMpco48G8hQwMkQ@mail.gmail.com Whole thread Raw |
In response to | Re: Incorrect logic in XLogNeedsFlush() (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: Incorrect logic in XLogNeedsFlush()
|
List | pgsql-hackers |
On Mon, Sep 15, 2025 at 8:28 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > On Sun, Sep 14, 2025 at 4:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > OTOH, when creating a RestartPoint during standby replay or doing > > archive recovery, access to both LocalMinRecoveryPoint and > > ControlFile->minRecoveryPoint is correct. By this stage, it has > > completed the minimum recovery required to achieve a consistent state, > > and ControlFile->minRecoveryPoint would have been properly > > initialized. > > My confusion was two things: how the startup process is getting a > valid minimum recovery point during crash recovery on a standby and > how bgwriter and checkpointer avoid reading the control file during > crash recovery on the standby. > > IIUC, the minimum recovery point on the primary will always be > InvalidXLogRecPtr. Right > On the standby, during "normal" recovery, all backends (regular > backends running read-only queries, checkpointer, etc) update the > minimum recovery point so it is ready in case of a crash. Right > During recovery from a crash on the standby, no regular backends will > be running read-only queries because those are forbidden until crash > recovery finishes. Crash recovery on the standby is finished when WAL > has been replayed through the minimum recovery point -- or, at least I > thought that was the point of the minimum recovery point. > > But, the startup process needs to replay WAL through the minimum > recovery point. So, where does it get this value from? The control > file? In a recovery scenario, the ControlFile->minRecoveryPoint on a standby server is continuously updated. This ensures that even in the event of a crash, a valid recovery point is available. However, if a crashed standalone server is started as a standby, the ControlFile->minRecoveryPoint is initially unset, and this will be set in ReadRecord()->SwitchIntoArchiveRecovery() once all the WAL from pg_wal directory are applied, this happens right before start streaming from primary. > And if the startup process gets the minimum recovery point from the > ControlFile, then how are bgwriter and checkpointer protected from > reading it since they should only see InvalidXLogRecPtr during crash > recovery? > > Looking at this logic in XLogFlush() -> UpdateMinRecoveryPoint() > > if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery) > { > updateMinRecoveryPoint = false; > return; > } > > Only the startup process will pass this test. Then other processes > will continue to read the control file. This only works if > ControlFile->minRecoveryPoint is InvalidXLogRecPtr during crash > recovery. Until the startup process hasn't completed the crash recovery i.g. not applied the WAL unto 'ControlFile->minRecoveryPoint' logically 'bgwriter' can not write any buffer whose WAL location is beyond the ControlFile->minRecoveryPoint because we haven't yet applied those WALs and also there should not any RestartPoint()/Checkpoint record before reaching the ControlFile->minRecoveryPoint otherwise we would have started crash recovery from that point. During the crash recovery process, before the WAL has been fully applied up to the ControlFile->minRecoveryPoint, the bgwriter cannot write any buffers that have a page LSN beyond this min recovery point. This is because those WALs which are beyond min recovery point have not yet been applied, so any buffer can not have page_lsn beyond this point. Therefore, I agree that when UpdateMinRecoveryPoint() is invoked by the bgwriter during standby crash recovery, it will indeed retrieve a valid value from ControlFile->minRecoveryPoint. However, IMHO, this cannot be called with LSN greater than ControlFile->minRecoveryPoint itself. Consequently, the conditional check else if (force || LocalMinRecoveryPoint < lsn) will not evaluate to true, and no action will be taken. And as far as the checkpointer is concerned, on standby systems, we only execute RestartPoint when a Checkpoint WAL is detected. And logically, no Checkpoint WAL should exist between the recovery start point and minRecoveryPoint, as crash recovery initiates from the most recent checkpoint redo. -- Regards, Dilip Kumar Google
pgsql-hackers by date: