Re: Incorrect logic in XLogNeedsFlush() - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Incorrect logic in XLogNeedsFlush()
Date
Msg-id CAFiTN-sw1aBxUjKk74m0LG_GGHNat8ZJSyYRMpco48G8hQwMkQ@mail.gmail.com
Whole thread Raw
In response to Re: Incorrect logic in XLogNeedsFlush()  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Incorrect logic in XLogNeedsFlush()
List pgsql-hackers
On Mon, Sep 15, 2025 at 8:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Sun, Sep 14, 2025 at 4:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > OTOH, when creating a RestartPoint during standby replay or doing
> > archive recovery, access to both LocalMinRecoveryPoint and
> > ControlFile->minRecoveryPoint is correct. By this stage, it has
> > completed the minimum recovery required to achieve a consistent state,
> > and ControlFile->minRecoveryPoint would have been properly
> > initialized.
>
> My confusion was two things: how the startup process is getting a
> valid minimum recovery point during crash recovery on a standby and
> how bgwriter and checkpointer avoid reading the control file during
> crash recovery on the standby.
>
> IIUC, the minimum recovery point on the primary will always be
> InvalidXLogRecPtr.

Right

> On the standby, during "normal" recovery, all backends (regular
> backends running read-only queries, checkpointer, etc) update the
> minimum recovery point so it is ready in case of a crash.

Right

> During recovery from a crash on the standby, no regular backends will
> be running read-only queries because those are forbidden until crash
> recovery finishes. Crash recovery on the standby is finished when WAL
> has been replayed through the minimum recovery point -- or, at least I
> thought that was the point of the minimum recovery point.
>
> But, the startup process needs to replay WAL through the minimum
> recovery point. So, where does it get this value from? The control
> file?

In a recovery scenario, the ControlFile->minRecoveryPoint on a standby
server is continuously updated. This ensures that even in the event of
a crash, a valid recovery point is available. However, if a crashed
standalone server is started as a standby, the
ControlFile->minRecoveryPoint is initially unset, and this will be set
in ReadRecord()->SwitchIntoArchiveRecovery() once all the WAL from
pg_wal directory are applied, this happens right before start
streaming from primary.

> And if the startup process gets the minimum recovery point from the
> ControlFile, then how are bgwriter and checkpointer protected from
> reading it since they should only see InvalidXLogRecPtr during crash
> recovery?
>
> Looking at this logic in XLogFlush() -> UpdateMinRecoveryPoint()
>
>     if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)
>     {
>         updateMinRecoveryPoint = false;
>         return;
>     }
>
> Only the startup process will pass this test. Then other processes
> will continue to read the control file. This only works if
> ControlFile->minRecoveryPoint is InvalidXLogRecPtr during crash
> recovery.

Until the startup process hasn't completed the crash recovery i.g. not
applied the WAL unto 'ControlFile->minRecoveryPoint' logically
'bgwriter' can not write any buffer whose WAL location is beyond the
ControlFile->minRecoveryPoint because we haven't yet applied those
WALs and also there should not any RestartPoint()/Checkpoint record
before reaching the ControlFile->minRecoveryPoint otherwise we would
have started crash recovery from that point.

During the crash recovery process, before the WAL has been fully
applied up to the ControlFile->minRecoveryPoint, the bgwriter cannot
write any buffers that have a page LSN beyond this min recovery point.
This is because those WALs which are beyond min recovery point have
not yet been applied, so any buffer can not have page_lsn beyond this
point.

Therefore, I agree that when UpdateMinRecoveryPoint() is invoked by
the bgwriter during standby crash recovery, it will indeed retrieve a
valid value from ControlFile->minRecoveryPoint. However, IMHO, this
cannot be called with LSN greater than ControlFile->minRecoveryPoint
itself. Consequently, the conditional check else if (force ||
LocalMinRecoveryPoint < lsn) will not evaluate to true, and no action
will be taken.

And as far as the checkpointer is concerned, on standby systems, we
only execute RestartPoint when a Checkpoint WAL is detected. And
logically, no Checkpoint WAL should exist between the recovery start
point and minRecoveryPoint, as crash recovery initiates from the most
recent checkpoint redo.

--
Regards,
Dilip Kumar
Google



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: pgbench: remove an unused function argument
Next
From: Mircea Cadariu
Date:
Subject: Re: [BUG] temporary file usage report with extended protocol and unnamed portals