Re: Incorrect logic in XLogNeedsFlush() - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Incorrect logic in XLogNeedsFlush() |
Date | |
Msg-id | CAAKRu_bK+O56ZtoaZV0pjTK=VffMLyApFaKpX+mmkx8ichBaYg@mail.gmail.com Whole thread Raw |
In response to | Re: Incorrect logic in XLogNeedsFlush() (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Incorrect logic in XLogNeedsFlush()
Re: Incorrect logic in XLogNeedsFlush() |
List | pgsql-hackers |
On Wed, Sep 10, 2025 at 3:58 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Sep 10, 2025 at 12:48:21PM +0530, Dilip Kumar wrote: > >> So, looking at the code, it seems like most places we examine > >> ControlFile->minRecoveryPoint, we condition it on "InArchiveRecovery". > >> Is this something we want to do in XLogNeedsFlush() to avoid reading > >> from ControlFile->minRecoveryPoint()? > > How would that help when the checkpointer does XLogNeedsFlush() calls, > which is where you are reporting the disturbance is during the > end-of-recovery checkpoint? InArchiveRecovery is only true in the > startup process, set when we are doing archive recovery. This can be > switched from the beginning of recovery or later, once all the local > WAL has been replayed. Ah right, I made the same mistake with the InRecovery global variable initially too. This is my first time reading this code. I thought that there needed to be something guarding processes other than the startup process from reading the minimum recovery point from the ControlFile during crash recovery until it is valid. But perhaps not -- more on that below. > In the case of 015_promotion_pages, I can see that when the check you > are adding fails, the LSN is already flushed by XLogFLush(). So, > while I don't see any active bug here, I tend to agree with your > position that XLogNeedsFlush() could be improved in the context of the > end-of-recovery checkpoint or just when a process is allowed WAL > inserts while we are still in recovery. > > You have a good point here, especially knowing that for > CHECKPOINT_END_OF_RECOVERY case in CreateCheckPoint(), called by the > checkpointer, we allow WAL inserts with LocalSetXLogInsertAllowed(). > So, relying on RecoveryInProgress() in this context leads to a wrong > result with XLogNeedsFlush(): WAL inserts are allowed, the minimum > recovery point is not relevant for the evaluation. So, would you consider the defining characteristic of whether or not we should use the flush pointer instead of min recovery point in XLogNeedsFlush() to be whether or not WAL inserts are allowed? e.g. @@ -3115,7 +3115,7 @@ XLogNeedsFlush(XLogRecPtr record) * instead. So "needs flush" is taken to mean whether minRecoveryPoint * would need to be updated. */ - if (RecoveryInProgress()) + if (RecoveryInProgress() && !XLogInsertAllowed()) If I look at the difference between XLogFlush() and XLogNeedsFlush(), it checks if XLogInsertAllowed() and, if not, does not try to flush WAL -- which makes sense. There shouldn't be WAL to flush if we aren't allowed to insert any. In XLogFlush() -> UpdateMinRecoveryPoint() (called after checking if WAL inserts are allowed), it has similar logic to XLogNeedsFlush(): if (!updateMinRecoveryPoint || (!force && lsn <= LocalMinRecoveryPoint)) return; if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery) updateMinRecoveryPoint = false; The order of these two if statements is reversed in XLogNeedsFlush() So, what I don't understand is: why it would be okay for processes other than the startup process to read the minimum recovery point from the control file during crash recovery before it is valid. It seems in crash recovery, a backend other than the startup process will fail this test: if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery) and then potentially read from the control file LocalMinRecoveryPoint = ControlFile->minRecoveryPoint; LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI; There is a comment right after that says /* * Check minRecoveryPoint for any other process than the startup * process doing crash recovery, which should not update the control * file value if crash recovery is still running. */ Which seems to say that it is okay for other processes than the startup process to read the minimum recovery point from the Control File during crash recovery. But I don't quite understand why. > Or are you worried about the case of GetVictimBuffer() where a second > call of XLogNeedsFlush() lives? I want it to be fixed because of another patch I am writing to do batched writes with smgrwritev() [1]. In this case, I've split up the code in FlushBuffer() and flush the WAL to the required LSN in a different place in code than where I call smgrwritev(). So, I want to add an assert-only verification that checks that none of the buffer's need WAL flushed. This would be invoked by any process calling FlushBuffer(), so it would have to work for all types of backends. - Melanie [1] https://www.postgresql.org/message-id/flat/CAAKRu_Yurj0sabXZB5EVqO3fKgwN1vELe1H4e0s1zXV3MKbZjQ%40mail.gmail.com#c8caa5ff77d89e75b6f56f16619a77b7
pgsql-hackers by date: