Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly - Mailing list pgsql-hackers
| From | vignesh C |
| --- | --- |
| Subject | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
| Date | |
| Msg-id | CALDaNm1u=QV1a7w48BWgkg6GnK20eE_y+raL43D46_R9Wnt8wg@mail.gmail.com |
| In response to | Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly ("Vitaly Davydov" <v.davydov@postgrespro.ru>) |
| List | pgsql-hackers |
On Sun, 29 Jun 2025 at 11:52, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> Thanks everyone who are working on the bug. IIUC the remained task is
> to add code comments for avoiding the same mistake again described here:
>
> > Sounds reasonable. As per analysis till now, it seems removal of new
> > assert is correct and we just need to figure out the reason in all
> > failure cases as to why the physical slot's restart_lsn goes backward,
> > and then add a comment somewhere to ensure that we don't repeat a
> > similar mistake in the future.
>
> I've wrote a draft for that. How do you think?

I analyzed a scenario involving physical replication where the restart_lsn appears to go backward by a few segment files.

In my case, the current restart_lsn is 0/A721E68, which corresponds to a WAL record in the file 00000001000000000000000A. However, the recovery process is progressing slowly and is currently at 0/3058078. Now, when primary_conninfo is changed, the walsender/walreceiver processes are restarted. As a result, the standby requests WAL streaming from 0/3058078 (the last applied WAL LSN), which is in the file 000000010000000000000003. Since this WAL segment (000000010000000000000003) has already been removed because the restart_lsn is far ahead, the primary reports an error saying the WAL segment has already been removed.

Even though the apply pointer is at 0/3058078 (the record currently being applied), the standby already has WAL segments up to 0/A721E68 written and available locally. So, despite the walsender process failing to start due to the missing segment, the recovery process on the standby continues applying WAL up to 0/A721E68 using the locally available files (including 00000001000000000000000A). Once all local WAL has been applied, the standby attempts to start streaming again, this time from the latest WAL record (in my case, 0/A74000). Since this record still exists on the primary, the walsender process starts successfully, and physical replication resumes normally.

This confirms that even in physical replication, a backward-moving restart_lsn does not cause issues, as the standby retains the necessary WAL locally and can catch up without requiring removed segments from the primary.

I was able to reproduce this scenario using the attached physical_replication_restart_lsn_backward patch, in which I changed the configuration so that checkpoints happen frequently and slowed down the recovery process so that WAL records are applied with delays. I was able to reproduce this on every run.
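To make the file-name arithmetic above easier to follow, here is a minimal standalone sketch (not code from the PostgreSQL tree, just the usual segment-naming convention, assuming the default 16 MB wal_segment_size and timeline 1) that maps an LSN to the WAL segment file containing it:

```c
/*
 * Sketch only: map an LSN to the WAL segment file name containing it.
 * Assumes the default 16 MB wal_segment_size; the file name is the 8-digit
 * timeline ID followed by the high and low parts of the segment number.
 */
#include <stdio.h>
#include <stdint.h>

#define WAL_SEGMENT_SIZE	(16 * 1024 * 1024)	/* default 16 MB */

static void
lsn_to_segment_name(uint32_t timeline, uint64_t lsn, char *buf, size_t len)
{
	/* Segments per 4 GB "xlogid": 0x100000000 / 16 MB = 256 */
	uint64_t	segs_per_xlogid = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;
	uint64_t	segno = lsn / WAL_SEGMENT_SIZE;

	snprintf(buf, len, "%08X%08X%08X",
			 timeline,
			 (uint32_t) (segno / segs_per_xlogid),
			 (uint32_t) (segno % segs_per_xlogid));
}

int
main(void)
{
	char		name[32];

	lsn_to_segment_name(1, 0x0A721E68, name, sizeof(name));	/* restart_lsn 0/A721E68 */
	printf("0/A721E68 -> %s\n", name);		/* 00000001000000000000000A */

	lsn_to_segment_name(1, 0x03058078, name, sizeof(name));	/* apply LSN 0/3058078 */
	printf("0/3058078 -> %s\n", name);		/* 000000010000000000000003 */

	return 0;
}
```

This shows why 0/3058078 falls in 000000010000000000000003 while the restart_lsn 0/A721E68 falls in 00000001000000000000000A.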
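And as a rough illustration of why the restarted walreceiver's request fails while the later one succeeds (the helper names and the check below are hypothetical simplifications, not the primary's actual code): the standby asks to stream from the start of the segment containing its current position, and the primary can only serve segments it still has on disk. With 16 MB segments this reproduces the behaviour seen in the log excerpts below:

```c
/*
 * Rough illustration only (hypothetical helpers, not PostgreSQL code): the
 * standby requests streaming from the start of the WAL segment containing
 * its current position, and the primary fails the request if that segment
 * is older than the oldest segment it still keeps.
 */
#include <stdio.h>
#include <stdint.h>

#define WAL_SEGMENT_SIZE	(16 * 1024 * 1024)	/* assumes default 16 MB */

/* Start of the segment containing lsn; used as the streaming start point */
static uint64_t
segment_start(uint64_t lsn)
{
	return lsn - (lsn % WAL_SEGMENT_SIZE);
}

/* Hypothetical check: does the primary still have the requested segment? */
static const char *
primary_reply(uint64_t start_lsn, uint64_t oldest_kept_segno)
{
	if (start_lsn / WAL_SEGMENT_SIZE < oldest_kept_segno)
		return "ERROR: requested WAL segment has already been removed";
	return "streaming starts";
}

int
main(void)
{
	uint64_t	oldest_kept = 0xA;	/* primary retains ...0A onwards */
	uint64_t	lsns[] = {0x03058078, 0x00A74000};	/* before and after local catch-up */

	for (int i = 0; i < 2; i++)
	{
		uint64_t	start = segment_start(lsns[i]);

		printf("START_REPLICATION at %X/%X -> %s\n",
			   (uint32_t) (start >> 32), (uint32_t) start,
			   primary_reply(start, oldest_kept));
	}
	return 0;
}
```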
The logs for the same are attached in the logs.zip file, and the test used to reproduce this is available in the test_restart_lsn_backward.zip file.

Important log information from the attached logs:

restart_lsn has moved to 0/A6E8000 on the primary node:
2025-07-02 22:34:44.563 IST walsender[330205] standby1 LOG: PhysicalConfirmReceivedLocation replication slot "sb1_slot" set restart_lsn to 0/A6E8000
2025-07-02 22:34:44.563 IST walsender[330205] standby1 STATEMENT: START_REPLICATION SLOT "sb1_slot" 0/3000000 TIMELINE 1

When the walsender is restarted, it requests an older WAL segment that has already been removed and throws an error:
2025-07-02 22:34:44.577 IST walsender[330428] standby1 STATEMENT: START_REPLICATION SLOT "sb1_slot" 0/3000000 TIMELINE 1
2025-07-02 22:34:44.577 IST walsender[330428] standby1 ERROR: requested WAL segment 000000010000000000000003 has already been removed

The standby node has written WAL up to 0/A6E8000:
2025-07-02 22:34:44.563 IST walreceiver[330204] LOG: XLogWalRcvFlush LogstreamResult.Flush initialized to 0/A6E8000
2025-07-02 22:34:44.563 IST walreceiver[330204] LOG: sending write 0/A6E8000 flush 0/A6E8000 apply 0/

The walreceiver process is terminated:
2025-07-02 22:34:44.564 IST walreceiver[330204] FATAL: terminating walreceiver process due to administrator command

The walreceiver requests WAL from the 000000010000000000000003 WAL file, which has been removed:
2025-07-02 22:34:44.577 IST walreceiver[330427] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2025-07-02 22:34:44.577 IST walreceiver[330427] LOG: WalReceiverMain LogstreamResult.Flush initialized to 0/3047958
2025-07-02 22:34:44.577 IST walreceiver[330427] LOG: sending write 0/3047958 flush 0/3047958 apply 0/3047958
2025-07-02 22:34:44.577 IST walreceiver[330427] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed

Even though the WAL file is removed, there is no issue; the recovery process applies the WAL from the locally available WAL files:
2025-07-02 22:34:44.578 IST startup[330195] LOG: Applied wal record end rec 0/3047990, read rec 0/3047958
2025-07-02 22:34:44.580 IST checkpointer[330193] LOG: recovery restart point at 0/3047938

Once all the locally available WAL has been applied, the walsender is started again and replication proceeds smoothly:
2025-07-02 22:35:15.002 IST walsender[330922] standby1 LOG: received replication command: START_REPLICATION SLOT "sb1_slot" 0/A000000 TIMELINE 1
2025-07-02 22:35:15.002 IST walsender[330922] standby1 STATEMENT: START_REPLICATION SLOT "sb1_slot" 0/A000000 TIMELINE 1

I'm ok with adding the comments.

Regards,
Vignesh