As a start, are these failures only in the startup process? Has the startup process reached a consistent state when the problem happens because the replay code is too eager at removing the stats entries? Has it not reached a consistent state. These could be useful hints to extract a reproducible test case, looking for common patterns.
The pattern is the same, although I am not 100% sure that the same replica is having this - it is a cascaded streaming replica, i.e. a replica of another replica. Once we had this in October 2024, with version 15.4, then in August 2025 with 17.3, and now in September again (17.3). The database is working for month(s) perfectly fine in a heavy production workload (lots of WALs etc.), and then all of a sudden it shuts down.
Thanks for the feedback, and let me know if I could provide any additional info.