Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs
From: Noah Misch
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Msg-id: 20230811025808.GA2261232@rfd.leadboat.com
In response to: Re: BUG #17928: Standby fails to decode WAL on termination of primary (Michael Paquier <michael@paquier.xyz>)
Responses: Re: BUG #17928: Standby fails to decode WAL on termination of primary
List: pgsql-bugs
On Thu, Aug 10, 2023 at 04:45:25PM +0900, Michael Paquier wrote:
> On Sun, Jul 16, 2023 at 05:49:05PM -0700, Noah Misch wrote:
> > On my system, testing unpatched REL_15_2, the test reached its first
> > failure at minute 53 and its second failure at minute 189.  Perhaps a
> > faster, deterministic test would be feasible:
>
> It took less than five minutes for me to reproduce the failure using
> the test case sent by Alexander on HEAD:
> 2023-08-10 15:26:37.401 JST [11239] FATAL:  invalid memory alloc request size 2021163525
> 2023-08-10 15:26:37.405 JST [11235] LOG:  startup process (PID 11239) exited with exit code 1
>
> I don't see it after two hours of tests with the patch.

That is good.

> > - Use pg_logical_emit_message to fill a few segments with 0xFF.
> > - CHECKPOINT the primary, so the standby recycles segments.
> > - One more pg_logical_emit_message, computing the length from
> >   pg_current_wal_insert_lsn() such that the new message crosses a
> >   segment boundary and ends 4 bytes before the end of a page.
> > - Stop the primary.
> > - If the bug is present, the standby will exit.
>
> Good idea to pollute the data with recycled segments.  Using a minimal
> WAL segment size would help here as well in keeping a test cheap, and
> two segments should be enough.  The alignment calculations and the
> header size can be known, but the standby records are an issue for the
> predictability of the test when it comes to adjusting the length of the
> logical message depending on the 8k WAL page, no?

Could be.  I expect there would be challenges translating that outline into a
real test, but I don't know if that will be a major one.  The test doesn't
need to be 100% deterministic.  If it fails 25% of the time and is not the
slowest test in the recovery suite, I'd find that good enough.  (A rough SQL
sketch of the outline's first steps follows below.)

> >>>> [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de
> >>>
> >>> Regarding [1], is it still worth zeroing recycled pages on standbys and/or
> >>> reading the whole header before allocating xl_tot_len?  (Are there benefits
> >>> other than avoiding a 1G backend allocation or 4G frontend allocation, or is
> >>> that benefit worth the cycles?)
> >>
> >> I believe reading the whole header is the most sensible approach as it
> >> can prevent unnecessary memory requests.  Another potential solution
> >> (or hack) for this specific case is to let XLogWalRcvFlush write a
> >> finalizing ((uint32) 0) when dying is true.  This stabilizes the
> >> behavior to "invalid record length.. got 0" when running the TAP test.
>
> FWIW, I came back to this thread while tweaking the error reporting of
> xlogreader.c for the sake of this thread, and this proposal is an
> improvement to be able to make a distinction between an OOM and an
> incorrect record:
> https://www.postgresql.org/message-id/ZMh/WV+CuknqePQQ@paquier.xyz
>
> Anyway, agreed that it's an improvement to remove this check from
> allocate_recordbuf().  Noah, are you planning to work more on that?

I can push xl_tot_len-validate-v1.patch, particularly given the testing
result you reported today.  I'm content for my part to stop there.
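For illustration only, a rough SQL sketch of the outline's first steps.  It
assumes a cluster initialized with the minimal 1MB wal_segment_size (initdb
--wal-segsize=1) and the default 8kB WAL page size; the filler sizes and
iteration count are arbitrary, and the sizing of the final message is only
gestured at, since the record/page header and MAXALIGN accounting is exactly
the tricky part noted above:

  -- 1. Fill a few segments with 0xFF payload (64kB per message, ~2.5MB total).
  SELECT pg_logical_emit_message(false, 'filler',
                                 decode(repeat('ff', 65536), 'hex'))
    FROM generate_series(1, 40);

  -- 2. CHECKPOINT on the primary, so the standby recycles segments at its
  --    next restartpoint.
  CHECKPOINT;

  -- 3. Starting point for sizing the last message: bytes remaining on the
  --    current 8kB WAL page.  A real test would further account for the
  --    record header, any page/segment headers crossed, and alignment
  --    padding, so the record lands in the next segment and ends 4 bytes
  --    before the end of a page.
  SELECT 8192 - ((pg_current_wal_insert_lsn() - '0/0'::pg_lsn) % 8192)
           AS bytes_to_page_boundary;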
> >> --- a/src/backend/replication/walreceiver.c
> >> +++ b/src/backend/replication/walreceiver.c
> >> @@ -989,6 +989,15 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
> >>  {
> >>      Assert(tli != 0);
> >>
> >> +    if (dying)
> >> +    {
> >> +        int         buflen = sizeof(uint32);
> >> +        char        buf[sizeof(uint32)];
> >> +
> >> +        memset(buf, 0, buflen);
> >> +        XLogWalRcvWrite(buf, buflen, LogstreamResult.Write, tli);
> >> +    }
> >> +
> >>      if (LogstreamResult.Flush < LogstreamResult.Write)
> >>      {
> >>          WalRcvData *walrcv = WalRcv;
> >
> > That's inexpensive in cycles, and it should make the failures in question
> > quite rare.  If we do that instead of pre-zeroing, I think we'd still want
> > to be ready for invalid lengths, for the case of unclean standby shutdown.
>
> Tweaking the WAL receiver to do more zeroing has been discussed in the
> past, but as far as I recall we were not completely sure whether it's
> completely free, either.  It seems to me that this should have a
> discussion on its own, in a new thread.  Not sure if we should worry
> about archiving, actually, even if the segments should be padded with
> zeros beforehand.

Okay.

> +    gotheader = targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord ? true : false;

Okay.