Re: corruption of WAL page header is never reported - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Re: corruption of WAL page header is never reported |
| Date | |
| Msg-id | 20210719.151441.1342311546952131179.horikyota.ntt@gmail.com |
| In response to | corruption of WAL page header is never reported (Yugo NAGATA <nagata@sraoss.co.jp>) |
| Responses | Re: corruption of WAL page header is never reported |
| List | pgsql-hackers |
Hello.
At Sun, 18 Jul 2021 04:55:05 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in
> Hello,
>
> I found that corruption of a WAL page header detected during recovery is never
> reported in log messages. If a WAL page header is broken, it is detected in
> XLogReaderValidatePageHeader called from XLogPageRead, but the error messages
> are always reset and never reported.
Good catch! Currently, recovery stops without showing any reason when it
is stopped by a page-header error.
> I attached a patch to fix it in this way.
However, it is a kind of roof-over-a-roof. What we should do is
just omit the check in XLogPageRead while not in standby mode.
> Or, if we don't want to report an error for each check, and what we want
> to check here is just old recycled WAL rather than header corruption itself,
> I wonder whether we could check just xlp_pageaddr instead of calling
> XLogReaderValidatePageHeader.
I'm not sure. But as described in the commit message, the commit
intended to save a common case, and there's no obvious reason to (or
not to) restrict the check to the page address only. So it uses the
established checking function.
I was tempted to adjust the comment just above by adding "while in
standby mode", but "so that we can retry immediately" already suggests
that, so I didn't do that in the attached patch.
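To make the effect of the one-line change concrete, here is a minimal standalone sketch. It is not PostgreSQL code: ToyReader, TOY_MAGIC, and all toy_* functions are hypothetical stand-ins, and the model assumes that the reader validates the page header again after the page-read callback succeeds (the "doing it twice" in the existing comment). It only illustrates why an unconditional check-and-reset leaves the final message empty, while gating the check on standby mode lets the reader's own error survive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAGIC 0xABCDu		/* arbitrary value, not a real WAL magic */

/* Hypothetical stand-in for the reader state; only the error buffer matters. */
typedef struct ToyReader
{
	char		errormsg_buf[128];
} ToyReader;

/* Toy validator: records a message and fails when the magic is wrong. */
static bool
toy_validate_page_header(ToyReader *reader, unsigned magic)
{
	assert(reader != NULL);
	if (magic != TOY_MAGIC)
	{
		snprintf(reader->errormsg_buf, sizeof(reader->errormsg_buf),
				 "invalid magic number %04X", magic);
		return false;
	}
	return true;
}

/* Models the current behavior: always validate here, always wipe the message. */
static bool
toy_page_read_unpatched(ToyReader *reader, unsigned magic, bool standby_mode)
{
	(void) standby_mode;
	if (!toy_validate_page_header(reader, magic))
	{
		reader->errormsg_buf[0] = '\0';
		return false;
	}
	return true;
}

/* Models the patch: do the extra check (and the reset) only in standby mode. */
static bool
toy_page_read_patched(ToyReader *reader, unsigned magic, bool standby_mode)
{
	if (standby_mode && !toy_validate_page_header(reader, magic))
	{
		reader->errormsg_buf[0] = '\0';
		return false;
	}
	return true;
}

typedef bool (*ToyPageRead) (ToyReader *, unsigned, bool);

/* Toy caller: returns the message that would reach the log ("" if none). */
static const char *
toy_read_record(ToyReader *reader, ToyPageRead page_read,
				unsigned magic, bool standby_mode)
{
	reader->errormsg_buf[0] = '\0';
	if (page_read(reader, magic, standby_mode))
		(void) toy_validate_page_header(reader, magic);	/* reader's own check */
	return reader->errormsg_buf;
}
```

In this toy model, calling toy_read_record() with toy_page_read_unpatched and a bad magic outside standby mode yields an empty message, whereas toy_page_read_patched yields the validator's message. In standby mode both variants still clear the message, which matches the intent of retrying silently.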
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From 30033d810bcc784da55600792484603e1c46b3d7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 19 Jul 2021 14:49:34 +0900
Subject: [PATCH v1] Don't forget message of page-header errors while not in
standby mode
The commit 0668719801 intended to omit page-header errors only while
in standby mode, but actually they are always forgotten. As a result,
the message at the end of a crash recovery lacks the reason for the
stop. Fix that by doing the additional check only while in standby
mode.
---
src/backend/access/transam/xlog.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ee9515139..79513fb8b5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12317,7 +12317,8 @@ retry:
 	 * Validating the page header is cheap enough that doing it twice
 	 * shouldn't be a big deal from a performance point of view.
 	 */
-	if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+	if (StandbyMode &&
+		!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
 	{
 		/* reset any error XLogReaderValidatePageHeader() might have set */
 		xlogreader->errormsg_buf[0] = '\0';
--
2.27.0