Re: VM corruption on standby - Mailing list pgsql-hackers

From Kirill Reshke
Subject Re: VM corruption on standby
Date
Msg-id CALdSSPhLQvTd+6=reYAiCPiPVAWEdyNHuLnWsRq8PXBGE97bLw@mail.gmail.com
In response to Re: VM corruption on standby  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: VM corruption on standby
List pgsql-hackers
Hi! Thank you for paying attention to this.

On Sun, 17 Aug 2025 at 19:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kirill Reshke <reshkekirill@gmail.com> writes:
> > [ v1-0001-Do-not-exit-on-postmaster-death-ever-inside-CRIT-.patch ]
>
> I do not like this patch one bit: it will replace one set of problems
> with another set, namely systems that fail to shut down.

I did not observe this during my manual testing. My impression is
that CRIT sections are something that backends (and other postgres
processes) try to get through quickly. So what this patch does is
defer the process's reaction to postmaster death until the end of
the CRIT section.
So the typical scenario here (as I understand it) is this:

(1) The process is doing its work and enters a CRIT section.
(2) Postmaster dies.
(3a) The postmaster-death signal (SIGPWR on my VM) is delivered to the process.
(3b) The process exits the CRIT section, then runs the CFI logic, observes
postmaster death, and quits.
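
To make the intended ordering concrete, here is a toy model of steps
(1)-(3b). It is only a sketch, not the patch itself: every name below
(crit_section_count, postmaster_died, should_exit_now, ...) is a
hypothetical stand-in that merely mimics the real backend machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; this is a toy model, not PostgreSQL code. */
static int  crit_section_count = 0;   /* nesting depth of CRIT sections */
static bool postmaster_died = false;  /* set when the death signal arrives */

/* (1) Enter a CRIT section; sections may nest. */
static void
start_crit_section(void)
{
    crit_section_count++;
}

/* Leave a CRIT section; leaving one that was never entered is a bug. */
static void
end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/* (3a) Signal delivery only records the fact; it does not exit. */
static void
on_postmaster_death(void)
{
    postmaster_died = true;
}

/*
 * (3b) CFI-style check: the process reacts to postmaster death only
 * once it is outside every CRIT section.
 */
static bool
should_exit_now(void)
{
    return postmaster_died && crit_section_count == 0;
}
```

In this model, calling on_postmaster_death() between
start_crit_section() and end_crit_section() leaves should_exit_now()
false until the section is closed, which is the deferral described
above.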

This is why I wrote the patch the way I did. I mean, it is always
possible for a race condition to cause late signal delivery anyway,
so why bother at all?

> I think the actual bug here is the use of proc_exit(1) after
> observing postmaster death.

Agreed.


> So I think the correct fix here is s/proc_exit(1)/_exit(2)/ in the
> places that are responding to postmaster death.  There might be
> more than just WaitEventSetWaitBlock; I didn't look.

Well, I agree that patching it this way is a much safer fix for the
issue. More conservative changes tend to be more future-proof and
less bug-prone.
I will take a detailed look and try to send a patch soon.
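
For anyone following along, the practical difference between the two
exit paths can be seen outside PostgreSQL with a small POSIX sketch.
This is an analogy only: plain exit() with an atexit() handler stands
in for proc_exit() and its exit callbacks, and run_child() and
cleanup_callback() are hypothetical names made up for the demo.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int cleanup_fd = -1;   /* write end of the pipe, used by the child */

/* Stand-in for an exit-time callback (hypothetical, not PostgreSQL code). */
static void
cleanup_callback(void)
{
    ssize_t rc = write(cleanup_fd, "C", 1);

    (void) rc;
}

/*
 * Fork a child that registers cleanup_callback and then terminates with
 * either exit(1) (exit handlers run) or _exit(2) (exit handlers skipped).
 * Returns the child's exit status and sets *cleanup_ran to 1 if the
 * callback wrote to the pipe, 0 otherwise.  Returns -1 on failure.
 */
static int
run_child(int immediate, int *cleanup_ran)
{
    int     fds[2];
    int     status;
    char    buf;
    pid_t   pid;

    assert(cleanup_ran != NULL);

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: register the callback, then pick an exit path. */
        close(fds[0]);
        cleanup_fd = fds[1];
        if (atexit(cleanup_callback) != 0)
            _exit(99);
        if (immediate)
            _exit(2);           /* no handlers, no cleanup */
        exit(1);                /* runs cleanup_callback first */
    }

    /* Parent: a byte on the pipe means the callback ran. */
    close(fds[1]);
    *cleanup_ran = (read(fds[0], &buf, 1) == 1);
    close(fds[0]);

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With immediate = 0 the child exits with status 1 after running the
callback; with immediate = 1 it exits with status 2 and the callback
never runs, so no cleanup code gets a chance to touch shared state.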

-- 
Best regards,
Kirill Reshke


