Re: VM corruption on standby - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: VM corruption on standby |
Date | |
Msg-id | CAPpHfduZCEBdL-VSN5p4udpn=iws1xhfje0tK-S21q2eR-9VnA@mail.gmail.com Whole thread Raw |
In response to | Re: VM corruption on standby (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: VM corruption on standby
|
List | pgsql-hackers |
On Wed, Sep 3, 2025 at 11:28 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Wed, Sep 3, 2025 at 9:47 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > > > On 3 Sep 2025, at 11:37, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > > > > Could you, please, recheck? > > > > That patch also adds CondVar sleep in critical section. That patch is how we understood that such sleep is dangerous. > > > > Actual patch to deteact a problem is much simpler: > > ``` > > diff --git a/src/backend/storage/ipc/waiteventset.c > > b/src/backend/storage/ipc/waiteventset.c > > index 7c0e66900f9..e89e1d115cb 100644 > > --- a/src/backend/storage/ipc/waiteventset.c > > +++ b/src/backend/storage/ipc/waiteventset.c > > @@ -1044,6 +1044,7 @@ WaitEventSetWait(WaitEventSet *set, long timeout, > > long cur_timeout = -1; > > > > Assert(nevents > 0); > > + Assert(CritSectionCount == 0); > > > > /* > > * Initialize timeout if requested. We must record the current time so > > ``` > > > > Though it will fail in multixact test. > > I thought that patch allows to reproduce the problem of bc22dc0e0d. > Sorry for the noise. The attached is my attempt to rework the bc22dc0e0d. The problem discussed in this thread occurs because condition variable ends the process with proc_exit(1) in the case of postmaster death even in the critical section. However, the shared memory cleanup happening after proc_exit(1) being called in the critical section is leaving the shared memory in the inconsistent state. Generally I see two possible behaviors in this situation. 1) Exit without the shared memory cleanup (e.g. proc_exit(2) as proposed upthread). That would save us from data corruption caused by the inconsistent shared memory state. However, if our process has occupied shared resources (e.g. locks), they wouldn't be released. That might cause other processes hand waiting for those resources. 2) Continue to wait after postmaster death. While bc22dc0e0d was the first case when we waited on conditional variable in the critical section, there are a lot of cases where we're taking LWLock's in the critical section. Unlike condition variable, lwlock will continue to wait in the case of postmaster death. The critical section will continue its operation until completion, postmaster death will be handled afterwards. If we would apply the same logic to the arbitrary condition variable, it might however cause our process to stuck if we're waiting for something from another process, which would exit seeing postmaster death. I think the approach #2 is more appropriate for bc22dc0e0d, because in the critical section we only wait for other processes also in the critical section (so, there is no risk they will exit immediately after postmaster death making us stuck). I've implemented a patch, where waiting on conditional variable is replaced with LWLock-style waiting on semaphore. However, custom waiting code just for AdvanceXLInsertBuffer() doesn't look good. I believe we need some general solution. We might have a special kind of condition variable, a critical section condition variable, where both waiting and signaling must be invoked only in a critical section. However, I dig into our Latch and WaitEventSet, it seems there are too many assumptions about postmaster death. So, a critical section condition variable probably should be implemented on top of semaphore. Any thoughts? ------ Regards, Alexander Korotkov Supabase
Attachment
pgsql-hackers by date: