Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Some problems of recovery conflict wait events |
Date | |
Msg-id | CA+fd4k7Ky6i=mQQpyybdB_3gnmAFdxiHpRgQAHgdvauvbyXffQ@mail.gmail.com Whole thread Raw |
In response to | Re: Some problems of recovery conflict wait events (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Some problems of recovery conflict wait events
|
List | pgsql-hackers |
On Wed, 1 Apr 2020 at 22:32, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/03/30 20:10, Masahiko Sawada wrote: > > On Fri, 27 Mar 2020 at 17:54, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/03/04 14:31, Masahiko Sawada wrote: > >>> On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2020/03/04 13:27, Michael Paquier wrote: > >>>>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote: > >>>>>> Yeah, so 0001 patch sets existing wait events to recovery conflict > >>>>>> resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION) > >>>>>> to the recovery conflict on a snapshot. 0003 patch improves these wait > >>>>>> events by adding the new type of wait event such as > >>>>>> WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch > >>>>>> is the fix for existing versions and 0003 patch is an improvement for > >>>>>> only PG13. Did you mean even 0001 patch doesn't fit for back-patching? > >>>> > >>>> Yes, it looks like a improvement rather than bug fix. > >>>> > >>> > >>> Okay, understand. > >>> > >>>>> I got my eyes on this patch set. The full patch set is in my opinion > >>>>> just a set of improvements, and not bug fixes, so I would refrain from > >>>>> back-backpatching. > >>>> > >>>> I think that the issue (i.e., "waiting" is reported twice needlessly > >>>> in PS display) that 0002 patch tries to fix is a bug. So it should be > >>>> fixed even in the back branches. > >>> > >>> So we need only two patches: one fixes process title issue and another > >>> improve wait event. I've attached updated patches. > >> > >> I started reading v2-0002-Improve-wait-events-for-recovery-conflict-resolut.patch. > >> > >> - ProcWaitForSignal(PG_WAIT_BUFFER_PIN); > >> + ProcWaitForSignal(WAIT_EVENT_RECOVERY_CONFLICT_BUFFER_PIN); > >> > >> Currently the wait event indicating the wait for buffer pin has already > >> been reported. But the above change in the patch changes the name of > >> wait event for buffer pin only in the startup process. Is this really useful? > >> Isn't the existing wait event for buffer pin enough? > >> > >> - /* Wait to be signaled by the release of the Relation Lock */ > >> - ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type); > >> + /* Wait to be signaled by the release of the Relation Lock */ > >> + ProcWaitForSignal(WAIT_EVENT_RECOVERY_CONFLICT_LOCK); > >> > >> Same as above. Isn't the existing wait event enough? > > > > Yeah, we can use the existing wait events for buffer pin and lock. > > > >> > >> - /* > >> - * Progressively increase the sleep times, but not to more than 1s, since > >> - * pg_usleep isn't interruptible on some platforms. > >> - */ > >> - standbyWait_us *= 2; > >> - if (standbyWait_us > 1000000) > >> - standbyWait_us = 1000000; > >> + WaitLatch(MyLatch, > >> + WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT, > >> + STANDBY_WAIT_MS, > >> + wait_event_info); > >> + ResetLatch(MyLatch); > >> > >> ResetLatch() should be called before WaitLatch()? > > > > Fixed. > > > >> > >> Could you tell me why you dropped the "increase-sleep-times" logic? > > > > I thought we can remove it because WaitLatch is interruptible but my > > observation was not correct. The waiting startup process is not > > necessarily woken up by signal. I think it's still better to not wait > > more than 1 sec even if it's an interruptible wait. > > So we don't need to use WaitLatch() there, i.e., it's ok to keep using > pg_usleep()? > > > Attached patch fixes the above and introduces only two wait events of > > conflict resolution: snapshot and tablespace. > > Many thanks for updating the patch! > > - /* Wait to be signaled by the release of the Relation Lock */ > - ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type); > + /* Wait to be signaled by the release of the Relation Lock */ > + ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type); > + } > > Is this change really valid? What happens if the latch is set during > ResolveRecoveryConflictWithVirtualXIDs()? > ResolveRecoveryConflictWithVirtualXIDs() can return after the latch > is set but before WaitLatch() in WaitExceedsMaxStandbyDelay() is reached. Thank you for reviewing the patch! You're right. It's better to keep using pg_usleep() and set the wait event by pgstat_report_wait_start(). > > + default: > + event_name = "unknown wait event"; > + break; > > Seems this default case should be removed. Please see other > pgstat_get_wait_xxx() function, so there is no such code. > > > I also removed the wait > > event of conflict resolution of database since it's unlikely to become > > a user-visible and a long sleep as we discussed before. > > Is it worth defining new wait event type RecoveryConflict only for > those two events? ISTM that it's ok to use IPC event type here. > I dropped a new wait even type and added them to IPC wait event type. I've attached the new version patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: