Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Some problems of recovery conflict wait events |
Date | |
Msg-id | 54d5571a-e15f-6360-2762-fde7c5c126c8@oss.nttdata.com Whole thread Raw |
In response to | Re: Some problems of recovery conflict wait events (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses |
Re: Some problems of recovery conflict wait events
|
List | pgsql-hackers |
On 2020/03/09 14:10, Masahiko Sawada wrote: > On Mon, 9 Mar 2020 at 13:24, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/03/08 13:52, Masahiko Sawada wrote: >>> On Thu, 5 Mar 2020 at 20:16, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> >>>> >>>> On 2020/03/05 16:58, Masahiko Sawada wrote: >>>>> On Wed, 4 Mar 2020 at 15:21, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2020/03/04 14:31, Masahiko Sawada wrote: >>>>>>> On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2020/03/04 13:27, Michael Paquier wrote: >>>>>>>>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote: >>>>>>>>>> Yeah, so 0001 patch sets existing wait events to recovery conflict >>>>>>>>>> resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION) >>>>>>>>>> to the recovery conflict on a snapshot. 0003 patch improves these wait >>>>>>>>>> events by adding the new type of wait event such as >>>>>>>>>> WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch >>>>>>>>>> is the fix for existing versions and 0003 patch is an improvement for >>>>>>>>>> only PG13. Did you mean even 0001 patch doesn't fit for back-patching? >>>>>>>> >>>>>>>> Yes, it looks like a improvement rather than bug fix. >>>>>>>> >>>>>>> >>>>>>> Okay, understand. >>>>>>> >>>>>>>>> I got my eyes on this patch set. The full patch set is in my opinion >>>>>>>>> just a set of improvements, and not bug fixes, so I would refrain from >>>>>>>>> back-backpatching. >>>>>>>> >>>>>>>> I think that the issue (i.e., "waiting" is reported twice needlessly >>>>>>>> in PS display) that 0002 patch tries to fix is a bug. So it should be >>>>>>>> fixed even in the back branches. >>>>>>> >>>>>>> So we need only two patches: one fixes process title issue and another >>>>>>> improve wait event. I've attached updated patches. >>>>>> >>>>>> Thanks for updating the patches! I started reading 0001 patch. >>>>> >>>>> Thank you for reviewing this patch. >>>>> >>>>>> >>>>>> - /* >>>>>> - * Report via ps if we have been waiting for more than 500 msec >>>>>> - * (should that be configurable?) >>>>>> - */ >>>>>> - if (update_process_title && new_status == NULL && >>>>>> - TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(), >>>>>> - 500)) >>>>>> >>>>>> The patch changes ResolveRecoveryConflictWithSnapshot() and >>>>>> ResolveRecoveryConflictWithTablespace() so that they always add >>>>>> "waiting" into the PS display, whether wait is really necessary or not. >>>>>> But isn't it better to display "waiting" in PS basically when wait is >>>>>> necessary, like originally ResolveRecoveryConflictWithVirtualXIDs() >>>>>> does as the above? >>>>> >>>>> You're right. Will fix it. >>>>> >>>>>> >>>>>> ResolveRecoveryConflictWithDatabase(Oid dbid) >>>>>> { >>>>>> + char *new_status = NULL; >>>>>> + >>>>>> + /* Report via ps we are waiting */ >>>>>> + new_status = set_process_title_waiting(); >>>>>> >>>>>> In ResolveRecoveryConflictWithDatabase(), there seems no need to >>>>>> display "waiting" in PS because no wait occurs when recovery conflict >>>>>> with database happens. >>>>> >>>>> Isn't the startup process waiting for other backend to terminate? >>>> >>>> Yeah, you're right. I agree that "waiting" should be reported in this case. >>>> >>>> Currently ResolveRecoveryConflictWithLock() and >>>> ResolveRecoveryConflictWithBufferPin() don't call >>>> ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting" >>>> in PS display. You changed them so that they report "waiting". I agree >>>> to have this change. But this change is an improvement rather than >>>> a bug fix, i.e., we should apply this change only in v13? >>>> >>> >>> Did you mean ResolveRecoveryConflictWithDatabase and >>> ResolveRecoveryConflictWithBufferPin? >> >> Yes! Sorry for my typo. >> >>> In the current code as far as I >>> researched there are two cases where we don't add "waiting" and one >>> case where we doubly add "waiting". >>> >>> ResolveRecoveryConflictWithDatabase and >>> ResolveRecoveryConflictWithBufferPin don't update the ps title. >>> Although the path where GetCurrentTimestamp() >= ltime is false in >>> ResolveRecoveryConflictWithLock also doesn't update the ps title, it's >>> already updated in WaitOnLock. On the other hand, the path where >>> GetCurrentTimestamp() >= ltime is true in >>> ResolveRecoveryConflictWithLock updates the ps title but it's wrong as >>> I reported. >>> >>> I've split the patch into two patches: 0001 patch fixes the issue >>> about doubly updating ps title, 0002 patch makes the recovery conflict >>> resolution on database and buffer pin update the ps title. >> >> Thanks for splitting the patches. I think that 0001 patch can be back-patched. >> >> - /* >> - * Report via ps if we have been waiting for more than 500 msec >> - * (should that be configurable?) >> - */ >> - if (update_process_title && new_status == NULL && >> - TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(), >> - 500)) >> >> Originally, "waiting" is reported in PS if we've been waiting for more than >> 500 msec, as the above does. But you got rid of those codes in the patch. >> Did you confirm that it's safe to do that? If not, isn't it better to apply >> the attached patch? > > In WaitOnLock() we update the ps title regardless of waiting time. So > I thought we can change it to make these behavior consistent. But > considering back-patch, your patch looks better than mine. Yeah, so I pushed the 0001 patch at first! I will review the other patches later. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
pgsql-hackers by date: