Home > mailing lists

Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From	Drouvot, Bertrand
Subject	Re: Add Information during standby recovery conflicts
Date	November 25, 2020 15:47:44
Msg-id	48a1c817-cbde-33c4-2672-13ef37e1d946@amazon.com Whole thread Raw
In response to	Re: Add Information during standby recovery conflicts (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: Add Information during standby recovery conflicts
List	pgsql-hackers

Tree view

Hi,

On 11/25/20 2:20 PM, Masahiko Sawada wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe.
 
>
>
>
> On Fri, Nov 20, 2020 at 6:18 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> Hi,
>>
>> On 11/17/20 4:44 PM, Fujii Masao wrote:
>>> Thanks for updating the patch! Here are review comments.
>>>
>>> +        Controls whether a log message is produced when the startup
>>> process
>>> +        is waiting longer than <varname>deadlock_timeout</varname>
>>> +        for recovery conflicts.
>>>
>>> But a log message can be produced also when the backend is waiting
>>> for recovery conflict. Right? If yes, this description needs to be
>>> corrected.
>> Thanks for looking at it!
>>
>> I don't think so, only the startup process should write those new log
>> messages.
>>
>> What makes you think that would not be the case?
>>
>>>
>>> +        for recovery conflicts.  This is useful in determining if
>>> recovery
>>> +        conflicts prevents the recovery from applying WAL.
>>>
>>> "prevents" should be "prevent"?
>> Indeed: fixed in the new attached patch.
>>
>>>
>>> +       TimestampDifference(waitStart, GetCurrentTimestamp(), &secs,
>>> &usecs);
>>> +       msecs = secs * 1000 + usecs / 1000;
>>>
>>> GetCurrentTimestamp() is basically called before LogRecoveryConflict()
>>> is called. So isn't it better to avoid calling GetCurrentTimestamp()
>>> newly in
>>> LogRecoveryConflict() and to reuse the timestamp that we got?
>>> It's helpful to avoid the waste of cycles.
>>>
>> good catch! fixed in the new attached patch.
>>
>>> +               while (VirtualTransactionIdIsValid(*vxids))
>>> +               {
>>> +                       PGPROC *proc =
>>> BackendIdGetProc(vxids->backendId);
>>>
>>> BackendIdGetProc() can return NULL if the backend is not active
>>> at that moment. This case should be handled.
>>>
>> handled in the new attached patch.
>>> +               case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
>>> +                       reasonDesc = gettext_noop("recovery is still
>>> waiting recovery conflict on buffer pin");
>>>
>>> It's natural to use "waiting for recovery" rather than "waiting
>>> recovery"?
>>>
>> I would be tempted to say so, the new patch makes use of "waiting for".
>>> +               /* Also, set deadlock timeout for logging purpose if
>>> necessary */
>>> +               if (log_recovery_conflict_waits)
>>> +               {
>>> +                       timeouts[cnt].id = STANDBY_TIMEOUT;
>>> +                       timeouts[cnt].type = TMPARAM_AFTER;
>>> +                       timeouts[cnt].delay_ms = DeadlockTimeout;
>>> +                       cnt++;
>>> +               }
>>>
>>> This needs to be executed only when the message has not been logged yet.
>>> Right?
>>>
>> good catch: fixed in the new attached patch.
>>
> Thank you for updating the patch! Here are review comments on the
> latest version patch.
>
> +       while (VirtualTransactionIdIsValid(*vxids))
> +       {
> +           PGPROC *proc = BackendIdGetProc(vxids->backendId);
> +
> +           if (proc)
> +           {
> +               if (nprocs == 0)
> +                   appendStringInfo(&buf, "%d", proc->pid);
> +               else
> +                   appendStringInfo(&buf, ", %d", proc->pid);
> +
> +               nprocs++;
> +               vxids++;
> +           }
> +       }
>
> We need to increment vxids even if *proc is null. Otherwise, the loop won't end.

My bad, that's fixed.

>
> ---
> +               TimestampTz cur_ts = GetCurrentTimestamp();;
Fixed
>
> There is an extra semi-colon.
>
> ---
>   int            max_standby_streaming_delay = 30 * 1000;
> +bool       log_recovery_conflict_waits = false;
> +bool       logged_lock_conflict = false;
>
>
> +       if (log_recovery_conflict_waits && !logged_lock_conflict)
> +       {
> +           timeouts[cnt].id = STANDBY_TIMEOUT;
> +           timeouts[cnt].type = TMPARAM_AFTER;
> +           timeouts[cnt].delay_ms = DeadlockTimeout;
> +           cnt++;
> +       }
>
> Can we pass a bool indicating if a timeout may be needed for recovery
> conflict logging from ProcSleep() to ResolveRecoveryConflictWithLock()
> instead of using a static variable?

Yeah that makes more sense, specially as we already have 
logged_recovery_conflict at our disposal.

New patch version attached.

Thanks!

Bertrand

Attachment

v9-0003-Log-the-standby-recovery-conflict-waits.patch

pgsql-hackers by date:

From: Anastasia Lubennikova
Date: 25 November 2020, 15:34:48
Subject: Re: PoC: custom signal handler for extensions

From: Stephen Frost
Date: 25 November 2020, 15:51:44
Subject: Re: A few new options for CHECKPOINT

Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

Attachment

Previous

Next