Re: [Bug fix]There is the case archive_timeout parameter is ignoredafter recovery works. - Mailing list pgsql-hackers
| From | Fujii Masao |
|---|---|
| Subject | Re: [Bug fix]There is the case archive_timeout parameter is ignoredafter recovery works. |
| Date | |
| Msg-id | 10a11134-c00e-26c7-bacd-d8c5bce99207@oss.nttdata.com Whole thread Raw |
| In response to | Re: [Bug fix]There is the case archive_timeout parameter isignored after recovery works. (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
| Responses |
RE: [Bug fix]There is the case archive_timeout parameter is ignoredafter recovery works.
|
| List | pgsql-hackers |
On 2020/06/29 16:41, Kyotaro Horiguchi wrote:
> Hello.
>
> At Mon, 29 Jun 2020 04:35:11 +0000, "higuchi.daisuke@fujitsu.com" <higuchi.daisuke@fujitsu.com> wrote in
>> Hi,
>>
>> I found the bug about archive_timeout parameter.
>> There is the case archive_timeout parameter is ignored after recovery works.
> ...
>> [Problem]
>> When the value of archive_timeout is smaller than that of checkpoint_timeout and recovery works, archive_timeout is
ignoredin the first WAL archiving.
>> Once WAL is archived, the archive_timeout seems to be valid after that.
> ...
>> Currently, cur_timeout is set according to only checkpoint_timeout when it is during recovery.
>> Even during recovery, the cur_timeout should be calculated including archive_timeout as well as checkpoint_timeout,
Ithink.
>> I attached the patch to solve this problem.
>
> Unfortunately the diff command in your test script doesn't show me
> anything, but I can understand what you are thinking is a problem,
> maybe. But the patch doesn't seem the fix for the issue.
>
> Archiving works irrelevantly from that parameter. Completed WAL
> segments are immediately marked as ".ready" and archiver does its task
> immediately independently from checkpointer. The parameter, as
> described in documentation, forces the server to switch to a new WAL
> segment file periodically so that it can be archived, that is, it
> works only on primary. On the other hand on standby, a WAL segment is
> not marked as ".ready" until any data for the *next* segment comes. So
> the patch is not the fix for the issue.
The problems that you're describing and Daisuke-san reported are really
the same? The reported problem seems that checkpointer can sleep on
the latch for more than archive_timeout just after recovery and cannot
switch WAL files promptly even if necessary.
The cause of this problem is that the checkpointer's sleep time is calculated
from both checkpoint_timeout and archive_timeout during normal running,
but calculated only from checkpoint_timeout during recovery. So Daisuke-san's
patch tries to change that so that it's calculated from both of them even
during recovery. No?
- if (XLogArchiveTimeout > 0 && !RecoveryInProgress())
+ if (XLogArchiveTimeout > 0)
{
elapsed_secs = now - last_xlog_switch_time;
- if (elapsed_secs >= XLogArchiveTimeout)
+ if (elapsed_secs >= XLogArchiveTimeout && !RecoveryInProgress())
continue; /* no sleep for us ... */
cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
last_xlog_switch_time is not updated during recovery. So "elapsed_secs" can be
large and cur_timeout can be negative. Isn't this problematic?
As another approach, what about waking the checkpointer up at the end of
recovery like we already do for walsenders?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
pgsql-hackers by date: