Re: [9.3 bug] disk space in pg_xlog increases during archive recovery - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: [9.3 bug] disk space in pg_xlog increases during archive recovery |
Date | |
Msg-id | CAHGQGwFiFOOrh-vdu6t3AL9YHCHz0eWeEAoU-tgQQ=AbgtWASw@mail.gmail.com |
In response to | Re: [9.3 bug] disk space in pg_xlog increases during archive recovery (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: [9.3 bug] disk space in pg_xlog increases during archive recovery |
List | pgsql-hackers |
On Sun, Feb 2, 2014 at 5:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-01-24 22:31:17 +0900, MauMau wrote:
>> From: "Fujii Masao" <masao.fujii@gmail.com>
>> >On Wed, Jan 22, 2014 at 6:37 AM, Heikki Linnakangas
>> >>>Thanks! The patch looks good to me. Attached is the updated version of
>> >>>the patch. I added the comments.
>>
>> Thank you very much. Your comment looks great. I tested some recovery
>> situations, and confirmed that WAL segments were kept/removed as intended.
>> I'll update the CommitFest entry with this patch.
>
> You haven't updated the patches status so far, so I've now marked as
> returned feedback as several people voiced serious doubts about the
> approach. If that's not accurate please speak up.
>
>> >The problem is, we might not be able to perform restartpoints more
>> >aggressively
>> >even if we reduce checkpoint_timeout in the server under the archive
>> >recovery.
>> >Because the frequency of occurrence of restartpoints depends on not only
>> >that
>> >checkpoint_timeout but also the checkpoints which happened while the
>> >server
>> >was running.
>>
>> I haven't tried reducing checkpoint_timeout.
>
> Did you try reducing checkpoint_segments? As I pointed out, at least if
> standby_mode is enabled, it will also trigger checkpoints, independently
> from checkpoint_timeout.

Right. If standby_mode is enabled, checkpoint_segments can trigger a restartpoint. But the problem is that the timing of restartpoints depends not only on the checkpoint parameters (i.e., checkpoint_timeout and checkpoint_segments) used during archive recovery but also on the checkpoint WAL records that were generated by the master.

For example, imagine the case where the master generated only one checkpoint WAL record since the last backup and then crashed with database corruption. The DBA then decides to perform normal archive recovery using the last backup. In this case, even if the DBA reduces both checkpoint_timeout and checkpoint_segments, only one restartpoint can occur during recovery. This low frequency of restartpoints might fill up the disk with lots of WAL files.

This would be harmless if the server on which we are performing recovery has enough disk space. But I can imagine that some users want to recover the database and restart the service temporarily on a low-end server with limited disk space until they can purchase an adequate one. In this case, accumulating lots of WAL files in pg_xlog might be harmful.

> If the issue is that you're not using standby_mode (if so, why?), then
> the fix maybe is to make that apply to a wider range of situations.

I guess that he is not using standby_mode because, according to his first email in this thread, he would like to prevent WAL from accumulating in pg_xlog during normal archive recovery (i.e., PITR).

Regards,

-- 
Fujii Masao
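
The argument about restartpoint frequency can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not PostgreSQL's actual logic: the function name `peak_retained_segments` is hypothetical, every checkpoint record found in the replayed WAL is assumed to trigger a restartpoint (the best case, as if checkpoint_timeout and checkpoint_segments were set as low as possible), and segments are assumed to be removable only at a restartpoint.

```python
def peak_retained_segments(total_segments, checkpoint_positions):
    """Toy model of WAL accumulation in pg_xlog during archive recovery.

    total_segments: number of WAL segments replayed since the backup.
    checkpoint_positions: segment numbers (1-based) that contain a
        checkpoint record written by the master.

    Assumption (not PostgreSQL's real algorithm): a restartpoint happens
    at every checkpoint record, and it lets recovery drop every segment
    except the one currently being replayed.
    """
    checkpoints = set(checkpoint_positions)
    retained = 0
    peak = 0
    for segment in range(1, total_segments + 1):
        retained += 1                 # segment restored from the archive
        peak = max(peak, retained)
        if segment in checkpoints:    # restartpoint is possible only here
            retained = 1              # keep only the current segment
    return peak


# One checkpoint record in the middle of 100 segments: about half of the
# WAL piles up no matter how the recovery-side parameters are tuned.
print(peak_retained_segments(100, [50]))               # 51

# A checkpoint record every 10 segments keeps the peak small.
print(peak_retained_segments(100, range(10, 100, 10)))  # 11
```

In this model the peak depends only on where the master wrote its checkpoint records, which is the reason lowering the recovery-side settings alone cannot bound the space used.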