Re: Hard limit on WAL space used (because PANIC sucks) - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Hard limit on WAL space used (because PANIC sucks) |
Date | |
Msg-id | CA+U5nMLONnUt+UEUdz6t7cnUoexjJTh12BjCkz7iGdO+jyP=ag@mail.gmail.com |
In response to | Hard limit on WAL space used (because PANIC sucks) (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: Hard limit on WAL space used (because PANIC sucks) |
List | pgsql-hackers |
On 6 June 2013 16:00, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> In the "Redesigning checkpoint_segments" thread, many people opined that
> there should be a hard limit on the amount of disk space used for WAL:
> http://www.postgresql.org/message-id/CA+TgmoaOkgZb5YsmQeMg8ZVqWMtR=6S4-PPd+6jiy4OQ78ihUA@mail.gmail.com.
> I'm starting a new thread on that, because that's mostly orthogonal to
> redesigning checkpoint_segments.
>
> The current situation is that if you run out of disk space while writing
> WAL, you get a PANIC, and the server shuts down. That's awful. We can try to
> avoid that by checkpointing early enough, so that we can remove old WAL
> segments to make room for new ones before you run out, but unless we somehow
> throttle or stop new WAL insertions, it's always going to be possible to use
> up all disk space. A typical scenario where that happens is when
> archive_command fails for some reason; even a checkpoint can't remove old,
> unarchived segments in that case. But it can happen even without WAL
> archiving.

I don't see that we need to prevent WAL insertions when the disk fills. We
still have the whole of wal_buffers to use up. When that is full, we
will prevent further WAL insertions because we will be holding the
WALWriteLock to clear more space. So the rest of the system will lock
up nicely, like we want, apart from read-only transactions.

Instead of PANICing, we should simply signal the checkpointer to
perform a shutdown checkpoint. That normally requires a WAL insertion
to complete, but it seems easy enough to make that happen by simply
rewriting the control file, after which ALL WAL files are superfluous
for crash recovery and can be deleted. Once that checkpoint is
complete, we can begin deleting WAL files that are archived/replicated
and continue as normal. The previously failing WAL write can now be
made again and may succeed this time. If it does, we continue; if
not, now we PANIC.
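To make the sequence above concrete, here is a minimal toy sketch in Python. It is not PostgreSQL code: `ToyWalDir`, `write_with_one_retry`, `DiskFull` and `Panic` are hypothetical names invented for illustration, and "disk space" is modelled simply as a maximum number of segments. It only shows the proposed ordering: a failed write triggers a checkpoint recorded in the control file, archived segments are removed, the write is retried once, and only a second failure escalates.

```python
class DiskFull(Exception):
    """Raised by the toy WAL directory when a new segment does not fit."""


class Panic(Exception):
    """Stands in for a PANIC in this toy model."""


class ToyWalDir:
    """Hypothetical model of a WAL partition with room for a fixed number of segments."""

    def __init__(self, capacity_segments):
        self.capacity = capacity_segments
        self.segments = []                    # each entry: {"id": int, "archived": bool}
        self.next_id = 0
        self.control_file_checkpoint = None   # position last recorded in the control file

    def write_segment(self):
        if len(self.segments) >= self.capacity:
            raise DiskFull()
        self.segments.append({"id": self.next_id, "archived": False})
        self.next_id += 1

    def checkpoint_via_control_file(self):
        # Assumption from the proposal: the checkpoint is recorded by rewriting
        # the control file, so it needs no WAL insertion and no WAL disk space.
        self.control_file_checkpoint = self.next_id

    def remove_archived_segments(self):
        # Only segments already archived/replicated are safe to delete.
        before = len(self.segments)
        self.segments = [s for s in self.segments if not s["archived"]]
        return before - len(self.segments)


def write_with_one_retry(wal):
    """Try the WAL write; on failure, checkpoint, free space, retry once, then PANIC."""
    try:
        wal.write_segment()
        return "ok"
    except DiskFull:
        pass
    wal.checkpoint_via_control_file()
    wal.remove_archived_segments()
    try:
        wal.write_segment()
        return "ok after checkpoint"
    except DiskFull:
        raise Panic("WAL write failed again after checkpoint")
```

In this model a write that fails while at least one segment is archived succeeds on the retry, whereas a write that fails with nothing archived (for example, while archive_command keeps failing) still ends in the PANIC case.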
Note that this would not require in-progress transactions to be
aborted. They can continue normally once wal_buffers re-opens.

We don't really want anything too drastic, because if this situation
happens once it may happen many times - I'm imagining a flaky network
etc. So we want the situation to recover quickly and easily, without
too many consequences.

The above appears to be a very minimal change from existing code and
doesn't introduce lots of new points of breakage.

> I've seen a case, where it was even worse than a PANIC and shutdown. pg_xlog
> was on a separate partition that had nothing else on it. The partition
> filled up, and the system shut down with a PANIC. Because there was no space
> left, it could not even write the checkpoint after recovery, and thus
> refused to start up again. There was nothing else on the partition that you
> could delete to make space. The only recourse would've been to add more disk
> space to the partition (impossible), or manually delete an old WAL file that
> was not needed to recover from the latest checkpoint (scary). Fortunately
> this was a test system, so we just deleted everything.

Doing shutdown checkpoints via the control file would exactly solve
that issue. We already depend upon the readability of the control file
anyway, so this changes nothing. (And if you think it does, then we
can have multiple control files, or at least a backup control file at
shutdown).

We can make the shutdown checkpoint always happen at the end of a WAL
segment, so at shutdown we don't need any WAL files to remain at all.

> So we need to somehow stop new WAL insertions from happening, before it's
> too late.

I don't think we do.

What might be sensible is to have checkpoints speed up as WAL volume
approaches a predefined limit, so that we minimise the delay caused
when wal_buffers locks up.

Not suggesting anything here for 9.4, since we're mid-CF.
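As a rough illustration of the "checkpoints speed up as WAL volume approaches a limit" idea, here is a small Python sketch. Everything in it is an assumption made for illustration: the function name `checkpoint_speedup`, the `start_fraction` and `max_speedup` parameters, and the linear ramp are hypothetical and are not existing PostgreSQL settings or behaviour. It only shows one possible shape for such a pacing rule.

```python
def checkpoint_speedup(wal_bytes, limit_bytes, start_fraction=0.5, max_speedup=10.0):
    """Return a multiplier for checkpoint write pace (1.0 means normal pace).

    Hypothetical rule: run at normal pace until WAL usage reaches
    start_fraction of the limit, then ramp linearly up to max_speedup
    as usage reaches the limit itself.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if not 0.0 <= start_fraction < 1.0:
        raise ValueError("start_fraction must be in [0, 1)")
    if max_speedup < 1.0:
        raise ValueError("max_speedup must be at least 1.0")

    # Fraction of the limit currently in use, clamped to [0, 1].
    used = min(max(wal_bytes / limit_bytes, 0.0), 1.0)
    if used <= start_fraction:
        return 1.0

    # Linear ramp from 1.0 at start_fraction to max_speedup at the limit.
    progress = (used - start_fraction) / (1.0 - start_fraction)
    return 1.0 + progress * (max_speedup - 1.0)
```

With the default parameters, usage below half of the limit leaves the pace unchanged, and the multiplier then grows steadily so that the checkpoint finishes sooner as the limit gets closer. A steeper, non-linear ramp would be an equally plausible choice.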
--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services