Re: Streaming replication and a disk full in primary - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Streaming replication and a disk full in primary |
Date | |
Msg-id | 4BC2F8F6.8070506@enterprisedb.com |
In response to | Re: Streaming replication and a disk full in primary (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: Streaming replication and a disk full in primary
Re: Streaming replication and a disk full in primary |
List | pgsql-hackers |
Fujii Masao wrote:
> doc/src/sgml/config.sgml
> -         archival or to recover from a checkpoint. If standby_keep_segments
> +         archival or to recover from a checkpoint. If <varname>standby_keep_segments</>
>
> The word "standby_keep_segments" always needs the <varname> tag, I think.

Thanks, fixed.

> We should remove the document "25.2.5.2. Monitoring"?

I updated it to no longer claim that the primary can run out of disk
space because of a hung WAL sender. The information about calculating
the lag between primary and standby still seems valuable, so I didn't
remove the whole section.

> Why is standby_keep_segments used even if max_wal_senders is zero?
> In that case, ISTM we don't need to keep any WAL files in pg_xlog
> for the standby.

True. I don't think we should second-guess the admin on that, though.
Perhaps he only set max_wal_senders=0 temporarily, and will be
disappointed if the logs are no longer there when he sets it back to
non-zero and restarts the server.

> When XLogRead() reads two WAL files and only the older of them is recycled
> during being read, it might fail in checking whether the read data is valid.
> This is because the variable "recptr" can advance to the newer WAL file
> before the check.

Thanks, fixed.

> When walreceiver has gotten stuck for some reason, walsender would be
> unable to pass through the send() system call, and also get stuck.
> In the patch, such a walsender cannot exit forever because it cannot
> call XLogRead(). So I think that the bgwriter needs to send the
> exit-signal to such a too lagged walsender. Thought?

Any backend can get stuck like that.

> The shmem of latest recycled WAL file is updated before checking whether
> it's already been archived. If archiving is not working for some reason,
> the WAL file which that shmem indicates might not actually have been
> recycled yet. In this case, the standby cannot obtain the WAL file from
> the primary because it's been marked as "latest recycled", and from the
> archive because it's not been archived yet. This seems to be a big problem.
> How about moving the update of the shmem to after calling XLogArchiveCheckDone()
> in RemoveOldXlogFiles()?

Good point. It's particularly important considering that if a segment
hasn't been archived yet, it's not available to the standby from the
archive either. I changed that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
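
For illustration only, the reordering discussed in the last point can be sketched as a toy Python model. This is not the patch and not PostgreSQL code: WalState, archive_check_done(), remove_old_segments() and standby_can_fetch() are hypothetical names, segments are plain integers, and the only thing the sketch tries to show is that the shared "latest recycled" marker is advanced after the archive check succeeds, never before.

```python
from dataclasses import dataclass, field


@dataclass
class WalState:
    """Toy stand-in for the primary's WAL bookkeeping (all names hypothetical)."""
    archived: set = field(default_factory=set)  # segments the archiver has finished
    on_disk: set = field(default_factory=set)   # segments still present in pg_xlog
    latest_recycled: int = -1                   # shared marker read by walsenders


def archive_check_done(state, seg):
    # Stand-in for XLogArchiveCheckDone(): true once the segment is archived.
    return seg in state.archived


def remove_old_segments(state, cutoff):
    """Recycle segments older than cutoff.

    The marker is updated only after the archive check has passed and the
    segment has really been recycled.
    """
    for seg in sorted(state.on_disk):  # sorted() copies, so mutation below is safe
        if seg >= cutoff:
            break
        if not archive_check_done(state, seg):
            continue  # not archived yet: keep the file, leave the marker alone
        state.on_disk.discard(seg)
        state.latest_recycled = max(state.latest_recycled, seg)


def standby_can_fetch(state, seg):
    # A standby can get a segment from the primary's disk or from the archive.
    return seg in state.on_disk or seg in state.archived
```

With segments 1 to 3 on disk and only segment 1 archived, calling remove_old_segments(state, 3) recycles segment 1 and leaves segment 2 in place, so every segment is still reachable from either the primary or the archive.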