Re: Streaming replication and a disk full in primary - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Streaming replication and a disk full in primary
Date	April 12, 2010 07:42:15
Msg-id	4BC2F8F6.8070506@enterprisedb.com
In response to	Re: Streaming replication and a disk full in primary (Fujii Masao <masao.fujii@gmail.com>)
Responses	Re: Streaming replication and a disk full in primary Re: Streaming replication and a disk full in primary
List	pgsql-hackers

Fujii Masao wrote:
> doc/src/sgml/config.sgml
> -        archival or to recover from a checkpoint. If standby_keep_segments
> +        archival or to recover from a checkpoint. If <varname>standby_keep_segments</>
> 
> The word "standby_keep_segments" always needs the <varname> tag, I think.

Thanks, fixed.

> We should remove the document "25.2.5.2. Monitoring"?

I updated it to no longer claim that the primary can run out of disk
space because of a hung WAL sender. The information about calculating
the lag between primary and standby still seems valuable, so I didn't
remove the whole section.
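
As a rough illustration of the lag calculation mentioned above, here is a
minimal sketch in Python. It assumes WAL locations are given as text in
the "hex/hex" form, as returned by pg_current_xlog_location() on the
primary and pg_last_xlog_receive_location() on the standby. The helper
names are hypothetical, and treating the two halves as the high and low
32 bits of one 64-bit position is an assumption, so take the result as an
estimate when the two locations differ in the first half.

```python
def parse_wal_location(text):
    """Convert a WAL location such as '0/3000158' to one integer.

    Assumption: the two hexadecimal halves are the high and low
    32 bits of a single 64-bit position.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_location, standby_location):
    """Approximate number of WAL bytes the standby is behind the primary."""
    return parse_wal_location(primary_location) - parse_wal_location(standby_location)
```

For example, replication_lag_bytes("0/3000158", "0/3000000") gives 344
bytes under these assumptions.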

> Why is standby_keep_segments used even if max_wal_senders is zero?
> In that case, ISTM we don't need to keep any WAL files in pg_xlog
> for the standby.

True. I don't think we should second-guess the admin on that, though.
Perhaps he only set max_wal_senders=0 temporarily, and will be
disappointed if the logs are no longer there when he sets it back to
non-zero and restarts the server.

> When XLogRead() reads two WAL files and only the older of them is recycled
> during being read, it might fail in checking whether the read data is valid.
> This is because the variable "recptr" can advance to the newer WAL file
> before the check.

Thanks, fixed.
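
To make the failure mode concrete, here is a toy model in Python, not the
actual XLogRead() code. The function names and the last_removed_segment
parameter are hypothetical; the only fact taken from PostgreSQL is the
default 16 MB segment size. The point is that the validity check has to
look at the segment where the read started, not at the position after
the read pointer has advanced.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size, 16 MB


def segment_of(position):
    """Segment number that contains the given byte position."""
    return position // SEGMENT_SIZE


def read_is_valid(start, nbytes, last_removed_segment):
    """Check the oldest segment touched by the read, i.e. the one
    containing the start position."""
    return segment_of(start) > last_removed_segment


def read_is_valid_buggy(start, nbytes, last_removed_segment):
    """Check the position after the read, which may already be in the
    newer segment and so can miss a recycled older segment."""
    return segment_of(start + nbytes) > last_removed_segment
```

With a read of 200 bytes starting 100 bytes before the end of segment 1,
and segment 1 recycled in the meantime, the first check rejects the data
while the second one accepts it.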

> When walreceiver has gotten stuck for some reason, walsender would be
> unable to pass through the send() system call, and also get stuck.
> In the patch, such a walsender cannot exit forever because it cannot
> call XLogRead(). So I think that the bgwriter needs to send the
> exit-signal to such a too lagged walsender. Thought?

Any backend can get stuck like that.

> The shmem of latest recycled WAL file is updated before checking whether
> it's already been archived. If archiving is not working for some reason,
> the WAL file which that shmem indicates might not actually have been
> recycled yet. In this case, the standby cannot obtain the WAL file from
> the primary because it's been marked as "latest recycled", and from the
> archive because it's not been archived yet. This seems to be a big problem.
> How about moving the update of the shmem to after calling XLogArchiveCheckDone()
> in RemoveOldXlogFiles()?

Good point. It's particularly important considering that if a segment
hasn't been archived yet, it's not available to the standby from the
archive either. I changed that.
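
The ordering can be sketched as follows. This is a simplified Python
model, not the code in RemoveOldXlogFiles(): remove_old_segments,
is_archived and last_removed are hypothetical names, with is_archived
standing in for the XLogArchiveCheckDone() test. The "last removed"
marker is advanced only for segments that have passed the archive check.

```python
def remove_old_segments(candidates, is_archived, last_removed):
    """Recycle archived segments and return (recycled, new_last_removed).

    candidates   -- segment numbers old enough to be recycled
    is_archived  -- callable that returns True once a segment is archived
    last_removed -- highest segment number published as removed so far
    """
    recycled = []
    for segment in sorted(candidates):
        if not is_archived(segment):
            # Still needed by the archiver: keep the file and do not
            # publish it as removed.
            continue
        # Publish the marker only after the archive check has passed.
        last_removed = max(last_removed, segment)
        recycled.append(segment)
    return recycled, last_removed
```

If archiving is stuck, nothing is recycled and the marker does not move,
so a standby can still fetch those segments from the primary.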

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
