Thread: Streaming replication
Hello all,
I have a question that perhaps somebody can help me with. I'm using
streaming replication with 9.2 (I know it's quite old now, we plan to move
to 9.6 shortly but I'd like to get these older machines working if I can).
I have two machines where the secondary is configured as a backup of the
primary machine. I used the instructions here:
https://wiki.postgresql.org/wiki/Streaming_Replication to establish
streaming replication... BTW, I'm using the 7.1 path (using
backup/rsync/!backup) to prime the files on the secondary machine.
Most of the time it comes up, connects and is quite happy. For my own
sanity I have a bit of monitoring code that occasionally injects a change
on the primary and watches for it to arrive on the secondary to ensure data
is flowing. I also have some code that checks to see how close things are
using pg_current_xlog_location() and pg_last_xlog_receive_location(). When
things are working, these checks are always fine.
Once in a while, this nonce value does not seem to be updated on the
secondary server (although I can see it change on the primary). During this
period of time, my watcher code also detects
that pg_last_xlog_receive_location() is ahead of
pg_current_xlog_location(). From what I can tell, the secondary has
connected to the primary, it's just data does not seem to be flowing. I'll
attach a bit of detal below.
I'm sure I must be doing something wrong when I prime the secondary
database. It works most of the time but that's simply not good enough.
I'd buy a virtual coffee for anybody that can show me the error of my ways
here...
Any ideas?
A bit of output/background that may be useful:
Output from watch dog code output (running on the secondary). It queries
the local (secondary) cluster and also connects to the primary cluster to
insert a nonce and gather some information.
The nonce is just a counting int. When the secondary is started, it starts
counting at 0 and increases every few minutes. As you can see, the data in
the secondary is still at 18 (from a previous switch).
This is very odd because the primary cluster was rsync'ed shortly before
this.
last xlog receive 0/5000000
current xlog location: 0/41A3008
WARNING: Secondary appears ahead of primary:
WARNING!! nonce match comparison failed: 2 != 18
dog: nonce did not compare, but it's not been very long yet.
The sequence I'm using to prime the secondary:
0) Request that the secondary be shutdown and internally backed up.
1) Without backup being set (just to move the bulk of the files)
rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete
--exclude pg_log --exclude recovery.conf --exclude backup_label --exclude
pg_hba.conf --exclude postmaster.pid --exclude postmaster.opts --exclude
postgresql.conf /db/pg/general 77.66.27.224:/db/pg
2) Enter backup mode
SELECT pg_start_backup('sync-backup', true);
3) Rsync again, hopefully quicker than #1. I wanted to minimize the time
it spent in backup.
rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete
--exclude pg_log --exclude recovery.conf --exclude backup_label --exclude
pg_hba.conf --exclude postmaster.pid --exclude postmaster.opts --exclude
postgresql.conf /db/pg/general 77.66.27.224:/db/pg
4) Exit backup mode
SELECT pg_stop_backup();
5) Rsync pg_xlog files
rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete
/db/pg/general/pg_xlog 77.66.27.224:/db/pg/general
6) Rsync archive files to secondary
rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete
--exclude syncInProgress --exclude archiveScript /db/pg/general.archive
77.66.27.224:/db/pg
7) Tell the secondary server to start up
Output from postgres on the secondary server when it starts. I've tried
looking into the cannot stat issues without much success. Note that in
situations where it does work, it looks just like this as well (complete
with the zero length warning, etc).
LOG: database system was shut down in recovery at 2017-11-29 06:43:13 GMT
cp: cannot stat '{$0}{$8}/db/pg/general.archive/00000002.history'{$0}{$9}:
No such file or directory
LOG: entering standby mode
cp: cannot stat
'{$0}{$8}/pg/general.archive/000000010000000000000005'{$0}{$9}: No such
file or directory
LOG: consistent recovery state reached at 0/5000080
LOG: record with zero length at 0/5000080
LOG: database system is ready to accept read only connections
cp: cannot stat
'{$0}{$8}/db/pg/general.archive/000000010000000000000005'{$0}{$9}: No such
file or directory
cp: cannot stat '{$0}{$8}/db/pg/general.archive/00000002.history'{$0}{$9}:
No such file or directory
LOG: streaming replication successfully connected to primary
Any thoughts at all are appreciated!
Thanks,
-mike
Mike Howland wrote: > I have two machines where the secondary is configured as a backup of the primary machine. > I used the instructions here: https://wiki.postgresql.org/wiki/Streaming_Replication > to establish streaming replication... > BTW, I'm using the 7.1 path (using backup/rsync/!backup) to prime the files on the secondary machine. > > Most of the time it comes up, connects and is quite happy. > For my own sanity I have a bit of monitoring code that occasionally injects a change > on the primary and watches for it to arrive on the secondary to ensure data is flowing. > I also have some code that checks to see how close things are using pg_current_xlog_location() > and pg_last_xlog_receive_location(). When things are working, these checks are always fine. > > Once in a while, this nonce value does not seem to be updated on the secondary server > (although I can see it change on the primary). > During this period of time, my watcher code also detects that pg_last_xlog_receive_location() > is ahead of pg_current_xlog_location(). From what I can tell, the secondary has connected to the primary, > it's just data does not seem to be flowing. I'll attach a bit of detal below. The way you initialize the standby looks fine. One explanation for the delay you observe would be if "max_standby_streaming_delay" is set greater than zero and there are queries running on the standby that block application of the changes streamed from the primary. Yours, Laurenz Albe
Thanks for the feedback (appreciated!!).
I'll spend some time looking at the streaming delay. Generally all
connections to the secondary are restricted (with the exception of my nonce
query, which should be very short lived).
-mike
On Mon, Dec 4, 2017 at 2:16 AM, Laurenz Albe
wrote:
> Mike Howland wrote:
> > I have two machines where the secondary is configured as a backup of the
> primary machine.
> > I used the instructions here: https://wiki.postgresql.org/
> wiki/Streaming_Replication
> > to establish streaming replication...
> > BTW, I'm using the 7.1 path (using backup/rsync/!backup) to prime the
> files on the secondary machine.
> >
> > Most of the time it comes up, connects and is quite happy.
> > For my own sanity I have a bit of monitoring code that occasionally
> injects a change
> > on the primary and watches for it to arrive on the secondary to ensure
> data is flowing.
> > I also have some code that checks to see how close things are using
> pg_current_xlog_location()
> > and pg_last_xlog_receive_location(). When things are working, these
> checks are always fine.
> >
> > Once in a while, this nonce value does not seem to be updated on the
> secondary server
> > (although I can see it change on the primary).
> > During this period of time, my watcher code also detects that
> pg_last_xlog_receive_location()
> > is ahead of pg_current_xlog_location(). From what I can tell, the
> secondary has connected to the primary,
> > it's just data does not seem to be flowing. I'll attach a bit of detal
> below.
>
> The way you initialize the standby looks fine.
>
> One explanation for the delay you observe would be if
> "max_standby_streaming_delay"
> is set greater than zero and there are queries running on the standby that
> block
> application of the changes streamed from the primary.
>
> Yours,
> Laurenz Albe
>
Enviado a partir do meu smartphone BlackBerry 10. Mensagem original De: Laurenz Albe Enviado: Segunda-feira, 4 de Dezembro de 2017 09:52 Para: Mike Howland; pgsql-novice@lists.postgresql.org Assunto: Re: Streaming replication Mike Howland wrote: > I have two machines where the secondary is configured as a backup of the primary machine. > I used the instructions here: https://wiki.postgresql.org/wiki/Streaming_Replication > to establish streaming replication... > BTW, I'm using the 7.1 path (using backup/rsync/!backup) to prime the files on the secondary machine. > > Most of the time it comes up, connects and is quite happy. > For my own sanity I have a bit of monitoring code that occasionally injects a change > on the primary and watches for it to arrive on the secondary to ensure data is flowing. > I also have some code that checks to see how close things are using pg_current_xlog_location() > and pg_last_xlog_receive_location(). When things are working, these checks are always fine. > > Once in a while, this nonce value does not seem to be updated on the secondary server > (although I can see it change on the primary). > During this period of time, my watcher code also detects that pg_last_xlog_receive_location() > is ahead of pg_current_xlog_location(). From what I can tell, the secondary has connected to the primary, > it's just data does not seem to be flowing. I'll attach a bit of detal below. The way you initialize the standby looks fine. One explanation for the delay you observe would be if "max_standby_streaming_delay" is set greater than zero and there are queries running on the standby that block application of the changes streamed from the primary. Yours, Laurenz Albe