Thread: Streaming replication

Streaming replication

From

Mike Howland

Date:

03 December 2017, 05:29:50

Hello all, I have a question that perhaps somebody can help me with. I'm using streaming replication with 9.2 (I know it's quite old now, we plan to move to 9.6 shortly but I'd like to get these older machines working if I can). I have two machines where the secondary is configured as a backup of the primary machine. I used the instructions here: https://wiki.postgresql.org/wiki/Streaming_Replication to establish streaming replication... BTW, I'm using the 7.1 path (using backup/rsync/!backup) to prime the files on the secondary machine. Most of the time it comes up, connects and is quite happy. For my own sanity I have a bit of monitoring code that occasionally injects a change on the primary and watches for it to arrive on the secondary to ensure data is flowing. I also have some code that checks to see how close things are using pg_current_xlog_location() and pg_last_xlog_receive_location(). When things are working, these checks are always fine. Once in a while, this nonce value does not seem to be updated on the secondary server (although I can see it change on the primary). During this period of time, my watcher code also detects that pg_last_xlog_receive_location() is ahead of pg_current_xlog_location(). From what I can tell, the secondary has connected to the primary, it's just data does not seem to be flowing. I'll attach a bit of detal below. I'm sure I must be doing something wrong when I prime the secondary database. It works most of the time but that's simply not good enough. I'd buy a virtual coffee for anybody that can show me the error of my ways here... Any ideas? A bit of output/background that may be useful: Output from watch dog code output (running on the secondary). It queries the local (secondary) cluster and also connects to the primary cluster to insert a nonce and gather some information. The nonce is just a counting int. When the secondary is started, it starts counting at 0 and increases every few minutes. As you can see, the data in the secondary is still at 18 (from a previous switch). This is very odd because the primary cluster was rsync'ed shortly before this. last xlog receive 0/5000000 current xlog location: 0/41A3008 WARNING: Secondary appears ahead of primary: WARNING!! nonce match comparison failed: 2 != 18 dog: nonce did not compare, but it's not been very long yet. The sequence I'm using to prime the secondary: 0) Request that the secondary be shutdown and internally backed up. 1) Without backup being set (just to move the bulk of the files) rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete --exclude pg_log --exclude recovery.conf --exclude backup_label --exclude pg_hba.conf --exclude postmaster.pid --exclude postmaster.opts --exclude postgresql.conf /db/pg/general 77.66.27.224:/db/pg 2) Enter backup mode SELECT pg_start_backup('sync-backup', true); 3) Rsync again, hopefully quicker than #1. I wanted to minimize the time it spent in backup. rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete --exclude pg_log --exclude recovery.conf --exclude backup_label --exclude pg_hba.conf --exclude postmaster.pid --exclude postmaster.opts --exclude postgresql.conf /db/pg/general 77.66.27.224:/db/pg 4) Exit backup mode SELECT pg_stop_backup(); 5) Rsync pg_xlog files rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete /db/pg/general/pg_xlog 77.66.27.224:/db/pg/general 6) Rsync archive files to secondary rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete --exclude syncInProgress --exclude archiveScript /db/pg/general.archive 77.66.27.224:/db/pg 7) Tell the secondary server to start up Output from postgres on the secondary server when it starts. I've tried looking into the cannot stat issues without much success. Note that in situations where it does work, it looks just like this as well (complete with the zero length warning, etc). LOG: database system was shut down in recovery at 2017-11-29 06:43:13 GMT cp: cannot stat '{$0}{$8}/db/pg/general.archive/00000002.history'{$0}{$9}: No such file or directory LOG: entering standby mode cp: cannot stat '{$0}{$8}/pg/general.archive/000000010000000000000005'{$0}{$9}: No such file or directory LOG: consistent recovery state reached at 0/5000080 LOG: record with zero length at 0/5000080 LOG: database system is ready to accept read only connections cp: cannot stat '{$0}{$8}/db/pg/general.archive/000000010000000000000005'{$0}{$9}: No such file or directory cp: cannot stat '{$0}{$8}/db/pg/general.archive/00000002.history'{$0}{$9}: No such file or directory LOG: streaming replication successfully connected to primary Any thoughts at all are appreciated! Thanks, -mike

Re: Streaming replication

From

Laurenz Albe

Date:

04 December 2017, 10:16:02

Mike Howland wrote:
> I have two machines where the secondary is configured as a backup of the primary machine.
> I used the instructions here: https://wiki.postgresql.org/wiki/Streaming_Replication
> to establish streaming replication...
> BTW, I'm using the 7.1 path (using backup/rsync/!backup) to prime the files on the secondary machine.
> 
> Most of the time it comes up, connects and is quite happy.
> For my own sanity I have a bit of monitoring code that occasionally injects a change
> on the primary and watches for it to arrive on the secondary to ensure data is flowing.
> I also have some code that checks to see how close things are using pg_current_xlog_location()
> and pg_last_xlog_receive_location().  When things are working, these checks are always fine.
> 
> Once in a while, this nonce value does not seem to be updated on the secondary server
> (although I can see it change on the primary).
> During this period of time, my watcher code also detects that pg_last_xlog_receive_location()
> is ahead of pg_current_xlog_location().  From what I can tell, the secondary has connected to the primary,
> it's just data does not seem to be flowing.  I'll attach a bit of detal below.

The way you initialize the standby looks fine.

One explanation for the delay you observe would be if "max_standby_streaming_delay"
is set greater than zero and there are queries running on the standby that block
application of the changes streamed from the primary.

Yours,
Laurenz Albe

Re: Streaming replication

From

Mike Howland

Date:

04 December 2017, 23:30:33

Thanks for the feedback (appreciated!!). I'll spend some time looking at the streaming delay. Generally all connections to the secondary are restricted (with the exception of my nonce query, which should be very short lived). -mike On Mon, Dec 4, 2017 at 2:16 AM, Laurenz Albe wrote: > Mike Howland wrote: > > I have two machines where the secondary is configured as a backup of the > primary machine. > > I used the instructions here: https://wiki.postgresql.org/ > wiki/Streaming_Replication > > to establish streaming replication... > > BTW, I'm using the 7.1 path (using backup/rsync/!backup) to prime the > files on the secondary machine. > > > > Most of the time it comes up, connects and is quite happy. > > For my own sanity I have a bit of monitoring code that occasionally > injects a change > > on the primary and watches for it to arrive on the secondary to ensure > data is flowing. > > I also have some code that checks to see how close things are using > pg_current_xlog_location() > > and pg_last_xlog_receive_location(). When things are working, these > checks are always fine. > > > > Once in a while, this nonce value does not seem to be updated on the > secondary server > > (although I can see it change on the primary). > > During this period of time, my watcher code also detects that > pg_last_xlog_receive_location() > > is ahead of pg_current_xlog_location(). From what I can tell, the > secondary has connected to the primary, > > it's just data does not seem to be flowing. I'll attach a bit of detal > below. > > The way you initialize the standby looks fine. > > One explanation for the delay you observe would be if > "max_standby_streaming_delay" > is set greater than zero and there are queries running on the standby that > block > application of the changes streamed from the primary. > > Yours, > Laurenz Albe >

Re: Streaming replication

From

Hugo Lourenco

Date:

08 January 2018, 14:00:57


Enviado a partir do meu smartphone BlackBerry 10.
  Mensagem original  
De: Laurenz Albe
Enviado: Segunda-feira, 4 de Dezembro de 2017 09:52
Para: Mike Howland; pgsql-novice@lists.postgresql.org
Assunto: Re: Streaming replication

Mike Howland wrote:
> I have two machines where the secondary is configured as a backup of the primary machine.
> I used the instructions here: https://wiki.postgresql.org/wiki/Streaming_Replication
> to establish streaming replication...
> BTW, I'm using the 7.1 path (using backup/rsync/!backup) to prime the files on the secondary machine.
>
> Most of the time it comes up, connects and is quite happy.
> For my own sanity I have a bit of monitoring code that occasionally injects a change
> on the primary and watches for it to arrive on the secondary to ensure data is flowing.
> I also have some code that checks to see how close things are using pg_current_xlog_location()
> and pg_last_xlog_receive_location(). When things are working, these checks are always fine.
>
> Once in a while, this nonce value does not seem to be updated on the secondary server
> (although I can see it change on the primary).
> During this period of time, my watcher code also detects that pg_last_xlog_receive_location()
> is ahead of pg_current_xlog_location(). From what I can tell, the secondary has connected to the primary,
> it's just data does not seem to be flowing. I'll attach a bit of detal below.

The way you initialize the standby looks fine.

One explanation for the delay you observe would be if "max_standby_streaming_delay"
is set greater than zero and there are queries running on the standby that block
application of the changes streamed from the primary.

Yours,
Laurenz Albe