Thread: Streaming replication and an archivelog

Streaming replication and an archivelog

From: James Sewell
Date: 08 May 2015, 06:28:37

Hello All,

I am running PostgreSQL 9.4 on CentOS.

I have three servers: one master and two slaves. The slaves have the following recovery.conf:

standby_mode = 'on'
primary_conninfo = 'user=postgres host=mastervip port=5432'
restore_command = 'scp -o BatchMode=yes postgres@backuphost:/archived_wals/%f %p'
recovery_target_timeline = 'latest'

Is there any way to combine the following?

- a master switch (i.e. if node1 dies and node2 is promoted, then node3 follows node2)
- a WAL archive, such that if node2 goes down for two days it will get WALs from the archive if they are no longer on the master

At the moment the master switch works fine, but if I were to have a WAL archive with multiple timelines in it, then I would end up on the newest timeline.

I suppose what I want is the following:

If I am a streaming replica, follow only streamed timeline switches, not archive timeline switches.

Obviously, if I am not a streaming replica, I need to follow archive timeline switches so I don't break point-in-time recovery (PITR).

Possible?

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000 W www.lisasoft.com F (+61) 3 8370 8099

The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Streaming replication and an archivelog

From: James Sewell
Date: 08 May 2015, 06:47:58

Hey,

This is an example of what I see when starting up a replica and attaching it to a master.

In this case the master is on timeline 3, but the WAL archive goes up to timeline 6.

2015-05-08 16:23:07 AEST @ ( 0 00000)LOG: database system was interrupted; last known up at 2015-05-08 16:21:14 AEST
2015-05-08 16:23:07 AEST @ ( 0 00000)LOG: restored log file "00000004.history" from archive
2015-05-08 16:23:08 AEST @ ( 0 00000)LOG: restored log file "00000005.history" from archive
2015-05-08 16:23:08 AEST @ ( 0 00000)LOG: restored log file "00000006.history" from archive
scp: /backup/production/archived_wals/00000007.history: No such file or directory
2015-05-08 16:23:08 AEST @ ( 0 00000)LOG: entering standby mode
2015-05-08 16:23:09 AEST @ ( 0 00000)LOG: restored log file "00000006.history" from archive
scp: /backup/production/archived_wals/000000060000000B000000C2: No such file or directory
scp: /backup/production/archived_wals/000000050000000B000000C2: No such file or directory
scp: /backup/production/archived_wals/000000040000000B000000C2: No such file or directory
2015-05-08 16:23:11 AEST @ ( 0 00000)LOG: restored log file "000000030000000B000000C2" from archive
2015-05-08 16:23:11 AEST @ ( 0 XX000)FATAL: requested timeline 6 is not a child of this server's history
2015-05-08 16:23:11 AEST @ ( 0 XX000)DETAIL: Latest checkpoint is at B/C2000060 on timeline 3, but in the history of the requested timeline, the server forked off from that timeline at B/BE000060.
2015-05-08 16:23:11 AEST @ ( 0 00000)LOG: startup process (PID 21893) exited with exit code 1
2015-05-08 16:23:11 AEST @ ( 0 00000)LOG: aborting startup due to startup process failure
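
For anyone decoding the log: a WAL segment file name is 24 hex digits (8 each for timeline, log and segment), and an LSN such as B/C2000060 is two hex numbers (high and low 32 bits). The sketch below decodes the restored segment name and compares the two LSNs from the DETAIL line. The function names are made up for illustration and are not PostgreSQL tools. It shows that the replica's checkpoint on timeline 3 lies past the point where the history of timeline 6 left timeline 3, which is why the startup is refused.

```shell
#!/bin/sh
# Illustrative helpers only; the names are hypothetical.

# Convert an LSN such as "B/C2000060" into one integer so LSNs can be compared.
lsn_to_int() {
    hi=${1%%/*}   # hex digits before the slash (high 32 bits)
    lo=${1##*/}   # hex digits after the slash (low 32 bits)
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Split a 24-character WAL segment file name into its three 8-digit hex fields.
wal_name_fields() {
    tli=$(printf '%s' "$1" | cut -c1-8)
    log=$(printf '%s' "$1" | cut -c9-16)
    seg=$(printf '%s' "$1" | cut -c17-24)
    echo "timeline=$((0x$tli)) log=$((0x$log)) seg=$((0x$seg))"
}

# The segment that was restored just before the FATAL message.
wal_name_fields 000000030000000B000000C2

# Values taken from the DETAIL line of the log.
checkpoint=$(lsn_to_int B/C2000060)
fork=$(lsn_to_int B/BE000060)
if [ "$checkpoint" -gt "$fork" ]; then
    echo "checkpoint is past the fork point"
fi
```

The fields are printed in decimal, so log 0xB appears as 11 and segment 0xC2 as 194.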

Cheers,


Re: Streaming replication and an archivelog

From: James Sewell
Date: 08 May 2015, 07:17:33

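OK, I think I have a solution, using the following script as the restore_command:

#!/bin/sh
# restore_command wrapper: $1 is the requested file name (%f), $2 the destination path (%p).
# Refuse timeline history files so the standby never learns of timelines that exist only in the archive.
case "$1" in
    *history*) exit 1 ;;
esac

# Fetch anything else (WAL segments) from the archive host.
scp "postgres@postgres3:/archived_wals/$1" "$2"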

This seems to work: it stops the slave servers from switching timeline via the archive, because they never get to see the history files.
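
One way to sanity-check the filter without touching a real archive is to factor the file-name test into a function and feed it names like those in the earlier startup log. This is only a sketch: the function name wants_fetch is made up for illustration, and it matches on the ".history" suffix rather than on any name containing "history".

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the requested file should be fetched from
# the archive, fails for timeline history files.
wants_fetch() {
    case "$1" in
        *.history) return 1 ;;
        *)         return 0 ;;
    esac
}

# Names taken from the startup log: one history file and one WAL segment.
for f in 00000006.history 000000030000000B000000C2; do
    if wants_fetch "$f"; then
        echo "fetch  $f"
    else
        echo "refuse $f"
    fi
done
```

With this in place, the history file is refused and the WAL segment is passed through to scp as before.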

Can anyone see any problems with this approach?

Cheers,


Re: Streaming replication and an archivelog

From: Sameer Kumar
Date: 09 May 2015, 08:55:45

On Fri, May 8, 2015 at 2:29 PM James Sewell <james.sewell@lisasoft.com> wrote:

> Hello All,
>
> I am running 9.4 on Centos.
>
> I have three servers, one master and two slaves. The slaves have the following recovery.conf
>
> standby_mode = 'on'
> primary_conninfo = 'user=postgres host=mastervip port=5432'
> restore_command = 'scp -o BatchMode=yes postgres@backuphost:/archived_wals/%f %p'
> recovery_target_timeline= 'latest'
>
> Is there any way to combine following
> a master switch (ie: if node1 dies and node2 is promoted then node3 follows node2)
> using a WAL archive, such that if node2 goes down for two days it will get WALs from the archive if they are no longer on the master
> At the moment the master switch works fine, but if I was to have a WAL archive with multiple timelines in it then I would end up in the newest timeline.

A timeline switch happens whenever you promote a standby (or perform PITR), in which case you want the second standby to follow the newly promoted master, i.e. it should follow the new timeline.

> I suppose what I want is the following:
>
> If I am a streaming replica only follow streamed timeline switches, not archive timeline switches.

I am trying to work out when these two would differ (unless two different masters are writing to the same archive, which I suppose is wrong anyway).

Can you write down an example, so that others can understand your scenario better?
