Thread: Replication Lag
We are having replication lag issues in our production environment. We are on postgres 9.015.
We have one master and 8 slaves. We don’t see any loads or io on the slaves. All we see is replication lag which we measure in megs. We can reproduce this by doing transactions on the master and we see that transactions are not coming over to slave.
Is there any way we are hitting a bug?
KARAN CHATHA | Manager, Data Services | CMG Technology
karan.chatha@coxinc.com | p: 678-645-4083| m: 404-713-1368
We are having replication lag issues in our production environment. We are on postgres 9.015.
Er. 9.0.15?
We have one master and 8 slaves. We don’t see any loads or io on the slaves. All we see is replication lag which we measure in megs. We can reproduce this by doing transactions on the master and we see that transactions are not coming over to slave.
Are you seeing a *lag* in replication or no replication at all?
Is there any way we are hitting a bug?
Possibly, but I'm going to guess that the most likely location of the bug is somewhere in your configuration. You need to provide more information. This page is a good guide: http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
In particular, I'd like to know for starters:
0. What form of replication are you using? Bucardo? Slony? Pgpool? Londiste? Mammoth? Hot-standby? Warm-standby? ...
1. Did it ever work?
2. If so, what changed? (configuration, upgrades, network, ???)
3. Are all machines on the same version?
4. Have you done any upgrades? If so, did you follow all the special notes regarding each upgrade? Occasionally minor upgrades require steps beyond simply replacing the binary and at times those have involved replication issues.
5. Anything of interest in the logs on the master or any of the standbys? Be sure sufficient logging is enabled.
Cheers,
Steve
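One way to answer the "lag or no replication at all?" question on 9.0 streaming replication is to compare WAL positions: pg_current_xlog_location() on the master against pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on a standby. The Python below is a minimal sketch, not something posted in this thread. The helper names are hypothetical, it assumes you have already fetched the 'X/Y' text values (for example with psql), and it assumes pre-9.3 WAL addressing, where each logical xlog file holds 255 usable 16 MB segments (0xFF000000 bytes).

```python
# Before 9.3, each logical xlog file (the part before the "/") held only
# 255 usable 16 MB segments, so advancing it by one adds 0xFF000000 bytes.
XLOGID_BYTES_PRE_93 = 0xFF000000


def xlog_to_bytes(location: str) -> int:
    """Hypothetical helper: turn an 'X/Y' xlog location into a byte position."""
    hi, lo = location.strip().split("/")
    return int(hi, 16) * XLOGID_BYTES_PRE_93 + int(lo, 16)


def lag_report(master: str, received: str, replayed: str) -> dict:
    """Hypothetical helper: split one standby's lag into two parts.

    master:   value of pg_current_xlog_location() on the master
    received: value of pg_last_xlog_receive_location() on the standby
    replayed: value of pg_last_xlog_replay_location() on the standby
    """
    m, r, p = (xlog_to_bytes(x) for x in (master, received, replayed))
    return {
        "receive_lag_bytes": m - r,  # WAL the standby has not received yet
        "replay_lag_bytes": r - p,   # WAL received but not yet applied
    }


if __name__ == "__main__":
    print(lag_report("0/5000000", "0/5000000", "0/3000000"))
```

For example, lag_report("0/5000000", "0/5000000", "0/3000000") reports 0 bytes of receive lag and 33554432 bytes (32 MB) of replay lag. Roughly: if receive lag stays near zero while replay lag grows, WAL is reaching that standby but is not being applied; if receive lag grows, the standby is not getting WAL from the master.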
1) It was working until March 8
And then what changed? *Anything* that might have happened. Config change, unclean reboot, out of disk, firewall updates, network changes, anything at all...
2) We upgraded Postgres from 9.03 to 9.015 on Feb 19
There are a few items that require special handling between 9.0.3 and 9.0.15. Did you read all the release notes and make sure that the extra steps were completed or didn't apply to you? (I'm not sure that any directly impact replication, but I haven't been running anything earlier than 9.1 for quite a while.)
3) Streaming Replication
4) Right now we have master on 9.015 and 7 slaves on 9.015 and one slave on 9.0.16
5) We have full logging enabled to syslog
What do the logs tell you? Have you thoroughly examined them both for current messages and anything unusual around the time that the issue appeared?
6) What we see is that there are no loads or io but archives get stuck on one archive. We have
7) max_standby_archive_delay = 60000 # max delay before canceling queries
max_standby_streaming_delay = 60000
It is almost like these values are not being honored.
Cheers,
Steve
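On item 7: without a unit suffix, both settings are read as milliseconds, so 60000 means 60 seconds, and -1 means wait forever. They bound how long WAL replay on a hot standby may be held up by conflicting queries before those queries are canceled (max_standby_archive_delay for WAL restored from the archive, max_standby_streaming_delay for streamed WAL); they do not by themselves make a standby receive WAL any faster. The helper below is a hypothetical sketch that interprets such a value; it only handles a plain integer with an optional unit (ms, s, min, h, d), optionally in single quotes.

```python
import re

# Multipliers to milliseconds for the time units PostgreSQL accepts here.
UNIT_MS = {"ms": 1, "s": 1_000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def standby_delay_seconds(value: str):
    """Hypothetical helper: interpret a max_standby_*_delay setting.

    Returns seconds as a float, or None for -1 (wait forever).
    A bare number is read as milliseconds, the default unit for both settings.
    """
    m = re.fullmatch(r"\s*'?\s*(-?\d+)\s*([a-z]*)\s*'?\s*", value)
    if m is None:
        raise ValueError(f"unrecognised setting: {value!r}")
    number, unit = int(m.group(1)), m.group(2) or "ms"
    if number == -1:
        return None
    if unit not in UNIT_MS:
        raise ValueError(f"unknown unit: {unit!r}")
    return number * UNIT_MS[unit] / 1000
```

With the values quoted above, standby_delay_seconds("60000") returns 60.0, that is, one minute.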
1) It was working until March 8
2) We upgraded Postgres from 9.03 to 9.015 on Feb 19
3) Streaming Replication
4) Right now we have master on 9.015 and 7 slaves on 9.015 and one slave on 9.0.16
5) We have full logging enabled to syslog
6) What we see is that there are no loads or io but archives get stuck on one archive. We have
7) max_standby_archive_delay = 60000 # max delay before canceling queries
max_standby_streaming_delay = 60000
It is almost like these values are not being honored.
Thx
From: Steve Crawford [mailto:scrawford@pinpointresearch.com]
Sent: Friday, March 21, 2014 3:13 PM
To: Chatha, Karan (CMG-Atlanta); pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Replication Lag
On 03/20/2014 07:42 PM, Chatha, Karan (CMG-Atlanta) wrote: