Thread: Replication Lag

Replication Lag

From

"Chatha, Karan (CMG-Atlanta)"

Date:

21 March 2014, 13:00:48

We are having replication lag issues in our production environment. We are on postgres 9.015.

We have one master and 8 slaves. We don’t see any load or I/O on the slaves. All we see is replication lag, which we measure in megabytes. We can reproduce this by running transactions on the master; we see that the transactions are not coming over to the slaves.

Is there any way we are hitting a bug?

KARAN CHATHA | Manager, Data Services | CMG Technology

karan.chatha@coxinc.com | p: 678-645-4083| m: 404-713-1368

Re: Replication Lag

From

Steve Crawford

Date:

21 March 2014, 19:13:16

On 03/20/2014 07:42 PM, Chatha, Karan (CMG-Atlanta) wrote:

We are having replication lag issues in our production environment. We are on postgres 9.015.

Er. 9.0.15?

We have one master and 8 slaves. We don’t see any load or I/O on the slaves. All we see is replication lag, which we measure in megabytes. We can reproduce this by running transactions on the master; we see that the transactions are not coming over to the slaves.

Are you seeing a *lag* in replication or no replication at all?

Is there any way we are hitting a bug?

Possibly, but I'm going to guess that the most likely location of the bug is somewhere in your configuration. You need to provide more information. This page is a good guide: http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

In particular, I'd like to know for starters:

0. What form of replication are you using? Bucardo? Slony? Pgpool? Londiste? Mammoth? Hot-standby? Warm-standby? ...

1. Did it ever work?

2. If so, what changed? (configuration, upgrades, network, ???)

3. Are all machines on the same version?

4. Have you done any upgrades? If so, did you follow all the special notes regarding each upgrade? Occasionally minor upgrades require steps beyond simply replacing the binary and at times those have involved replication issues.

5. Anything of interest in the logs on the master or any of the standbys? Be sure sufficient logging is enabled.
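
For the lag-versus-no-replication question, one option is to compare WAL positions on the master and on a standby. The sketch below is only an illustration: it assumes the pre-9.3 location format, where the part before the slash counts logical xlog files of 0xFF000000 bytes each (255 segments of 16 MB with the default segment size), and it takes as input the strings returned by pg_current_xlog_location() on the master and by pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on the standby. The helper names and the sample values are made up.

```python
# Sketch: convert pre-9.3 WAL locations ("logid/offset" in hex) to byte positions.
# Assumption: each logical xlog file spans 255 segments of 16 MB (0xFF000000 bytes),
# which is the layout used before PostgreSQL 9.3 with the default segment size.
BYTES_PER_LOGID = 0xFF000000


def xlog_to_bytes(location):
    """Turn a string such as '2A/5F000020' into an absolute byte position."""
    logid, offset = location.strip().split("/")
    return int(logid, 16) * BYTES_PER_LOGID + int(offset, 16)


def lag_report(master_current, standby_received, standby_replayed):
    """Return (receive_lag, replay_lag) in megabytes.

    master_current   : pg_current_xlog_location() on the master
    standby_received : pg_last_xlog_receive_location() on the standby
    standby_replayed : pg_last_xlog_replay_location() on the standby
    """
    mb = 1024.0 * 1024.0
    master = xlog_to_bytes(master_current)
    received = xlog_to_bytes(standby_received)
    replayed = xlog_to_bytes(standby_replayed)
    return (master - received) / mb, (received - replayed) / mb


if __name__ == "__main__":
    # Made-up sample values, not taken from this thread.
    receive_lag, replay_lag = lag_report("1/2000000", "1/2000000", "1/1000000")
    print("receive lag: %.1f MB, replay lag: %.1f MB" % (receive_lag, replay_lag))
```

A growing receive lag suggests the WAL is not reaching the standby (network, walsender, archive). A small receive lag with a growing replay lag suggests the standby has the WAL but is not applying it.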

Cheers,
Steve

Re: Replication Lag

From

Steve Crawford

Date:

22 March 2014, 00:44:15

On 03/21/2014 12:43 PM, Chatha, Karan (CMG-Atlanta) wrote:

1) It was working until March 8

And then what changed? *Anything* that might have happened. Config change, unclean reboot, out of disk, firewall updates, network changes, anything at all...

2) We upgraded Postgres from 9.0.3 to 9.0.15 on Feb 19

There are a few items that require special handling between 9.0.3 and 9.0.15. Did you read all the release notes and make sure that the extra steps were completed or didn't apply to you? (I'm not sure that any directly impact replication, but I haven't been running anything earlier than 9.1 for quite a while.)

3) Streaming Replication
4) Right now we have the master on 9.0.15, 7 slaves on 9.0.15 and one slave on 9.0.16
5) We have full logging enabled to syslog

What do the logs tell you? Have you thoroughly examined them both for current messages and anything unusual around the time that the issue appeared?

6) What we see is that there is no load or I/O, but archives get stuck on one archive. We have
7) max_standby_archive_delay = 60000 # max delay before canceling queries
max_standby_streaming_delay = 60000

It is almost like these values are not being honored.
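
As a point of reference for the two settings quoted above: a bare number for max_standby_archive_delay or max_standby_streaming_delay is read as milliseconds, so 60000 is 60 seconds, and -1 means wait forever. These settings only cap how long WAL replay waits for conflicting standby queries before canceling them; they do not speed up replay when nothing is conflicting. The helper below is a hypothetical sketch of that unit handling, not part of PostgreSQL.

```python
# Sketch: interpret a max_standby_*_delay value the way postgresql.conf does,
# assuming a bare number means milliseconds and -1 means "wait forever".
UNIT_MS = {"ms": 1, "s": 1000, "min": 60000, "h": 3600000, "d": 86400000}


def standby_delay_seconds(raw):
    """Return the delay in seconds, or None when the setting is -1 (no limit)."""
    value = raw.strip().strip("'")
    # Try longer unit suffixes first so "ms" is not mistaken for "s".
    for unit in sorted(UNIT_MS, key=len, reverse=True):
        if value.endswith(unit):
            number = value[: -len(unit)].strip()
            return int(number) * UNIT_MS[unit] / 1000.0
    millis = int(value)
    return None if millis == -1 else millis / 1000.0


if __name__ == "__main__":
    print(standby_delay_seconds("60000"))  # the value quoted in this thread
```

With the value from this thread, a conflicting query on a standby could hold replay back for up to about a minute, which by itself would not explain lag on standbys that are running no queries.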

Cheers,
Steve

Re: Replication Lag

From

"Chatha, Karan (CMG-Atlanta)"

Date:

24 March 2014, 13:31:49

1) It was working until March 8

2) We upgraded Postgres from 9.0.3 to 9.0.15 on Feb 19

3) Streaming Replication

4) Right now we have the master on 9.0.15, 7 slaves on 9.0.15 and one slave on 9.0.16

5) We have full logging enabled to syslog

6) What we see is that there is no load or I/O, but archives get stuck on one archive. We have

7) max_standby_archive_delay = 60000 # max delay before canceling queries

max_standby_streaming_delay = 60000

It is almost like these values are not being honored.

Thx

KARAN CHATHA | Manager, Data Services | CMG Technology

karan.chatha@coxinc.com | p: 678-645-4083| m: 404-713-1368

