Thread: New and interesting replication issues with 9.2.8 sync rep

New and interesting replication issues with 9.2.8 sync rep

From: Josh Berkus
Date: 03 May 2014, 01:57:19

Just got a report of a replication issue with 9.2.8 from a community member:

Here's the sequence:

1) A --> B (sync rep)

2) Shut down B

3) Shut down A

4) Start up B as a master

5) Start up A as sync replica of B

6) A successfully joins B as a sync replica, even though its transaction
log is 1016 bytes *ahead* of B.

7) Transactions written to B all hang

8) Xlog on A is now corrupt, although the database itself is OK
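WAL positions such as those shown in pg_stat_replication are printed as two hexadecimal numbers separated by a slash, so "1016 bytes ahead" compares byte offsets. The Python sketch below converts those strings to integers and subtracts them; on a 9.2 server, pg_xlog_location_diff() does the same arithmetic. The sketch assumes the flat 64-bit layout used from 9.3 on. Releases before 9.3 did not use the last 16MB segment of each 4GB logical log file, so for 9.2 the result is exact only when both positions share the part before the slash, which is the usual case for a gap this small. The positions in the example are made up.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual WAL position such as '0/30003F8' into a byte offset.

    Assumes the flat 64-bit layout: the part before the slash is the high
    32 bits, the part after it the low 32 bits, both in hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def bytes_ahead(replica_lsn: str, master_lsn: str) -> int:
    """Positive when the replica's WAL extends past the master's."""
    return parse_lsn(replica_lsn) - parse_lsn(master_lsn)


# Hypothetical positions reproducing the 1016-byte gap from step 6.
print(bytes_ahead("0/30003F8", "0/3000000"))  # 1016
```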

Now, the above sequence happened because of the user misunderstanding
what sync rep really means.  However, A should not have been able to
connect with B in replication mode, especially in sync rep mode; that
should have failed.  Any thoughts on why it didn't?

I'm trying to produce a test case ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: New and interesting replication issues with 9.2.8 sync rep

From: Andres Freund
Date: 03 May 2014, 08:07:45

On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
> Just got a report of a replication issue with 9.2.8 from a community member:
> 
> Here's the sequence:
> 
> 1) A --> B (sync rep)
> 
> 2) Shut down B
> 
> 3) Shut down A
> 
> 4) Start up B as a master
> 
> 5) Start up A as sync replica of B
> 
> 6) A successfully joins B as a sync replica, even though its transaction
> log is 1016 bytes *ahead* of B.
> 
> 7) Transactions written to B all hang
> 
> 8) Xlog on A is now corrupt, although the database itself is OK

This is fundamentally borked practice.

> Now, the above sequence happened because of the user misunderstanding
> what sync rep really means.  However, A should not have been able to
> connect with B in replication mode, especially in sync rep mode; that
> should have failed.  Any thoughts on why it didn't?

I'd guess that B, while starting up, has written further WAL records
bringing it further ahead of A.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: New and interesting replication issues with 9.2.8 sync rep

From: Josh Berkus
Date: 05 May 2014, 17:16:38

On 05/03/2014 01:07 AM, Andres Freund wrote:
> I'd guess that B, while starting up, has written further WAL records
> bringing it further ahead of A.

Apparently not; from what I've seen, pg_stat_replication even *shows*
that the replica is ahead of the master.  Further, Postgres should have
recognized that there was a timeline branch point before A's last
record, no?
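The check described here can be sketched as follows: before a standby continues on a newer timeline, all the WAL it has on its own timeline must end at or before the point where the new timeline branched off. This Python sketch is a simplified model of that reasoning, not PostgreSQL's code; the history layout (a list of parent-timeline and branch-position pairs) and the positions in the example are hypothetical.

```python
from typing import List, Tuple

# Hypothetical, simplified timeline history: each entry names a parent
# timeline and the WAL position at which the next timeline branched off it.
History = List[Tuple[int, str]]


def parse_lsn(lsn: str) -> int:
    """'X/Y' hex WAL position -> 64-bit byte offset (flat layout assumed)."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def can_follow(standby_tli: int, standby_end: str,
               new_tli: int, new_history: History) -> bool:
    """Can a standby whose WAL on standby_tli ends at standby_end
    continue replaying on new_tli, whose ancestry is new_history?"""
    if standby_tli == new_tli:
        return True  # same timeline, nothing branched
    for parent_tli, branch_point in new_history:
        if parent_tli == standby_tli:
            # Any WAL the standby has past the branch point exists only
            # on its own timeline, so it cannot be reconciled.
            return parse_lsn(standby_end) <= parse_lsn(branch_point)
    return False  # the standby's timeline is not an ancestor at all


# Hypothetical: A on timeline 1 ends 1016 bytes past where timeline 2 began.
print(can_follow(1, "0/30003F8", 2, [(1, "0/3000000")]))  # False
```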

I'm working on getting permission to access the DB files.


Re: New and interesting replication issues with 9.2.8 sync rep

From: Andres Freund
Date: 05 May 2014, 17:25:14

On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
> On 05/03/2014 01:07 AM, Andres Freund wrote:
> > I'd guess that B, while starting up, has written further WAL records
> > bringing it further ahead of A.
> 
> Apparently not; from what I've seen pg_stat_replication even *shows*
> that the replica is ahead of the master.  Further, Postgres should have
> recognized that there was a timeline branch point before A's last
> record, no?

There wasn't any timeline increase because, as far as I understand the
above, there wasn't any promotion. Both nodes were shut down, and
recovery.conf was then created on A and removed from B.

To me this is an operator error. We could try to defend against it more
vigorously, but that's hard to do without breaking actual use cases.

Greetings,

Andres Freund


Re: New and interesting replication issues with 9.2.8 sync rep

From: Josh Berkus
Date: 05 May 2014, 17:30:26

On 05/05/2014 10:25 AM, Andres Freund wrote:
> There wasn't any timeline increase because - as far as I understand the
> above - there wasn't any promotion. The cluster was shut down and
> recovery.conf was created/removed respectively.

Ah, oops, left out a step.  B was promoted.


Re: New and interesting replication issues with 9.2.8 sync rep

From: Andres Freund
Date: 05 May 2014, 17:53:47

On 2014-05-05 10:30:17 -0700, Josh Berkus wrote:
> On 05/05/2014 10:25 AM, Andres Freund wrote:
> > On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
> >> Apparently not; from what I've seen pg_stat_replication even *shows*
> >> that the replica is ahead of the master.

That's the shutdown record from A that I've talked about.

> >> Further, Postgres should have
> >> recognized that there was a timeline branch point before A's last
> >> record, no?
> > 
> > There wasn't any timeline increase because - as far as I understand the
> > above - there wasn't any promotion. The cluster was shut down and
> > recovery.conf was created/removed respectively.
> 
> Ah, oops, left out a step.  B was promoted.

Still a user error. You need to reclone.

Depending on how archiving and the target timeline were configured, the
timeline increase won't be treated as an error...
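Which timeline a 9.2 standby aims for is governed by recovery_target_timeline in recovery.conf: left unset, recovery follows the timeline that was current when the base backup was taken; set to latest, it follows the newest timeline found in the archive; set to a number, it targets that timeline. The Python sketch below models only that selection rule, not what the standby does afterwards. The names current_tli (standing in for the base backup's timeline) and archived_tlis (timelines whose history files are in the archive) are hypothetical.

```python
from typing import Iterable, Optional


def choose_target_timeline(setting: Optional[str], current_tli: int,
                           archived_tlis: Iterable[int]) -> int:
    """Model of the recovery_target_timeline choice.

    setting: None when the parameter is unset, 'latest', or a timeline
    number given as a string.
    current_tli: stands in for the timeline current at base backup time.
    archived_tlis: timelines whose history files were found in the archive.
    """
    if setting is None:
        return current_tli  # stay on the original timeline
    if setting == "latest":
        return max([current_tli, *archived_tlis])  # newest known timeline
    return int(setting)  # explicit timeline ID


print(choose_target_timeline(None, 1, [2]))      # 1
print(choose_target_timeline("latest", 1, [2]))  # 2
```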

Greetings,

Andres Freund


Re: New and interesting replication issues with 9.2.8 sync rep

From: Josh Berkus
Date: 05 May 2014, 18:52:18

On 05/05/2014 10:53 AM, Andres Freund wrote:
> Still a user error. You need to reclone.
> 
> Depending on how archiving and the target timeline were configured, the
> timeline increase won't be treated as an error...

Andres and I hashed this out on IRC.  The basic problem was that I was
relying on pg_stat_replication to tell me when a successful
replication connection had been established.  However, he pointed out
cases where pg_stat_replication will report a standby as streaming, or
even as sync, even though replication has actually failed because of
the difference in WAL positions.  That appears to be what happened here.
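Given that, a monitoring check should not trust the state and sync_state columns alone. A minimal Python sketch, assuming one pg_stat_replication row has already been fetched into a dict keyed by its 9.2 column names (state, flush_location), and that master_lsn holds the result of pg_current_xlog_location() read after the row was fetched, so that normal streaming can never make the standby look ahead. It catches only the symptom reported in this thread, a standby whose WAL is ahead of its master; the row values are hypothetical.

```python
from typing import Mapping


def parse_lsn(lsn: str) -> int:
    """'X/Y' hex WAL position -> byte offset (flat 64-bit layout assumed)."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replication_looks_healthy(row: Mapping[str, str], master_lsn: str) -> bool:
    """Sanity-check one pg_stat_replication row against the master's position."""
    if row["state"] != "streaming":
        return False
    # A standby cannot legitimately have flushed WAL that the master
    # has not written yet; if it has, the two histories have diverged.
    return parse_lsn(row["flush_location"]) <= parse_lsn(master_lsn)


row = {"state": "streaming", "sync_state": "sync", "flush_location": "0/30003F8"}
print(replication_looks_healthy(row, "0/3000000"))  # False
```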
