Thread: New and interesting replication issues with 9.2.8 sync rep
Just got a report of a replication issue with 9.2.8 from a community member:

Here's the sequence:

1) A --> B (sync rep)

2) Shut down B

3) Shut down A

4) Start up B as a master

5) Start up A as sync replica of B

6) A successfully joins B as a sync replica, even though its transaction
log is 1016 bytes *ahead* of B.

7) Transactions written to B all hang

8) Xlog on A is now corrupt, although the database itself is OK

Now, the above sequence happened because of the user misunderstanding
what sync rep really means. However, A should not have been able to
connect with B in replication mode, especially in sync rep mode; that
should have failed. Any thoughts on why it didn't?

I'm trying to produce a test case ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
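For readers following along, the "1016 bytes ahead" figure in step 6 is a difference between two WAL locations. A minimal sketch of that arithmetic in Python follows, assuming the usual `high/low` hexadecimal text form of a WAL location. The function names and the two sample positions are hypothetical and chosen only to reproduce a 1016-byte gap; they are not the reporter's actual values. The flat 64-bit interpretation used here should be exact on 9.3 and later; on 9.2, as far as I know, it can only be trusted when both positions share the same high half.

```python
def parse_lsn(text: str) -> int:
    """Convert a WAL location such as '0/30003F8' into one integer.

    Assumption: the text is '<high 32 bits>/<low 32 bits>' in hex.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Bytes by which location a is ahead of location b (negative if behind)."""
    return parse_lsn(a) - parse_lsn(b)


# Hypothetical positions, picked so that A is 1016 (0x3F8) bytes ahead of B.
master_b = "0/3000000"
standby_a = "0/30003F8"

gap = lsn_diff(standby_a, master_b)
if gap > 0:
    print(f"standby is {gap} bytes AHEAD of the master; it needs a reclone")
else:
    print(f"standby is {-gap} bytes behind the master")
```

A positive result means the would-be standby holds WAL that the new master never wrote, which is the situation described in step 6.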
On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
> Just got a report of a replication issue with 9.2.8 from a community member:
>
> Here's the sequence:
>
> 1) A --> B (sync rep)
>
> 2) Shut down B
>
> 3) Shut down A
>
> 4) Start up B as a master
>
> 5) Start up A as sync replica of B
>
> 6) A successfully joins B as a sync replica, even though its transaction
> log is 1016 bytes *ahead* of B.
>
> 7) Transactions written to B all hang
>
> 8) Xlog on A is now corrupt, although the database itself is OK

This is fundamentally borked practice.

> Now, the above sequence happened because of the user misunderstanding
> what sync rep really means. However, A should not have been able to
> connect with B in replication mode, especially in sync rep mode; that
> should have failed. Any thoughts on why it didn't?

I'd guess that B, while starting up, has written further WAL records
bringing it further ahead of A.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 05/03/2014 01:07 AM, Andres Freund wrote:
> On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
>> Now, the above sequence happened because of the user misunderstanding
>> what sync rep really means. However, A should not have been able to
>> connect with B in replication mode, especially in sync rep mode; that
>> should have failed. Any thoughts on why it didn't?
>
> I'd guess that B, while starting up, has written further WAL records
> bringing it further ahead of A.

Apparently not; from what I've seen pg_stat_replication even *shows*
that the replica is ahead of the master. Further, Postgres should have
recognized that there was a timeline branch point before A's last
record, no?

I'm working on getting permission to access the DB files.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
> On 05/03/2014 01:07 AM, Andres Freund wrote:
> > I'd guess that B, while starting up, has written further WAL records
> > bringing it further ahead of A.
>
> Apparently not; from what I've seen pg_stat_replication even *shows*
> that the replica is ahead of the master. Further, Postgres should have
> recognized that there was a timeline branch point before A's last
> record, no?

There wasn't any timeline increase because - as far as I understand the
above - there wasn't any promotion. The cluster was shut down and
recovery.conf was created/removed respectively.

To me this is an operator error. We could try to defend against it more
vigorously, but that's hard to do without breaking actual use cases.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 05/05/2014 10:25 AM, Andres Freund wrote:
> There wasn't any timeline increase because - as far as I understand the
> above - there wasn't any promotion. The cluster was shut down and
> recovery.conf was created/removed respectively.

Ah, oops, left out a step. B was promoted.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2014-05-05 10:30:17 -0700, Josh Berkus wrote:
> On 05/05/2014 10:25 AM, Andres Freund wrote:
> > On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
> >> Apparently not; from what I've seen pg_stat_replication even *shows*
> >> that the replica is ahead of the master.

That's the shutdown record from A that I've talked about.

> >> Further, Postgres should have
> >> recognized that there was a timeline branch point before A's last
> >> record, no?
> >
> > There wasn't any timeline increase because - as far as I understand the
> > above - there wasn't any promotion. The cluster was shut down and
> > recovery.conf was created/removed respectively.
>
> Ah, oops, left out a step. B was promoted.

Still a user error. You need to reclone.

Depending on how archiving and the target timeline was configured the
timeline increase won't be treated as an error...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 05/05/2014 10:53 AM, Andres Freund wrote:
> Still a user error. You need to reclone.
>
> Depending on how archiving and the target timeline was configured the
> timeline increase won't be treated as an error...

Andres and I hashed this out on IRC. The basic problem was that I was
relying on pg_stat_replication to point out when a successful
replication connection was established. However, he pointed out cases
where pg_stat_replication will report sync or streaming even though
replication has failed due to differences in WAL position. That appears
to be what happened here.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
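Since the conclusion here is that a "streaming" or "sync" status in pg_stat_replication is not enough on its own, a monitoring script could also compare WAL positions. The following is only a rough sketch of that idea in Python, not something Andres or Josh proposed. It assumes a row fetched from pg_stat_replication on the master (using the 9.2 column names `state` and `flush_location`) and the master's own position as returned by `pg_current_xlog_location()`. The helper names and sample values are hypothetical.

```python
def to_bytes(location: str) -> int:
    """Turn a 'high/low' hex WAL location into a single integer."""
    high, low = location.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def standby_looks_sane(row: dict, master_location: str) -> bool:
    """Return True only if the standby streams AND is not ahead of the master.

    row is a dict with the keys 'state' and 'flush_location'.
    """
    if row["state"] != "streaming":
        return False
    # A standby can never legitimately have flushed WAL the master has not written.
    return to_bytes(row["flush_location"]) <= to_bytes(master_location)


# Hypothetical sample rows: both claim to be healthy sync standbys.
master_now = "0/3000000"
ahead = {"state": "streaming", "sync_state": "sync", "flush_location": "0/30003F8"}
behind = {"state": "streaming", "sync_state": "sync", "flush_location": "0/2FFFF00"}

for name, row in (("ahead", ahead), ("behind", behind)):
    verdict = "ok" if standby_looks_sane(row, master_now) else "SUSPECT"
    print(name, verdict)
```

A check like this would only flag the symptom. The fix Andres gives still applies: a standby that is ahead of its new master has to be recloned.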