Streaming replication with sync slave, but disconnects due to missing WAL segments - Mailing list pgsql-general
From | Mads.Tandrup@schneider-electric.com |
---|---|
Subject | Streaming replication with sync slave, but disconnects due to missing WAL segments |
Date | |
Msg-id | OF87DBD177.6C324ADB-ONC1257B80.00483254-C1257B80.0049C624@apcc.com |
Responses |
Re: Streaming replication with sync slave, but disconnects due to missing WAL segments
Re: Streaming replication with sync slave, but disconnects due to missing WAL segments |
List | pgsql-general |
Hi all

I have a question about sync streaming replication. I have 2 PostgreSQL 9.1 servers set up with streaming replication. On the master node the slave is configured as a synchronous standby. I've verified that pg_stat_replication shows sync_state = sync for the slave node.

It all seems to work fine. But I have noticed that sometimes, when I restore backups created by pg_dump, the slave node will disconnect with these messages in the postgresql log:

2013-06-03 13:13:48 GMT 4271 FATAL: could not receive data from WAL stream: SSL connection has been closed unexpectedly
2013-06-03 13:13:53 GMT 4270 LOG: invalid magic number 0000 in log file 15, segment 65, offset 11665408
2013-06-03 13:13:54 GMT 36428 LOG: streaming replication successfully connected to primary
2013-06-03 13:13:54 GMT 36428 FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000010000000F00000041 has already been removed
2013-06-03 13:13:58 GMT 36458 LOG: streaming replication successfully connected to primary
2013-06-03 13:13:58 GMT 36458 FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000010000000F00000041 has already been removed

On the master I get this in the log file in the same timespan:

2013-06-03 13:13:47 GMT 1471 LOG: checkpoints are occurring too frequently (2 seconds apart)
2013-06-03 13:13:47 GMT 1471 HINT: Consider increasing the configuration parameter "checkpoint_segments".
2013-06-03 13:13:48 GMT 6189 [unknown] FATAL: requested WAL segment 000000010000000F00000041 has already been removed
2013-06-03 13:13:48 GMT 6189 [unknown] LOG: disconnection: session time: 77:37:37.684 user=root database= host=10.216.80.38 port=56114
2013-06-03 13:13:49 GMT 1471 LOG: checkpoints are occurring too frequently (2 seconds apart)
2013-06-03 13:13:49 GMT 1471 HINT: Consider increasing the configuration parameter "checkpoint_segments".
2013-06-03 13:13:51 GMT 1471 LOG: checkpoints are occurring too frequently (2 seconds apart)
2013-06-03 13:13:51 GMT 1471 HINT: Consider increasing the configuration parameter "checkpoint_segments".
2013-06-03 13:13:51 GMT 1468 LOG: received SIGHUP, reloading configuration files
2013-06-03 13:13:51 GMT 1468 LOG: parameter "synchronous_standby_names" removed from configuration file, reset to default
2013-06-03 13:13:53 GMT 1471 LOG: checkpoints are occurring too frequently (2 seconds apart)
2013-06-03 13:13:53 GMT 1471 HINT: Consider increasing the configuration parameter "checkpoint_segments".
2013-06-03 13:13:53 GMT 44063 [unknown] LOG: connection received: host=10.216.80.38 port=34038
2013-06-03 13:13:54 GMT 44063 [unknown] LOG: replication connection authorized: user=root
2013-06-03 13:13:54 GMT 44063 [unknown] FATAL: requested WAL segment 000000010000000F00000041 has already been removed
2013-06-03 13:13:54 GMT 44063 [unknown] LOG: disconnection: session time: 0:00:00.090 user=root database= host=10.216.80.38 port=34038

What I don't understand is how the slave node can miss a WAL segment, since it should be sync. Shouldn't sync prevent the server from continuing if the slave is not able to get WAL segments fast enough?

I have only noticed it while restoring a database. But the general load on the DB has not been that high, so I'm not sure if it can occur with other workloads.

Best regards,
Mads
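
For anyone cross-referencing the two logs: the standby's "log file 15, segment 65" and the segment name 000000010000000F00000041 in the FATAL messages appear to refer to the same WAL segment. Below is a minimal Python sketch of that arithmetic, assuming the 24-hex-digit file naming of PostgreSQL releases of this era (8 hex digits each for timeline, log file, and segment). The helper name parse_wal_segment_name is hypothetical and is not part of any PostgreSQL API.

```python
def parse_wal_segment_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, segment).

    Assumes the layout TTTTTTTTLLLLLLLLSSSSSSSS (8 hex digits per field);
    this helper is illustrative only.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %d" % len(name))
    timeline = int(name[0:8], 16)   # timeline ID
    log_id = int(name[8:16], 16)    # "log file" number as printed in the server log
    segment = int(name[16:24], 16)  # segment number within that log file
    return timeline, log_id, segment


# Segment name taken from the FATAL lines above.
print(parse_wal_segment_name("000000010000000F00000041"))  # (1, 15, 65)
```

The result (timeline 1, log file 15, segment 65) matches the standby's "invalid magic number 0000 in log file 15, segment 65" line, so both servers are complaining about the same segment.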