Thread: Fast Primary shutdown only after wal_sender_timeout
Hi,

I'm doing some failover tests on a 2-node streaming replication cluster. Shutting down the primary with 'pg_ctl -m fast' takes 50-60 seconds; pg_ctl returns only after the second of these two messages:

<71804----2016-10-28 10:01:37.833 CEST-5808e5a4.1187c-transid:0>LOG: database system is shut down
<62866-replicator-[unbekannt]-10.1.181.30(39609)-2016-10-28 10:02:27.963 CEST-581305b9.f592-transid:0>LOG: terminating walsender process due to replication timeout

If I set wal_sender_timeout (it has been commented out so far, i.e. set to 60 seconds) to something smaller like 10 seconds, I get a 10 second delay. There are no users logged into either primary or standby, nor is there any other activity. The hot_standby_feedback parameter is set to 'on'.

I would assume that the replication connection is shut down along with the backends, but this does not seem to be the case. Is this expected?

This is on 9.5.4, self-compiled.

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
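
For reference, the gap between the two log lines above can be worked out directly from their timestamps. This is only a small sketch: the helper name `gap_seconds` is made up for illustration, and the timestamps are copied from the log excerpt in the message (both CEST, so no timezone handling is attempted).

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def gap_seconds(first: str, second: str) -> float:
    """Return the number of seconds between two log timestamps.

    Both strings must use the same timezone; this helper is hypothetical
    and only understands the 'YYYY-MM-DD HH:MM:SS.mmm' part of the prefix.
    """
    start = datetime.strptime(first, LOG_TS_FORMAT)
    end = datetime.strptime(second, LOG_TS_FORMAT)
    return (end - start).total_seconds()


# Timestamps taken from the two log lines quoted in the message.
shutdown_logged = "2016-10-28 10:01:37.833"
walsender_terminated = "2016-10-28 10:02:27.963"

delay = gap_seconds(shutdown_logged, walsender_terminated)
print(f"walsender terminated {delay:.2f} s after the shutdown message")
```

The result is a bit over 50 seconds, which fits the reported 50-60 second wait and is close to the 60 second wal_sender_timeout in effect.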
On 28 October 2016 12:40:24 GMT+02:00, Michael Banck <michael.banck@credativ.de> wrote:
>Hi,
>
>I'm doing some failover tests on a 2-node streaming replication cluster
>and shutting down the primary with 'pg_ctl -m fast' results in a timeout
>of 50-60 seconds, pg_ctl returns only after the latter message:
>
><71804----2016-10-28 10:01:37.833 CEST-5808e5a4.1187c-transid:0>LOG:
>database system is shut down
><62866-replicator-[unbekannt]-10.1.181.30(39609)-2016-10-28 10:02:27.963
>CEST-581305b9.f592-transid:0>LOG: terminating walsender process due to
>replication timeout
>
>If I set wal_sender_timeout (it has been commented out so far, i.e. set
>to 60 seconds) to something smaller like 10 seconds, I get a 10 second
>delay. There are no users logged into either primary or standby, nor is
>there any other activity. The hot_standby_feedback parameter is set to
>'on'.
>
>I would assume that the replication connection is shut down along with
>the backends, but this seems to be not the case, is this expected?

Yes, in a normal situation. But the master ensures that everything has been replicated to the connected standby before shutting down the connections. If it hits wal_sender_timeout, maybe you have a badly disconnected standby that the master has not detected? Maybe a secondary IP address moved away from the master before its shutdown?

>
>This is on 9.5.4, self-compiled.
>
>
>Michael

/ioguix
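
One way to look for a standby that has not confirmed everything the master sent is to compare the sent_location and flush_location columns of pg_stat_replication (these are the 9.5 column names) shortly before the shutdown. The sketch below only does the comparison: the function names and the sample rows are invented for illustration, and fetching the rows from the server is left out.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def standbys_behind(rows):
    """Return (application_name, bytes_behind) for standbys whose flushed
    position is still behind what the master has sent."""
    behind = []
    for row in rows:
        lag = lsn_to_int(row["sent_location"]) - lsn_to_int(row["flush_location"])
        if lag > 0:
            behind.append((row["application_name"], lag))
    return behind


# Hypothetical rows, shaped like the result of
#   SELECT application_name, sent_location, flush_location
#   FROM pg_stat_replication;
sample_rows = [
    {"application_name": "standby1", "sent_location": "0/3000060", "flush_location": "0/3000060"},
    {"application_name": "standby2", "sent_location": "0/3000060", "flush_location": "0/2FFFF00"},
]

print(standbys_behind(sample_rows))
```

A row for a connection that no longer answers would keep showing a flush_location behind sent_location, which is the kind of stale standby suggested above. This is an assumption about how to diagnose it, not something confirmed in the thread.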