sync rep and smart shutdown - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | sync rep and smart shutdown |
Date | |
Msg-id | BANLkTi=W8OrvqLHS+suU8R2b_rhFaqeEaw@mail.gmail.com |
Responses |
Re: sync rep and smart shutdown
|
List | pgsql-hackers |
There is an open item for synchronous replication and smart shutdown, with a link to here:

http://archives.postgresql.org/pgsql-hackers/2011-03/msg01391.php

The issue is not straightforward, however, so I want to get some broader input before proceeding. In short, the problem is that if synchronous replication is in use, no standbys are connected, and a smart shutdown is requested, any future commits will wait for a wake-up that will never come, because by that point postmaster is no longer accepting connections - thus no standby can reconnect to release waiters. Or, if there is a standby connected when the smart shutdown is requested, but it subsequently gets disconnected, it won't be able to reconnect, and again all waiters will get stuck.

There are a couple of plausible ways to proceed here:

1. Do nothing. If this happens to you, you will need to request fast or immediate shutdown to get the system unstuck. Since it's pretty easy for this to happen already anyway (all you need is one connection to sit open doing nothing), most people probably already have provision for this and likely wouldn't be terribly inconvenienced by one more corner case. On the flip side, I would rather that we were moving in the direction of making it more likely for smart shutdown to actually shut down the system, rather than less likely.

2. When a smart shutdown is initiated, shut off synchronous replication. This definitely makes sure you won't get stuck waiting for sync rep, but on the other hand you probably configured sync rep because you wanted, uh, sync rep. Or alternatively, continue to allow sync rep for as long as there is a sync standby connected, but if the last sync standby drops off then shut it off.

3. Accept new replication connections even when the system is undergoing a smart shutdown. This is the approach that the above-linked patch tries to take, and it seems superficially sensible, but it doesn't really work.
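For anyone who wants the failure mode spelled out, here is a toy model of the interaction described above. It is only a sketch and not PostgreSQL code: the class, the function, and the flag names are all hypothetical, and it assumes for simplicity that a smart shutdown stops new standby connections the moment it is requested. It shows the two ways waiters get stuck, plus the first variant of option 2.

```python
from dataclasses import dataclass


@dataclass
class ToyPrimary:
    """Hypothetical toy model of a primary server; not PostgreSQL code."""
    sync_rep_enabled: bool = True
    standby_connected: bool = False
    smart_shutdown_requested: bool = False

    def standby_can_connect(self) -> bool:
        # Simplifying assumption: once smart shutdown is requested,
        # no standby can connect or reconnect.
        return not self.smart_shutdown_requested

    def commit_can_complete(self) -> bool:
        # Without sync rep, a commit never waits on a standby.
        if not self.sync_rep_enabled:
            return True
        # With sync rep, a waiter is released only if a standby is
        # connected now, or one could still connect later.
        return self.standby_connected or self.standby_can_connect()


def request_smart_shutdown(primary: ToyPrimary,
                           shut_off_sync_rep: bool = False) -> None:
    """Mark the shutdown; optionally model option 2 (shut off sync rep)."""
    primary.smart_shutdown_requested = True
    if shut_off_sync_rep:
        primary.sync_rep_enabled = False


# Case A: no standby connected when the shutdown is requested.
a = ToyPrimary()
request_smart_shutdown(a)

# Case B: a standby is connected at first, then drops off.
b = ToyPrimary(standby_connected=True)
request_smart_shutdown(b)
b_before_drop = b.commit_can_complete()
b.standby_connected = False
b_after_drop = b.commit_can_complete()

# Case C: option 2, sync rep is shut off when the shutdown starts.
c = ToyPrimary()
request_smart_shutdown(c, shut_off_sync_rep=True)
```

In this model, commits in case A can never complete, case B works only until the standby disconnects, and case C never waits, at the price of no longer being synchronous.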
Currently, once a shutdown has been initiated and any on-line backup has been stopped, we stop creating regular backends; we instead only create dead-end backends that just return an error message and exit. Once no regular backends remain, we then stop accepting connections AT ALL and wait for the dead-end backends to drain out.

What this patch proposes to do (though it isn't real clear from the way it's written) is continue creating regular backends but boot out all but superuser and replication connections as soon as possible. However, that misses the reason why the current code works the way that it does: to make sure that even in the face of a continuing stream of connection requests, we actually eventually manage to stop talking and shut down. Basically, this patch would fix the smart-shutdown-sync-rep interaction at the expense of making smart shutdown considerably more fragile in other cases, which does not seem like a good trade-off. AFAICT, this whole approach is doomed to failure.

Anyone else have an idea or opinion?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company