Re: Proposal: Implement failover on libpq connect level. - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Proposal: Implement failover on libpq connect level. |
Date | |
Msg-id | CA+TgmoaqUECxFDEyxLNQetw5uCS7FGhU1-851+rZZwogztbMPw@mail.gmail.com |
In response to | Re: Proposal: Implement failover on libpq connect level. (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses | Re: Proposal: Implement failover on libpq connect level. |
List | pgsql-hackers |
On Tue, Sep 1, 2015 at 1:50 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > That sort-of ties into what seems to me the main objection to this
>> > proposal, namely that there is already a way to do this sort of thing:
>> > DNS-based load balancing. All the clients think they connect to
>> > db.mycompany.com, but which server they actually get is determined by
>> > what IP address the DNS server tells them to use.
>>
>> But that kinda sucks. I mean, suppose I have three servers, A, B, and
>> C. I point db.mycompany.com to A, which is the master; then A dies.
>> Under your proposal, whatever script I use to control failover now has
>> to change the DNS records to repoint db.mycompany.com to B, my new,
>> and newly-promoted, new master. It's quite possible that some
>> machines on the network, or some processes, will have the old IP
>> address cached, and it may be several minutes before those caches time
>> out. In the meantime, I'm down: even if I bounce the application
>> servers, they may just try to reconnect to A.
>
> The solution to this part seems to be to lower the TTL, which seems
> easy enough.

In theory, yeah. In practice, not all systems obey the TTL, and in my
experience, that's actually a fairly common problem. Sometimes the
TTL gets enforced separately at multiple levels, so that all of the
old records don't go away for 2 or 3 times the TTL, or occasionally
completely random intervals of time thoroughly unrelated to the TTL
you configured. And that assumes that the guy who controls the DNS
server is willing to configure a different TTL for you, which is not
always the case. It also assumes that guy is OK granting access to
modify DNS records to an automated system running on the database
server machines. That may be OK if the database server is THE ONE
THING that needs treatment of this type, but if the company supports
50 or 100 services that all need failover handling, suddenly giving
all of those things the ability to reconfigure the DNS server sounds
like a pretty poor plan.

Plus, there may be multiple copies of the DNS server in different
geographies, all cloned from a master at the central office. When the
central office dies, you lose not only the main database server but
also the main DNS server. That's OK, because the backup DNS servers
still have copies of all the data from the master ... but you can't
make changes until the master is back up.

All of these problems can be solved if you're willing to put enough
time and energy into it. For example, Akamai has (or had, at the time
I worked there) a service that did very robust geographical
load-balancing and failover. So you could, like, go buy that, and
maybe it would solve your problem. By now, there are probably other
companies offering similar services. I have no doubt that similar
solutions can be crafted from purely open-source software, and there
may very well be great tools available for this that weren't around
the last time I worked as a network administrator. But I think it's
quite wrong to assume that the infrastructure for this is available
and usable everywhere, because in my experience, that's far from the
case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
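For illustration only, here is roughly what failover at the connect level could look like from the client side, as opposed to repointing a DNS record. This is a minimal Python sketch, not libpq code and not the patch under discussion: the function name `connect_first_available` and the host names in the usage comment are hypothetical, and it only checks that a TCP connection can be opened. It does not verify that the server reached is actually the master, which a real implementation would have to handle.

```python
import socket


def connect_first_available(hosts, timeout=2.0):
    """Try each (host, port) pair in order and return the first that accepts.

    Returns a tuple (sock, (host, port)). Raises ConnectionError if no
    candidate accepts a TCP connection within `timeout` seconds.
    This is a hypothetical helper for illustration, not part of libpq.
    """
    errors = []
    for host, port in hosts:
        try:
            # create_connection resolves the name and opens a TCP connection.
            sock = socket.create_connection((host, port), timeout=timeout)
            return sock, (host, port)
        except OSError as exc:
            # Remember why this candidate failed and move on to the next one.
            errors.append(((host, port), exc))
    raise ConnectionError(f"no host reachable: {errors}")


# Usage (host names are assumptions for the example):
# sock, chosen = connect_first_available(
#     [("a.db.example.com", 5432),
#      ("b.db.example.com", 5432),
#      ("c.db.example.com", 5432)]
# )
```

Because the candidate list lives in the client, a client that can no longer reach A moves on to B as soon as A stops accepting connections, without waiting for any DNS cache to expire and without anything needing write access to the DNS server.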