Re: Sync Rep for 2011CF1 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Sync Rep for 2011CF1 |
Date | |
Msg-id | AANLkTikG8WMhOocX9AYsRHYPc-PgxPaG6miFDD9QH3i1@mail.gmail.com |
In response to | Re: Sync Rep for 2011CF1 (Aidan Van Dyk <aidan@highrise.ca>) |
Responses |
Re: Sync Rep for 2011CF1
Re: Sync Rep for 2011CF1 |
List | pgsql-hackers |
On Fri, Jan 21, 2011 at 1:09 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> On Fri, Jan 21, 2011 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Fri, Jan 21, 2011 at 12:23 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
>>>> When no sync slave is connected, yes, I want to stop things hard.
>>
>>> What you're proposing is to fail things earlier than absolutely
>>> necessary (when they try to XLOG, rather than at commit) but still
>>> later than what I think Simon is proposing (not even letting them log
>>> in).
>>
>> I can't see a reason to disallow login, because read-only transactions
>> can still run in such a situation --- and, indeed, might be fairly
>> essential if you need to inspect the database state on the way to fixing
>> the replication problem. (Of course, we've already had the discussion
>> about it being a terrible idea to configure replication from inside the
>> database, but that doesn't mean there might not be views or status you
>> would wish to look at.)
>
> And just disallowing new logins is probably not even enough, because
> it allows current logged in clients "forward progress", leading
> towards an eventual hang (with now committed data on the master).
>
> Again, I'm trying to stop "forward progress" as soon as possible when
> a sync slave isn't replicating. And I'ld like clients to fail with
> errors sooner (hopefully they get to the commit point) rather than
> accumulate the WAL synced to the master and just wait at the commit.
>
> So I think that's a more complete picture of my quick "not do anything
> with no synchronous slave replicating" that I think was what led to
> the no-login approach.

Well, stopping all WAL activity with an error sounds *more* reasonable
than refusing all logins, but I'm not personally sold on it. For
example, a brief network disruption on the connection between master
and standby would cause the master to grind to a halt... and then
almost immediately resume operations. More generally, if you have
short-running transactions, there's not much difference between
wait-at-commit and wait-at-WAL, and if you have long-running
transactions, then wait-at-WAL might be gumming up the works more than
necessary.

One idea might be to wait both before and after commit. If
allow_standalone_primary is off, and a commit is attempted, we check
whether there's a slave connected, and if not, wait for one to
connect. Then, we write and sync the commit WAL record. Next, we wait
for the WAL to be ack'd. Of course, the standby might disappear
between the first check and the second, but it would greatly reduce
the possibility of the master being ahead of the standby after a
crash, which might be useful for some people.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
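
The wait-before-and-after-commit idea in the message above can be sketched as a toy model. This is a minimal Python simulation under stated assumptions, not PostgreSQL code: the `Primary` class, its method names, the integer LSN counter, and the `timeout` parameter are all hypothetical and exist only for illustration. Only the `allow_standalone_primary` setting name comes from the message, and in the proposal the waits would presumably be indefinite rather than timed.

```python
import threading


class Primary:
    """Toy model of a primary server (hypothetical, for illustration only)."""

    def __init__(self, allow_standalone_primary=False):
        self.allow_standalone_primary = allow_standalone_primary
        self.cond = threading.Condition()
        self.standby_connected = False
        self.flushed_lsn = 0  # last commit record written and synced locally
        self.acked_lsn = 0    # last position acknowledged by the standby

    def standby_connect(self):
        with self.cond:
            self.standby_connected = True
            self.cond.notify_all()

    def standby_disconnect(self):
        with self.cond:
            self.standby_connected = False
            self.cond.notify_all()

    def standby_ack(self, lsn):
        with self.cond:
            self.acked_lsn = max(self.acked_lsn, lsn)
            self.cond.notify_all()

    def commit(self, timeout=None):
        """Commit one transaction; `timeout` is an assumption for the demo."""
        with self.cond:
            if not self.allow_standalone_primary:
                # Wait 1: before writing anything, wait for a connected standby.
                if not self.cond.wait_for(lambda: self.standby_connected, timeout):
                    raise TimeoutError("no standby connected; nothing written")

            # Write and sync the commit WAL record (modelled as a counter bump).
            self.flushed_lsn += 1
            lsn = self.flushed_lsn

            if not self.allow_standalone_primary:
                # Wait 2: wait for the standby to acknowledge that record.
                # The standby may have vanished since wait 1, so this can
                # still block with the primary one record ahead.
                if not self.cond.wait_for(lambda: self.acked_lsn >= lsn, timeout):
                    raise TimeoutError("commit record flushed but not acknowledged")
            return lsn
```

In this model, a commit attempted with no standby connected blocks before `flushed_lsn` advances, so the primary does not get ahead of the standby. The remaining window is the one the message describes: if the standby disappears after the first wait, the record is already flushed locally when the second wait blocks.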