Re: Missing important information in backup.sgml - Mailing list pgsql-docs
From | Gunnar "Nick" Bluth |
---|---|
Subject | Re: Missing important information in backup.sgml |
Date | |
Msg-id | a5a614b8-9ce2-ae3b-1141-c09e391cbba2@pro-open.de |
In response to | Re: Missing important information in backup.sgml (Kevin Grittner <kgrittn@gmail.com>) |
Responses | Re: Missing important information in backup.sgml; Re: Missing important information in backup.sgml; Re: Missing important information in backup.sgml |
List | pgsql-docs |
On 23.11.2016 at 20:21, Kevin Grittner wrote:
> On Wed, Nov 23, 2016 at 12:24 PM, Gunnar "Nick" Bluth
> <gunnar.bluth@pro-open.de> wrote:
>
>> mentions Stephen's
>> remarks on rsync (although to get actual _data loss_, you'd have to have
>> a power outage in the DC caused by your PG server exploding... ;-).
>
> I have seen power loss between the UPS and a server; including a
> tech tripping on the power cord. I have also seen servers abruptly
> shut down due to high temperatures in spite of having a UPS. I
> have also seen an OS bug lock up a system such that it was
> impossible to get a clean shutdown before having to cycle power to
> recover.
>
> No explosion needed.
>
> If you value the data in your database you should assume that the
> OS could go down at any instant without proper shutdown, and that
> your storage system(s) could be lost without warning at any time.

Kevin, all,

I've been in this business for 15 years and have had my share of outages. The worst was an AC service guy pushing the big red button next to the DC entrance, assuming it was the light switch...

It's not as if I hadn't gone through the possible scenarios in my head before writing such a broad statement. Let me explain.

Assumptions (which I take as givens for anyone who values their data...):
- you have decent HW (BBU controller, HDD cache off, ECC RAM, redundant PSUs, ...)
- you have a decent DC (UPS, AC, ...)
- you run a single DB server and/or have no (synchronous) replication in place
- your archive server is in the same DC (potentially the same machine as the DB server)
- (in case of SAN) your storage correctly reports when it has written to disk/BBU cache
- your OS (and/or archive_script) does not report RC=0 before all data has been _transmitted_ (think MongoDB... ;-)
- (for the sake of completeness) fsync=on for PG

Now, what could happen is:
a) a complete DC power outage
b) an outage of the DB server
c) an outage of the archive server (or of the network connection to it)
d) an outage of the storage system
e) a complete DC outage caused by your DB server vanishing (burning down, exploding, melting, ...)
f) a complete _loss_ of the DC (nuclear strike, plane crash, ...)

In case a), your DB server would have fsync'd all committed transactions => no _data_ loss, but your _archive_ is potentially incomplete.
In case b), the same applies, but your archive should be intact.
In case c), the archiver would retry until your archive server comes back online => no _data_ loss, no _archive_ loss.
In case d), see a), or, if you're lucky, b).
In case e), you'd have lost your DB _and_ your archive may be incomplete.
In case f), your f)....d anyway (oh, the coincidence! ;-).

Protecting yourself from case f) will involve a 2nd (3rd, ...) DC (or some cloud thingie) anyway. In my experience, users who do have more than one DC also have a policy in place saying that backups (which archived WAL files would probably count as) have to be stored in a different DC.

So, losing actual _data_ is unlikely (at least from the archiving point of view...), but not explicitly fsync'ing the archive _may_ lead to incomplete archives. That is exactly what I tried to point out with "[...], rendering your archive incomplete in case of a power outage".

Am I missing something?

P.S.: just to point that out... my patch does _not_ mention exploding servers ;-)

Cheers,
-- 
Gunnar "Nick" Bluth
RHCE/SCLA

Mobile +49 172 8853339
Email: gunnar.bluth@pro-open.de
_____________________________________________________________
In 1984 mainstream users were choosing VMS over UNIX.
Ten years later they are choosing Windows over UNIX.
What part of that message aren't you getting? - Tom Payne
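The argument above hinges on two of the listed assumptions: the archiving step must not report RC=0 before the WAL segment is really stored, and the archive copy should be explicitly fsync'd. A minimal sketch of an archive script that behaves this way follows. It is an illustration, not the patch under discussion. The script name `archive_wal.py` and the archive path are hypothetical, and it assumes a POSIX system, because fsync'ing a directory through `os.open` does not work on Windows. The script copies to a temporary file, fsyncs it, renames it into place, and then fsyncs the directory so the rename is durable as well. It returns non-zero on any error, so PostgreSQL keeps the segment and retries.

```python
#!/usr/bin/env python3
# Hypothetical archive script (illustrative sketch, POSIX assumed):
# copy one WAL segment into an archive directory and exit 0 only after
# both the file contents and its directory entry have been fsync'd.
import os
import shutil
import sys


def fsync_dir(path: str) -> None:
    # fsync the directory so the new directory entry (the rename) is durable.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def archive_wal(src: str, dest: str) -> None:
    if os.path.exists(dest):
        # Refuse to overwrite an already archived segment.
        raise FileExistsError(dest)
    dest_dir = os.path.dirname(os.path.abspath(dest))
    tmp = dest + ".tmp"  # same directory, so the rename stays on one filesystem
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # file data is on stable storage
    os.rename(tmp, dest)         # atomic replacement on POSIX
    fsync_dir(dest_dir)          # make the rename itself durable


def main(argv: list) -> int:
    if len(argv) != 3:
        print("usage: archive_wal.py SRC DEST", file=sys.stderr)
        return 2
    try:
        archive_wal(argv[1], argv[2])
    except OSError as exc:
        print(f"archive failed: {exc}", file=sys.stderr)
        return 1  # non-zero: PostgreSQL keeps the segment and retries later
    return 0


# Only act as a command-line tool when invoked with arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

A possible (hypothetical) configuration would be `archive_command = 'python3 /usr/local/bin/archive_wal.py %p /mnt/archive/%f'`, where `%p` and `%f` are PostgreSQL's placeholders for the segment's path and file name. If the archive lives on a different machine, the same rule applies to whatever transfer tool replaces the local copy: it must only return 0 once the remote side has the data on stable storage.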