Re: Missing important information in backup.sgml - Mailing list pgsql-docs
From | Gunnar "Nick" Bluth |
---|---|
Subject | Re: Missing important information in backup.sgml |
Date | |
Msg-id | 80e18b98-9b72-ddee-2a34-76302e5d8a0b@pro-open.de |
In response to | Re: Missing important information in backup.sgml (Stephen Frost <sfrost@snowman.net>) |
Responses | Re: Missing important information in backup.sgml |
List | pgsql-docs |
On 16.11.2016 at 15:36, Stephen Frost wrote:
> Gunnar, all,
>
> * Gunnar "Nick" Bluth (gunnar.bluth.extern@elster.de) wrote:
>> On 16.11.2016 at 11:37, Gunnar "Nick" Bluth wrote:
>>> I ran into this issue (see patch) a few times over the past years, and
>>> tend to forget it again (sigh!). Today I had to clean up a few hundred
>>> GB of unarchived WALs, so I decided to write a patch for the
>>> documentation this time.
>>
>> Uhm, well, the actual problem was a stale replication slot... and
>> tomatoes on my eyes, it seems ;-/. Ashes etc.!
>>
>> However, I still think a warning on (esp. rsync's) RCs >= 128 is worth
>> considering (see -v2 attached).
>
> Frankly, I wouldn't suggest including such wording as it would imply
> that using a bare rsync command is an acceptable configuration of
> archive_command. It isn't. At the very least, a bare rsync does
> nothing to ensure that the WAL has been fsync'd to permanent storage
> before returning, leading to potential data loss due to the WAL
> segment being removed by PG before the new segment has been permanently
> stored.

I, for one, deem a UPS-backed server in a different DC a pretty good
starting point, and I reckon many others do as well... obviously it's
not a belt and braces solution, but my guess would be that > 90% of
users have something similar in place, many of them actually using rsync
(or scp) one way or the other (or, think WAL-E et al., how do you force
an fsync on AWS?!?).

In environments where there's a risk of the WAL segment being
overwritten before that target server has fsync'd, heck, yeah, you're
right. But then you'd probably have something quite sophisticated in
place, and hate to see allegedly random _FATAL_ errors that are _not
documented outside the source code_ even more. Esp. when you can't tell
for sure (from the docs) if archiving your WAL segment will be retried
or not.
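To make the fsync concern quoted above concrete, here is a minimal sketch of an archive helper that only reports success once the copied segment and its directory entry have been flushed. It is an illustration under assumptions, not something proposed in this thread: the names `archive_segment` and `main` are hypothetical, the archive is assumed to be a locally mounted POSIX directory, and it says nothing about remote targets such as rsync, scp, or object storage.

```python
import os
import shutil
import sys
import tempfile


def archive_segment(src: str, dest_dir: str) -> str:
    """Copy one WAL segment into dest_dir, fsync it, and return the final path."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    # Refuse to overwrite an already archived segment. This check is not
    # race-free; it assumes a single archiver writes to dest_dir.
    if os.path.exists(dest):
        raise FileExistsError(dest)

    # Write to a temporary name first so a crash never leaves a partial
    # file under the final segment name.
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())  # flush file contents to stable storage
        os.rename(tmp, dest)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

    # Flush the directory entry too, so the rename itself survives a crash
    # (POSIX assumption; opening a directory like this fails on Windows).
    dir_fd = os.open(dest_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return dest


def main(argv):
    """Return 0 on success and a small non-zero status on any failure."""
    if len(argv) != 3:
        print("usage: archive_wal.py SRC DEST_DIR", file=sys.stderr)
        return 2
    try:
        archive_segment(argv[1], argv[2])
    except Exception as exc:
        print(f"archive failed: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved under a hypothetical path, this could be wired up as `archive_command = 'python3 /path/to/archive_wal.py %p /mnt/wal_archive'`, where both the script path and the archive directory are placeholders. Keeping failures at exit status 1 or 2 also stays clear of the high return codes discussed in this thread.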
> The PG documentation around archive command is, at best, a starting
> point for individuals who wish to implement their own proper backup
> solution, not as examples of good practice for production environments.

True. Which doesn't mean there's no room for more hints, like "ok, we
throw a FATAL error sometimes, but they're not really a problem, you
know, it's just external software that basically everyone uses at one
point or the other doing odd things sometimes" ;-).

Alas, I've been hunting a red herring today, 'cause when you find your
pg_xlog cluttered with old files _and_ see FATAL archiving messages in
your logs, your first thought is not "there's prolly a replication slot
left over", but "uh oh, those archive_command calls failed, so something
might be somehow stuck now".

I'll try to come up with something more comprehensive, taking your
comments into account...

> Thanks!
>
> Stephen

Thank you for considering this! ;-)

Cheers,
--
Gunnar "Nick" Bluth
RHCE/SCLA

Mobile +49 172 8853339
Email: gunnar.bluth@pro-open.de
_____________________________________________________________
In 1984 mainstream users were choosing VMS over UNIX. Ten years later
they are choosing Windows over UNIX. What part of that message aren't you
getting? - Tom Payne