Re: pg_rewind failure by file deletion in source server - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: pg_rewind failure by file deletion in source server |
Date | |
Msg-id | 55BD1788.3090803@iki.fi |
In response to | Re: pg_rewind failure by file deletion in source server (Michael Paquier <michael.paquier@gmail.com>) |
Responses | Re: pg_rewind failure by file deletion in source server |
List | pgsql-hackers |
On 07/17/2015 06:28 AM, Michael Paquier wrote:
> On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> On 06/29/2015 09:44 AM, Michael Paquier wrote:
>>>>
>>>> On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:
>>>>>
>>>>> But we'll still need to handle the pg_xlog symlink case somehow. Perhaps
>>>>> it would be enough to special-case pg_xlog for now.
>>>>
>>>>
>>>> Well, sure, pg_rewind does not copy the soft links either way. Now it
>>>> would be nice to have an option to be able to recreate the soft link
>>>> of at least pg_xlog even if it can be scripted as well after a run.
>>>
>>> Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
>>> any non-trivial scenarios, just copying all the files from pg_xlog isn't
>>> enough anyway, and you need to set up a recovery.conf after running
>>> pg_rewind that contains a restore_command or primary_conninfo, to fetch the
>>> WAL. So you can argue that by not copying pg_xlog automatically, we're
>>> actually doing a favour to the DBA, by forcing him to set up the
>>> recovery.conf file correctly. Because if you just test simple scenarios
>>> where not much time has passed between the failover and running pg_rewind,
>>> it might be enough to just copy all the WAL currently in pg_xlog, but it
>>> would not be enough if more time had passed and not all the required WAL is
>>> present in pg_xlog anymore. And by not copying the WAL, we can avoid some
>>> copying, as restore_command or streaming replication will only copy what's
>>> needed, while pg_rewind would copy all WAL it can find in the target's data
>>> directory.
>>>
>>> pg_basebackup also doesn't include any WAL, unless you pass the --xlog
>>> option.
>>> It would be nice to also add an optional --xlog option to pg_rewind,
>>> but with pg_rewind it's possible that all the required WAL isn't present in
>>> the pg_xlog directory anymore, so you wouldn't always achieve the same
>>> effect of making the backup self-contained.
>>>
>>> So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
>>> in both the source and the target.
>>
>> If pg_xlog is simply ignored, some old WAL files may remain in target server.
>> Don't these old files cause the subsequent startup of target server as new
>> standby to fail? That is, it's the case where the WAL file with the same name
>> but different content exists in both target and source. If that's harmful,
>> pg_rewind also should remove the files in pg_xlog of target server.
>
> This would reduce usability. The rewound node will replay WAL from the
> previous checkpoint where WAL forked up to the minimum recovery point
> of source node where pg_rewind has been run. Hence if we remove
> completely the contents of pg_xlog we'd lose a portion of the logs
> that need to be replayed until timeline is switched on the rewound
> node when recovering it (while streaming from the promoted standby,
> whatever). I don't really see why recycled segments would be a
> problem, as that's perhaps what you are referring to, but perhaps I am
> missing something.

Hmm. My thinking was that you need to set up restore_command or
primary_conninfo anyway, to fetch the old WAL, so there's no need to
copy any WAL. But there's a problem with that: you might have WAL files
in the source server that haven't been archived yet, and you need them
to recover the rewound target node. That's OK for libpq mode, I think,
as the server is still running and presumably you can fetch the WAL
with streaming replication, but for copy-mode, that's not a good
assumption. You might be relying on a WAL archive, and the file might
not be archived yet.
Perhaps it's best if we copy all the WAL files from source in
copy-mode, but not in libpq mode.

Regarding old WAL files in the target, it's probably best to always
leave them alone. They should do no harm, and as a general principle
it's best to avoid destroying evidence.

It'd be nice to get some fix for this for alpha2, so I'll commit a fix
to do that on Monday, unless we come to a different conclusion before
that.

- Heikki
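
The policy proposed in the last few paragraphs can be sketched as a small decision function. This is only an illustration in Python, not pg_rewind's actual C code: the names SourceMode, is_wal_path and plan_action are hypothetical, and the handling of non-WAL files is deliberately simplified (the real tool copies only the changed blocks of relation files rather than whole files).

```python
from enum import Enum


class SourceMode(Enum):
    COPY = "copy"    # source is a data directory on disk (--source-pgdata)
    LIBPQ = "libpq"  # source is a running server (--source-server)


def is_wal_path(path: str) -> bool:
    """True if path is pg_xlog itself or something inside it."""
    return path == "pg_xlog" or path.startswith("pg_xlog/")


def plan_action(path: str, in_source: bool, in_target: bool,
                mode: SourceMode) -> str:
    """Return 'copy', 'remove' or 'leave' for one relative path.

    Hypothetical helper that mirrors the proposal in the message above.
    """
    if is_wal_path(path):
        # Copy-mode: the source may hold WAL that was never archived,
        # so fetch whatever the source has.
        if in_source and mode is SourceMode.COPY:
            return "copy"
        # libpq mode: WAL can be streamed from the running source later.
        # In either mode, old WAL in the target is never removed.
        return "leave"
    # Simplified handling of everything outside pg_xlog (an assumption):
    if in_source:
        return "copy"
    if in_target:
        return "remove"
    return "leave"
```

For example, plan_action("pg_xlog/000000010000000000000003", True, False, SourceMode.LIBPQ) yields "leave", while the same call with SourceMode.COPY yields "copy". A WAL segment that exists only in the target is left alone in both modes.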