Re: run pg_rewind on an uncleanly shut down cluster. - Mailing list pgsql-hackers
From: Oleksii Kliukin
Subject: Re: run pg_rewind on an uncleanly shut down cluster.
Msg-id: F5E5541C-1840-49E6-8F82-F28A31E32CCF@hintbits.com
In response to: Re: run pg_rewind on an uncleanly shut down cluster. (Michael Paquier <michael.paquier@gmail.com>)
Responses: Re: run pg_rewind on an uncleanly shut down cluster.
List: pgsql-hackers
On 06 Oct 2015, at 08:58, Michael Paquier <michael.paquier@gmail.com> wrote:

> On Tue, Oct 6, 2015 at 12:41 AM, Oleksii Kliukin <alexk@hintbits.com> wrote:
>> pg_rewind -D postgresql0 --source-server="host=127.0.0.1 port=5433 dbname=postgres"
>>
>> The servers diverged at WAL position 0/3000060 on timeline 1.
>> could not open file "data/postgresql0/pg_xlog/000000010000000000000002": No such file or directory
>>
>> Note that this problem happens not 100% of time during the tests, sometimes pg_rewind can actually rewind the former master.
>
> I don't think that there is any actual reason preventing us from rewinding a node that has its state in pg_control set as something else than DB_SHUTDOWNED, the important point here is to be sure that the target node is *not* running while pg_rewind is running (perhaps pg_rewind should perform an action in the target node to not have it run, let's say that it creates a fake postgresql.conf with invalid data and renames the existing one). Checking pg_control makes things easier though, there is no need to rely on external binaries like "pg_ctl status" or some parsing of postmaster.pid with kill(pid, 0) for example.
Does pg_rewind actually rely on the cluster being rewound having finished recovery?
If not, then it would be a good idea to add a --force flag that makes pg_rewind skip the state check, as you suggested in this thread:
http://www.postgresql.org/message-id/flat/CAF8Q-Gw1HBKzpSEVtotLg=DR+Ee-6q59qQfhY5tor3FYAenyrA@mail.gmail.com#CAF8Q-Gw1HBKzpSEVtotLg=DR+Ee-6q59qQfhY5tor3FYAenyrA@mail.gmail.com
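For reference, the state in question is visible from the shell; a quick sketch, using the data directory name from my test setup:

    # show the pg_control field that pg_rewind checks on the target
    pg_controldata postgresql0 | grep 'Database cluster state'
    # pg_rewind currently refuses to run unless this reports "shut down"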
>> I know I can copy the segment back from the archive, but I'd like to avoid putting this logic into the failover tool if possible. Is there anything we can do to avoid the problem described above, or is there a better way to bring up the former master after the crash with pg_rewind?
>
> Well, for 9.5 (and actually the same applies to the 9.3 and 9.4 version on github because I am keeping the previous versions consistent with what is in 9.5), I guess no.
>
> This missing segment is going to be needed in any case because the list of blocks modified needs to be found, hence the question is "how can pg_rewind guess where a WAL segment missing from the target's pg_xlog is located?". And there are multiple answers:
> - An archive path, then let's add an option to pg_rewind to add a path, though this needs the archive path to be mounted locally, and usually that's not the case.
> - An existing node of the cluster, perhaps the segment is still present on another standby node that already replayed it, though this would need an extra node.
> - The source node itself, if we are lucky the missing segment created before WAL forked is still there. It may not be there though if it has already been recycled.
>
> At the end it seems to me that this is going to need some extra operation by the failover tool or the system administrator either way, and that any additional logic to check where this segment is located is never going to satisfy completely the failover use cases. Hence I would keep just pg_rewind out of that.
Well, checking the source node looks like an option that does not require the DBA to provide any additional information, as the connection string or the path to the data dir is already there. It would be nice if pg_rewind could fetch WAL via a given restore_command, though, or even use the command already present in recovery.conf (if the node being rewound was a replica, which I guess is a pretty common case).
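For illustration, with a restore_command of the usual form already in recovery.conf, fetching the segment pg_rewind complained about is a one-liner; a sketch with a hypothetical archive location:

    # restore_command in recovery.conf, e.g.:
    #   restore_command = 'cp /mnt/wal_archive/%f %p'
    # the same command run by hand, with %f/%p filled in for the
    # segment named in the error message:
    cp /mnt/wal_archive/000000010000000000000002 \
        postgresql0/pg_xlog/000000010000000000000002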
Anyway, thank you for describing the issue. In my case, I solved it by removing the files from the archive_status directory of the former master (the node being rewound). This makes PostgreSQL forget that it has to remove an already archived (but still required by pg_rewind) segment (I guess it does that during the shutdown checkpoint). Afterwards, I start postgres in single-user mode with archive_command=false and archive_mode=on, to make sure no segments are archived or removed, and stop it right away with:
postgres --single -D . -c "max_replication_slots=5" -c "wal_level=hot_standby" -c "wal_log_hints=on" -c "archive_mode=on" -c "archive_command=false" postgres </dev/null
(the other options are taken from postmaster.opts).
Afterwards, pg_rewind runs on the cluster without any noticeable issues. Since the node is not going to continue as a master and the contents of pg_xlog/archive_status are changed by pg_rewind anyway, I don't think any data is lost by the initial removal of the archive_status files.
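Roughly, the whole sequence looks like this (data directory and connection string from my test setup):

    cd postgresql0
    # 1. make postgres forget about segments awaiting archiver action
    rm -f pg_xlog/archive_status/*
    # 2. run the single-user start/stop shown above
    # 3. rewind against the new master
    cd ..
    pg_rewind -D postgresql0 --source-server="host=127.0.0.1 port=5433 dbname=postgres"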
Kind regards,
--
Oleksii