Re: pg_rewind in contrib - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: pg_rewind in contrib |
Date | |
Msg-id | 548FFD5D.80703@vmware.com |
In response to | Re: pg_rewind in contrib (Satoshi Nagayasu <snaga@uptime.jp>) |
Responses | Re: pg_rewind in contrib |
List | pgsql-hackers |
On 12/16/2014 11:23 AM, Satoshi Nagayasu wrote:
> Hi,
>
> On 2014/12/12 23:13, Heikki Linnakangas wrote:
> > Hi,
> >
> > I'd like to include pg_rewind in contrib. I originally wrote it as an
> > external project so that I could quickly get it working with the
> > existing versions, and because I didn't feel it was quite ready for
> > production use yet. Now, with the WAL format changes in master, it is a
> > lot more maintainable than before. Many bugs have been fixed since the
> > first prototypes, and I think it's fairly robust now.
> >
> > I propose that we include pg_rewind in contrib/ now. Attached is a patch
> > for that. It just includes the latest sources from the current pg_rewind
> > repository at https://github.com/vmware/pg_rewind. It is released under
> > the PostgreSQL license.
> >
> > For those who are not familiar with pg_rewind, it's a tool that allows
> > repurposing an old master server as a new standby server, after
> > promotion, even if the old master was not shut down cleanly. That's a
> > very often requested feature.
>
> I'm looking into pg_rewind with a very first scenario.
> My scenario is here.
>
> https://github.com/snaga/pg_rewind_test/blob/master/pg_rewind_test.sh
>
> At least, I think a file descriptor "srcf" should be closed before
> exiting copy_file_range(). I got "can't open file" error with
> "too many open file" while running pg_rewind.
>
> ------------------------------------------------
> diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c
> index bea1b09..5a8cc8e 100644
> --- a/contrib/pg_rewind/copy_fetch.c
> +++ b/contrib/pg_rewind/copy_fetch.c
> @@ -280,6 +280,8 @@ copy_file_range(const char *path, off_t begin, off_t
> end, bool trunc)
>          write_file_range(buf, begin, readlen);
>          begin += readlen;
>      }
> +
> +    close(srcfd);
>  }
>
>  /*
> ------------------------------------------------

Yep, good catch. I pushed a fix to the pg_rewind repository at github.

> And I have one question here.
>
> pg_rewind assumes that the source PostgreSQL has, at least, one
> checkpoint after getting promoted. I think the target timeline id
> in the pg_control file to be read is only available after the first
> checkpoint. Right?

Yes, it does assume that the source server (= old standby, new master)
has had at least one checkpoint after promotion. It probably should be
more explicit about it: If there hasn't been a checkpoint, you will
currently get an error "source and target cluster are both on the same
timeline", which isn't very informative.

I assume that by "target timeline ID" you meant the timeline ID of the
source server, i.e. the timeline that the target server should be
rewound to.

- Heikki
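
For readers following the descriptor leak discussed in the quoted patch, here is a minimal standalone sketch of the general pattern: open the source file, copy a byte range in chunks, and close the descriptor on every exit path. It is not the pg_rewind source. The function name copy_range_sketch, its signature, the 8 kB buffer, and the use of pread()/pwrite() are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy bytes [begin, end) from the file at src_path into the already-open
 * descriptor dstfd, at the same offsets. Returns 0 on success, -1 on error.
 *
 * Hypothetical helper for illustration only; not the pg_rewind function.
 */
static int
copy_range_sketch(const char *src_path, int dstfd, off_t begin, off_t end)
{
    char        buf[8192];
    int         srcfd;
    int         result = 0;

    assert(begin <= end);

    srcfd = open(src_path, O_RDONLY);
    if (srcfd < 0)
        return -1;

    while (begin < end)
    {
        size_t      want = sizeof(buf);
        ssize_t     readlen;

        /* Do not read past the end of the requested range. */
        if ((off_t) want > end - begin)
            want = (size_t) (end - begin);

        readlen = pread(srcfd, buf, want, begin);
        if (readlen <= 0)
        {
            result = -1;        /* read error or unexpected end of file */
            break;
        }
        if (pwrite(dstfd, buf, (size_t) readlen, begin) != readlen)
        {
            result = -1;        /* write error or short write */
            break;
        }
        begin += readlen;
    }

    /*
     * Single exit point: the source descriptor is released on success and
     * on error. Without this, every call would leak one descriptor and a
     * long run would eventually fail with "too many open files".
     */
    if (close(srcfd) != 0)
        result = -1;

    return result;
}
```

Routing every path after a successful open() through one close() keeps the descriptor count constant no matter how many ranges are copied, which is the property the quoted one-line fix restores.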