Re: pg_upgrade resets timeline to 1 - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: pg_upgrade resets timeline to 1 |
Date | |
Msg-id | 20150528072721.GA4102649@tornado.leadboat.com |
In response to | pg_upgrade resets timeline to 1 (Christoph Berg <myon@debian.org>) |
Responses | Re: pg_upgrade resets timeline to 1 |
List | pgsql-hackers |
On Wed, May 27, 2015 at 05:40:09PM +0200, Christoph Berg wrote:
> commit 4c5e060049a3714dd27b7f4732fe922090edea69
> Author: Bruce Momjian <bruce@momjian.us>
> Date: Sat May 16 00:40:18 2015 -0400
>
> pg_upgrade: force timeline 1 in the new cluster
>
> Previously, this prevented promoted standby servers from being upgraded
> because of a missing WAL history file. (Timeline 1 doesn't need a
> history file, and we don't copy WAL files anyway.)
>
> Pardon me for starting a fresh thread, but I couldn't find where this
> was discussed.
>
> I've just had trouble getting barman to work again after a 9.1->9.4.2
> upgrade, and I think part of the problem was that the WAL for this
> cluster got reset from timeline 2 to 1, which made barman's incoming
> WALs processor drop the files, probably because the new filename
> 0001... is now "less" than the 0002... before.

It looks like an upgrade from 9.1.x to 9.3.0 or later has always set the new
timeline identifier (TLI) to 1. My testing confirms this for an upgrade from
9.1.16 to 9.4.1 and for an upgrade from 9.1.16 to 9.4.2, so I failed to
reproduce your report. Would you verify the versions you used? If you were
upgrading from 9.3.x, I _can_ reproduce that.

Since the 2015-05-16 commits you cite, pg_upgrade always sets TLI=1. Behavior
before those commits depended on the source and destination major versions.
PostgreSQL 9.0, 9.1 and 9.2 restored the TLI regardless of source version.
PostgreSQL 9.3 and 9.4 restored the TLI when upgrading from 9.3 or 9.4, but
they set TLI=1 when upgrading from 9.2 or earlier. (Commit 038f3a0 introduced
this inconsistent behavior of 9.3 and later.)

The commit you cite fixed this symptom:

http://www.postgresql.org/message-id/flat/D5359E0908278642BB1747131D62694DAB22560F@AUSMXMBX01.mrws.biz

I'm attaching a test script that I used to observe TLI assignment and to test
for that problem. pg_upgrade has been restoring TLI without history files
since 9.0.0 or earlier, and that was always risky.
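
For readers who want the version rules above in one place, here is a minimal
sketch that restates them as a lookup and shows how the resulting TLI ends up
in the WAL segment file name. It is not pg_upgrade code and not the attached
test script. The function names, the (major, minor) tuple convention and the
after_may_2015_commits flag are all made up for this illustration. The only
external fact relied on is that a WAL segment name is three 8-digit
hexadecimal fields, the first being the timeline.

```python
def post_upgrade_tli(src, dst, old_tli, after_may_2015_commits=False):
    """Return the TLI of the new cluster, per the rules stated in this message.

    src and dst are hypothetical (major, minor) tuples such as (9, 1).
    old_tli is the timeline of the old cluster.
    """
    if not (9, 0) <= dst <= (9, 4):
        raise ValueError("this message only covers destinations 9.0 to 9.4")
    if after_may_2015_commits:
        return 1            # since the 2015-05-16 commits: always TLI=1
    if dst <= (9, 2):
        return old_tli      # 9.0, 9.1, 9.2 restored the TLI regardless of source
    if src >= (9, 3):
        return old_tli      # 9.3 and 9.4 restored it when upgrading from 9.3/9.4
    return 1                # ... and set TLI=1 when upgrading from 9.2 or earlier


def wal_segment_name(tli, log, seg):
    """Build a 24-character WAL segment file name: timeline, log, segment."""
    return f"{tli:08X}{log:08X}{seg:08X}"


# Old cluster on timeline 2, upgraded 9.1 -> 9.4 (before or after the commits).
new_tli = post_upgrade_tli((9, 1), (9, 4), old_tli=2)
old_name = wal_segment_name(2, 0, 3)
new_name = wal_segment_name(new_tli, 0, 5)
# A plain string comparison puts the new cluster's files before the old ones.
print(new_name, "<", old_name, "is", new_name < old_name)
```

Whether a given backup tool actually compares names this way is a guess on
this sketch's part; the point is only that any consumer ordering by file name
sees the timeline field first.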
The reported symptom became possible with the introduction of the
TIMELINE_HISTORY walsender command in 9.3.0. (It was hard to encounter before
9.4, because 9.3 to 9.3 pg_upgrade runs are rare outside of hacker testing.)

Since you observed barman breakage less than a week after a release that
changed the post-pg_upgrade TLI, it seems prudent to figure that other folks
will be affected. At the same time, I don't understand why that release would
prompt the first report. Any upgrade from {9.0,9.1,9.2} to {9.3,9.4} already
had the behavior you experienced. Ideas?

> I don't expect to be able to recover through a pg_upgrade operation,
> but pg_upgrade shouldn't make things more complicated than they should
> be for backup tools. (If there's a problem with the history files,
> shouldn't pg_upgrade copy them instead?)
>
> In fact, I'm wondering if pg_upgrade shouldn't rather *increase* the
> timeline to make sure the archive_command doesn't clobber any files
> from the old cluster when reused in the new cluster?

It's worth considering that, as a major-release change. Do note this in the
documentation, though:

  The archive command should generally be designed to refuse to overwrite any
  pre-existing archive file. This is an important safety feature to preserve
  the integrity of your archive in case of administrator error (such as
  sending the output of two different servers to the same archive directory).
  -- http://www.postgresql.org/docs/devel/static/continuous-archiving.html
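
To make the quoted documentation advice concrete, here is a minimal sketch of
an archive helper that refuses to overwrite an existing file. It is an
illustration under assumptions, not a recommended production script: the
function name archive_wal and the idea of wrapping it in a Python script are
hypothetical, and it does no cleanup of a partially written file.

```python
import os
import shutil
import sys


def archive_wal(src_path, archive_dir):
    """Copy one WAL file into archive_dir without ever overwriting.

    Returns 0 on success and 1 if a file of that name is already archived.
    """
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    try:
        # Mode "xb" is exclusive creation: open() fails if dest already exists.
        with open(src_path, "rb") as src, open(dest, "xb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # push the data to disk before reporting success
    except FileExistsError:
        return 1
    # Limitation: if the copy fails midway, the partial file stays behind
    # and later attempts for the same name are refused.
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed invocation: script.py <path of WAL file> <archive directory>
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```

Assuming the block is saved as a script at some hypothetical path, it could be
wired in with something like
archive_command = 'python3 /hypothetical/path/archive_wal.py %p /hypothetical/archive/dir'.
With such a command, a new cluster that restarts at timeline 1 and reuses the
old archive directory would fail to archive a segment whose name is already
taken, instead of clobbering it.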
Attachment