Thread: pg_upgrade broken by xlog numbering
On HEAD at the moment, `make check-world` is failing on a 32-bit Linux build:

+ pg_upgrade -d /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin -B /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories ok
Checking cluster versions ok
Some required control information is missing; cannot find:
  first log file ID after reset
  first log file segment after reset

Cannot continue without required control information, terminating
Failure, exiting
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:
>
> + pg_upgrade -d
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> -B
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories ok
> Checking cluster versions ok
> Some required control information is missing; cannot find:
> first log file ID after reset
> first log file segment after reset
>
> Cannot continue without required control information, terminating
> Failure, exiting

On MacOS X, on latest sources, initdb fails:

creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... FATAL: control file contains invalid data
child process exited with exit code 1
initdb: data directory "/Users/rhaas/pgsql/src/test/regress/./tmp_check/data" not removed at user's request

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 25 June 2012 13:11, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:
>
> + pg_upgrade -d
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> -B
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories ok
> Checking cluster versions ok
> Some required control information is missing; cannot find:
> first log file ID after reset
> first log file segment after reset
>
> Cannot continue without required control information, terminating
> Failure, exiting

I get precisely the same on 64-bit Linux.

--
Thom
Robert Haas <robertmhaas@gmail.com> writes:
> On MacOS X, on latest sources, initdb fails:
> creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 32MB
> creating configuration files ... ok
> creating template1 database in
> /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
> initializing pg_authid ... ok
> initializing dependencies ... ok
> creating system views ... ok
> loading system objects' descriptions ... ok
> creating collations ... ok
> creating conversions ... ok
> creating dictionaries ... FATAL: control file contains invalid data
> child process exited with exit code 1

Same for me. It's crashing here:

    if (ControlFile->state < DB_SHUTDOWNED ||
        ControlFile->state > DB_IN_PRODUCTION ||
        !XRecOffIsValid(ControlFile->checkPoint))
        ereport(FATAL,
                (errmsg("control file contains invalid data")));

state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?

(I suppose the reason this is only failing on some machines is
platform-specific variations in xlog entry size, but it's still a bit
distressing that this got committed in such a broken state.)

regards, tom lane
On Mon, Jun 25, 2012 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On MacOS X, on latest sources, initdb fails:
>
>> creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
>> creating subdirectories ... ok
>> selecting default max_connections ... 100
>> selecting default shared_buffers ... 32MB
>> creating configuration files ... ok
>> creating template1 database in
>> /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
>> initializing pg_authid ... ok
>> initializing dependencies ... ok
>> creating system views ... ok
>> loading system objects' descriptions ... ok
>> creating collations ... ok
>> creating conversions ... ok
>> creating dictionaries ... FATAL: control file contains invalid data
>> child process exited with exit code 1
>
> Same for me. It's crashing here:
>
>     if (ControlFile->state < DB_SHUTDOWNED ||
>         ControlFile->state > DB_IN_PRODUCTION ||
>         !XRecOffIsValid(ControlFile->checkPoint))
>         ereport(FATAL,
>                 (errmsg("control file contains invalid data")));
>
> state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
> ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?
>
> (I suppose the reason this is only failing on some machines is
> platform-specific variations in xlog entry size, but it's still a bit
> distressing that this got committed in such a broken state.)

I'm guessing that the problem is as follows: in the old code, the
XLogRecord header could not be split, so any offset that was closer to
the end of the page than SizeOfXLogRecord was a sure sign of trouble.
But commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d relaxed that
restriction, so now it IS legal for the checkpoint record to be where
it is. But it seems that XRecOffIsValid() didn't get the memo.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
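To make the arithmetic behind that guess concrete, here is a minimal sketch of an "unsplittable header" check. It assumes the default 8192-byte WAL page size, and it takes the header size as a parameter because the thread does not state the actual SizeOfXLogRecord on the failing machines. The names SKETCH_XLOG_BLCKSZ, bytes_left_on_page and header_fits_on_page are made up for illustration and are not the real macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default WAL page size; other build configurations may differ. */
#define SKETCH_XLOG_BLCKSZ 8192u

/* Bytes between a record's start offset and the end of its WAL page. */
static uint32_t
bytes_left_on_page(uint32_t recoff)
{
    return SKETCH_XLOG_BLCKSZ - (recoff % SKETCH_XLOG_BLCKSZ);
}

/*
 * Old-style rule as described above: a record start is acceptable only if
 * a whole record header still fits on the page.  header_size is supplied
 * by the caller; it is an assumption, not a value taken from the thread.
 */
static bool
header_fits_on_page(uint32_t recoff, uint32_t header_size)
{
    return bytes_left_on_page(recoff) >= header_size;
}
```

For the reported offset, 19972072 % 8192 is 8168, which leaves 24 bytes on the page. So header_fits_on_page(19972072, 32) is false for an illustrative 32-byte header, while any header size of 24 bytes or less would pass. That would be consistent with the failure depending on platform-specific header size.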
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:

This appears to be because of the following hunk from commit
dfda6ebaec6763090fb78b458a979b558c50b39b:

@@ -558,10 +536,10 @@ PrintControlValues(bool guessed)
    snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT,
             ControlFile.system_identifier);

-   printf(_("First log file ID after reset: %u\n"),
-          newXlogId);
-   printf(_("First log file segment after reset: %u\n"),
-          newXlogSeg);
+   XLogFileName(fname, ControlFile.checkPointCopy.ThisTimeLineID, newXlogSe
+
+   printf(_("First log segment after reset: %s\n"),
+          fname);
    printf(_("pg_control version number: %u\n"),
           ControlFile.pg_control_version);
    printf(_("Catalog version number: %u\n"),

Evidently, Heikki failed to realize that pg_upgrade gets the control
data information by parsing the output of pg_controldata.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
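As a rough illustration of why a label change breaks a consumer that scrapes this output, here is a small sketch of label-based lookup. The helper find_labeled_value is hypothetical and is not pg_upgrade's actual parsing code. It only shows the general failure mode: once the expected label text no longer appears, the lookup comes back empty.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the value that follows
 * "<label>:" in the tool output, or NULL if the label is not present.
 */
static const char *
find_labeled_value(const char *output, const char *label)
{
    const char *p = strstr(output, label);

    if (p == NULL)
        return NULL;            /* label missing, e.g. after a rename */

    p += strlen(label);
    if (*p != ':')
        return NULL;            /* matched text is not a "label:" pair */
    p++;

    /* Skip the padding between the colon and the value. */
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}
```

Looking up "First log file ID after reset" in output that only contains "First log segment after reset: ..." returns NULL, which matches the "cannot find" report at the top of the thread.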