Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" - Mailing list pgsql-hackers
From | Anton A. Melnikov |
---|---|
Subject | Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" |
Date | |
Msg-id | 20d45197-80be-4984-33a9-a8d13708992a@inbox.ru Whole thread Raw |
In response to | Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
|
List | pgsql-hackers |
Hello! On 24.11.2022 04:02, Thomas Munro wrote: > On Thu, Nov 24, 2022 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Thomas Munro <thomas.munro@gmail.com> writes: > > ERROR: calculated CRC checksum does not match value stored in file > > The attached draft patch fixes it. Tried to catch this error on my PC, but failed to do it within a reasonable time. The 1s interval is probably too long for me. It seems there are more durable way to reproduce this bug with 0001 patch applied: At the first backend: do $$ begin loop perform pg_update_control_file(); end loop; end; $$; At the second one: do $$ begin loop perform pg_control_system(); end loop; end; $$; It will fails almost immediately with: "ERROR: calculated CRC checksum does not match value stored in file" both with fsync = off and fsync = on. Checked it out for master and REL_11_STABLE. Also checked for a few hours that the patch 0002 fixes this error, but there are some questions to its logical structure. The equality between the previous and newly calculated crc is checked only if the last crc calculation was wrong, i.e not equal to the value stored in the file. It is very unlikely that in this case the previous and new crc can match, so, in fact, the loop will spin until crc is calculated correctly. In the other words, this algorithm is practically equivalent to an infinite loop of reading from a file and calculating crc while(EQ_CRC32C(crc, ControlFile->crc) != true). But then it can be simplified significantly by removing checksums equality checks, bool fist_try and by limiting the maximum number of iterations with some constant in the e.g. for loop to avoid theoretically possible freeze. Or maybe use the variant suggested by Tom Lane, i.e, as far as i understand, repeat the file_open-read-close-calculate_crc sequence twice without a pause between them and check the both calculated crcs for the equality. If they match, exit and return the bool result of comparing between the last calculation with the value from the file, if not, take a pause and repeat everything from the beginning. In this case, no need to check *crc_ok_p inside get_controlfile() as it was in the present version; i think it's more logically correct since this variable is intended top-level functions and the logic inside get_controlfile() should not depend on its value. Also found a warning in 0001 patch for master branch. On my PC gcc gives: xlog.c:2507:1: warning: no previous prototype for ‘pg_update_control_file’ [-Wmissing-prototypes] 2507 | pg_update_control_file() Fixed it with #include "utils/fmgrprotos.h" to xlog.c and add PG_FUNCTION_ARGS to pg_update_control_file(). With the best wishes, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgsql-hackers by date: