Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date:
Msg-id: CA+hUKG+jig+QdBETj_ab++VWSoJjbwT3sLNCnk0wFsY_6tRqoA@mail.gmail.com
In response to: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" ("Anton A. Melnikov" <aamelnikov@inbox.ru>)
Responses: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
           Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
List: pgsql-hackers
On Fri, Feb 24, 2023 at 11:12 PM Anton A. Melnikov <aamelnikov@inbox.ru> wrote:
> On 17.02.2023 06:21, Thomas Munro wrote:
> > BTW there are at least two other places where PostgreSQL already knows
> > that concurrent reads and writes are possibly non-atomic (and we also
> > don't even try to get the alignment right, making the question moot):
> > pg_basebackup, which enables full_page_writes implicitly if you didn't
> > have the GUC on already, and pg_rewind, which refuses to run if you
> > haven't enabled full_page_writes explicitly (as recently discussed on
> > another thread recently; that's an annoying difference, and also an
> > annoying behaviour if you know your storage doesn't really need it!)
>
> It seems a good topic for a separate thread patch. Would you provide a
> link to the thread you mentioned please?

https://www.postgresql.org/message-id/flat/367d01a7-90bb-9b70-4cda-248e81cc475c%40cosium.com

> > Therefore, we need a solution for Windows too.  I tried to write the
> > equivalent code, in the attached.  I learned a few things: Windows
> > locks are mandatory (that is, if you lock a file, reads/writes can
> > fail, unlike Unix).  Combined with async release, that means we
> > probably also need to lock the file in xlog.c, when reading it in
> > xlog.c:ReadControlFile() (see comments).  Before I added that, on a CI
> > run, I found that the read in there would sometimes fail, and adding
> > the locking fixed that.  I am a bit confused, though, because I
> > expected that to be necessary only if someone managed to crash while
> > holding the write lock, which the CI tests shouldn't do.  Do you have
> > any ideas?
>
> Unfortunately, no ideas so far. Was it a pg_control CRC or I/O errors?
> Maybe logs of such a fail were saved somewhere? I would like to see
> them if possible.

I think it was this one:

https://cirrus-ci.com/task/5004082033721344

For example, see subscription/011_generated, which failed like this:

2023-02-16 06:57:25.724 GMT [5736][not initialized] PANIC:  could not read file "global/pg_control": Permission denied

That was fixed after I changed it to also do locking in xlog.c
ReadControlFile(), in the version you tested.  There must be something
I don't understand going on, because that cluster wasn't even running
before: it had just been created by initdb.

But, anyway, I have a new idea that makes that whole problem go away
(though I'd still like to understand what happened there):

With the "pseudo-advisory lock for Windows" idea from the last email,
we don't need to bother with file system level locking in many places.
Just in controldata_utils.c, for FE/BE interlocking (that is, we don't
need to use that for interlocking of concurrent reads and writes that
are entirely in the backend, because we already have an LWLock that we
could use more consistently).  Changes:

1.  xlog.c mostly uses no locking
2.  basebackup.c now acquires ControlFileLock
3.  only controldata_utils.c uses the new file locking, for FE-vs-BE
    interlocking
4.  lock past the end (pseudo-advisory locking for Windows)

Note that when we recover from a basebackup or pg_backup_start()
backup, we use the backup label to find a redo start location in the
WAL (see read_backup_label()), BUT we still read the copied pg_control
file (one that might be too "new", so we don't use its redo pointer).
So it had better not be torn, or the recovery will fail.  So, in this
version I protected that sendFile() with ControlFileLock.  But...
Isn't that a bit strange?
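For anyone wondering what "lock past the end" means in practice, here is a
rough sketch (not code from the attached patch; the offset constant and
function names are invented for illustration).  Windows byte-range locks
are mandatory, but only for the bytes they cover, so if we lock a range
that lies beyond the end of pg_control, ordinary reads and writes are
unaffected and only other callers taking the same range conflict, which
gives us advisory-style semantics:

    #include <windows.h>
    #include <io.h>
    #include <stdbool.h>

    /* Assumed constant: any offset comfortably past sizeof(ControlFileData). */
    #define PG_CONTROL_LOCK_OFFSET  0x00100000

    static bool
    lock_controlfile(int fd, bool exclusive)
    {
        HANDLE      handle = (HANDLE) _get_osfhandle(fd);
        OVERLAPPED  overlapped = {0};

        /* Lock one byte at an offset that regular I/O never touches. */
        overlapped.Offset = PG_CONTROL_LOCK_OFFSET;

        /* Blocks until any conflicting lock is released. */
        return LockFileEx(handle,
                          exclusive ? LOCKFILE_EXCLUSIVE_LOCK : 0,
                          0,        /* reserved */
                          1,        /* low 32 bits of range length */
                          0,        /* high 32 bits of range length */
                          &overlapped);
    }

    static bool
    unlock_controlfile(int fd)
    {
        HANDLE      handle = (HANDLE) _get_osfhandle(fd);
        OVERLAPPED  overlapped = {0};

        overlapped.Offset = PG_CONTROL_LOCK_OFFSET;
        return UnlockFileEx(handle, 0, 1, 0, &overlapped);
    }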
To go down this path we would also need to document the need to copy
the control file with the file locked to avoid a rare failure, in the
pg_backup_start() documentation.  That's annoying (I don't even know
how to do it with easy shell commands; maybe we should provide a
program that locks and cats the file...?).

Could we make better use of the safe copy that we have in the log?
Then the pg_backup_start() subproblem would disappear.  Conceptually,
that'd be just like the way we use FPI for data pages copied during a
backup.  I'm not sure about any of that, though, it's just an idea,
not tested.

> > Or maybe the problem is/was too theoretical before...
>
> As far as i understand, this problem has always been, but the probability of
> this is extremely small in practice, which is directly pointed in
> the docs [4]:
> "So while it is theoretically a weak spot, pg_control does not seem
> to be a problem in practice."

I guess that was talking about power loss atomicity again?  Knowledge
of the read/write atomicity problem seems to be less evenly distributed
(and I think it became more likely in Linux > 3.something; and the
Windows situation possibly hadn't been examined by anyone before).

> > Here's a patch like that.
>
> In this patch, the problem is solved for the live database and
> maybe remains for some possible cases of an external backup. In a whole,
> i think it can be stated that this is a sensible step forward.
>
> Just like last time, i ran a long stress test under windows with current patch.
> There were no errors for more than 3 days even with fsync=off.

Thanks!
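PS: Here's roughly the sort of "lock and cat" helper I was musing about
above, only a sketch and untested.  It assumes the FE-vs-BE interlock is
a plain advisory shared lock on pg_control taken with flock(), which may
well not be how the actual patch does it (fcntl() locks, or the Windows
byte-range trick from earlier, are equally plausible):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* Usage: pg_cat_control $PGDATA/global/pg_control > backup/global/pg_control */
    int
    main(int argc, char **argv)
    {
        char        buf[8192];
        ssize_t     nread;
        int         fd;

        if (argc != 2)
        {
            fprintf(stderr, "usage: %s PATH_TO_PG_CONTROL\n", argv[0]);
            return 1;
        }

        fd = open(argv[1], O_RDONLY);
        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* Block until no writer holds the exclusive lock. */
        if (flock(fd, LOCK_SH) < 0)
        {
            perror("flock");
            return 1;
        }

        /* Copy the (now stable) contents to stdout. */
        while ((nread = read(fd, buf, sizeof(buf))) > 0)
        {
            if (fwrite(buf, 1, (size_t) nread, stdout) != (size_t) nread)
            {
                perror("fwrite");
                return 1;
            }
        }
        if (nread < 0)
        {
            perror("read");
            return 1;
        }

        flock(fd, LOCK_UN);
        close(fd);
        return 0;
    }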
Attachment