Re: pg_upgrade test failure - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: pg_upgrade test failure |
Date | |
Msg-id | CA+hUKG+H-q78LxD20UOVyJbP1_idyocj6va_v1TF4RnN9JZa+w@mail.gmail.com Whole thread Raw |
In response to | Re: pg_upgrade test failure (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: pg_upgrade test failure
|
List | pgsql-hackers |
On Mon, Oct 3, 2022 at 9:07 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Tue, Sep 20, 2022 at 1:31 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > I suspect that rmtree() was looping in pgunlink(), and got ENOENT, so > > didn't warn about the file itself, but then failed one moment later in > > rmdir. > > Yeah, I think this is my fault. In commit f357233c the new lstat() > call might return ENOENT for STATUS_DELETE_PENDING, and then we don't > enter pgunlink()'s 10 second sleep-retry loop. Let me think about how > best to fix that, and how to write a regression test program that > would exercise stuff like this. Might take a couple of days as I am > away from computers until mid-week. I think something like the attached should do the right thing for STATUS_DELETE_PENDING (sort of "ENOENT-in-progress"). unlink() goes back to being blocking (sleep+retry until eventually we reach ENOENT or we time out and give up with EACCES), but we still distinguish it from true ENOENT so we have a fast exit in that case. This is passing CI, but not tested yet. One ugly thing in this patch is that it has to deal with our historical mistake (?) of including Windows headers in this file in Cygwin builds for no reason and thus getting WIN32 defined on a non-WIN32 build, as I've complained about before[1] but not yet tidied up. Remembering why we do any of this weird looking stuff that we don't need on Unix, the general idea is that things that scan directories to unlink everything before unlinking the parent directory need to block for a while on STATUS_DELETE_PENDING to increase their probability of success, while things that do anything else probably just want to skip such zombie files completely. To recap, we have: * readdir() sees files that are ENOENT-in-progress (so recursive unlinks can see them) * unlink() waits for ENOENT-in-progress to reach ENOENT (what broke here) * stat() and lstat() report ENOENT-in-progress as ENOENT (done to fix eg pg_basebackup, which used to fail at random on Windows) * open() reports ENOENT-in-progress as either ENOENT or EEXIST depending on O_CREAT (because used by stat()) Clearly this set of kludges isn't perfect and other kludge-sets would be possible too. One thought is that we could hide ENOENT-in-progress from readdir(), and add a new rmdir() wrapper instead. If it gets a directory-not-empty error from the kernel, it could at that point wait for zombie files to go away (perhaps registering for file system events with some local equivalent of KQ_FILTER_VNODE if there is one, to be less sloppy that the current sleep() nonsense, but sleep would work too). When I'm back at my real keyboard I'll try to come up with tests for this stuff, but I'm not sure how solid we can really make a test for this particular case -- I think you'd need to have another thread open the file and then close it after different periods of time, to demonstrate that the retry loop works but also gives up, and that's exactly the sort of timing-dependent stuff we try to avoid. But I think I'll try that anyway, because it's essential infrastructure to allow Unix-only hackers to work only this stuff. Once we have that, we might be able to make some more progress with the various FILE_DISPOSITION_POSIX_SEMANTICS proposals, if it helps, because we'll have reproducible evidence for what it really does. [1] https://commitfest.postgresql.org/39/3781/
Attachment
pgsql-hackers by date: