Thomas Munro <thomas.munro@gmail.com> writes:
> I now suspect this specific readdir() problem is in FreeBSD's NFS
> client. See below. There have also been reports of missed files from
> (IIRC) Linux clients without much analysis, but that doesn't seem too
> actionable from here unless someone can come up with a repro or at
> least some solid details to investigate; those involved unspecified
> (possibly appliance/cloud) NFS and CIFS file servers.
I forgot to report back, but yesterday I spent time unsuccessfully
trying to reproduce the problem with macOS client and NFS server
using btrfs (a Synology NAS running some no-name version of Linux).
So that lends additional weight to your conclusion that it isn't
specifically a btrfs bug.
> I see this issue here with a FreeBSD client talking to a Debian server
> exporting BTRFS or XFS, even with dirreadsize set high so that
> multi-request paging is not expected. Looking at Wireshark and the
> NFS spec (disclaimer: I have never studied NFS at this level before,
> addito salis grano), what I see is a READDIR request with cookie=0
> (good), and which receives a response containing the whole directory
> listing and a final entry marker eof=1 (good), but then FreeBSD
> unexpectedly (to me) sends *another* READDIR request with cookie=662,
> which is a real cookie that was received somewhere in the middle of
> the first response on the entry for "13816_fsm", and that entry was
> followed by an entry for "13816_vm". The second request gets a
> response that begins at "13816_vm" (correct on the server's part).
> Then the client sends REMOVE (unlink) requests for some but not all of
> the files, including "13816_fsm" but not "13816_vm". Then it sends
> yet another READDIR request with cookie=0 (meaning go from the top),
> and gets a non-empty directory listing, but immediately sends RMDIR,
> which unsurprisingly fails NFS3ERR_NOTEMPTY. So my best guess so far
> is that FreeBSD's NFS client must be corrupting its directory cache
> when files are unlinked, and it's not the server's fault. I don't see
> any obvious problem with the way the cookies work. Seems like
> material for a minimised bug report elsewhere, and not our issue.
Yeah, that seems pretty damning. Thanks for looking into it.
regards, tom lane