Re: silent data loss with ext4 / all current versions - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: silent data loss with ext4 / all current versions |
Date | |
Msg-id | CAB7nPqQW_xPw==hk+Ei_PxNsKX9PX1vjoxjr8auK9EwPn4FgtA@mail.gmail.com Whole thread Raw |
In response to | Re: silent data loss with ext4 / all current versions (Andres Freund <andres@anarazel.de>) |
Responses |
Re: silent data loss with ext4 / all current versions
Re: silent data loss with ext4 / all current versions |
List | pgsql-hackers |
On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund <andres@anarazel.de> wrote: > Hi, Thanks for the review. >> +/* >> + * rename_safe -- rename of a file, making it on-disk persistent >> + * >> + * This routine ensures that a rename file persists in case of a crash by using >> + * fsync on the old and new files before and after performing the rename so as >> + * this categorizes as an all-or-nothing operation. >> + */ >> +int >> +rename_safe(const char *oldfile, const char *newfile) >> +{ >> + struct stat filestats; >> + >> + /* >> + * First fsync the old entry and new entry, it this one exists, to ensure >> + * that they are properly persistent on disk. Calling this routine with >> + * an existing new target file is fine, rename() will first remove it >> + * before performing its operation. >> + */ >> + fsync_fname(oldfile, false); >> + if (stat(newfile, &filestats) == 0) >> + fsync_fname(newfile, false); > > I don't think we want any stat()s here. I'd much, much rather check open > for ENOENT. OK. So you mean more or less that, right? int fd; fd = OpenTransientFile(newfile, PG_BINARY | O_RDONLY, 0); if (fd < 0) { if (errno != ENOENT) return -1; } else { pg_fsync(fd); CloseTransientFile(fd); } >> +/* >> + * link_safe -- make a file hard link, making it on-disk persistent >> + * >> + * This routine ensures that a hard link created on a file persists on system >> + * in case of a crash by using fsync where on the link generated as well as on >> + * its parent directory. >> + */ >> +int >> +link_safe(const char *oldfile, const char *newfile) >> +{ > > If we go for a new abstraction here, I'd rather make it > 'replace_file_safe' or something, and move the link/rename code #ifdef > into it. Hm. OK. I don't see any reason why switching to link() even in code paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would be a problem thinking about it. Should HAVE_WORKING_LINK be available on a platform we can combine it with unlink. Is that in line with what you think? >> + if (link(oldfile, newfile) < 0) >> + return -1; >> + >> + /* >> + * Make the link persistent in case of an OS crash, the new entry >> + * generated as well as its parent directory need to be flushed. >> + */ >> + fsync_fname(newfile, false); >> + >> + /* >> + * Same for parent directory. This routine is never called to rename >> + * files across directories, but if this proves to become the case, >> + * flushing the parent directory if the old file would be necessary. >> + */ >> + fsync_parent_path(newfile); >> + return 0; > > I think it's a seriously bad idea to encode that knowledge in such a > general sounding routine. We could however argue that this is about > safely replacing the *target* file; not about safely removing the old > file. Not sure I am following here. Are you referring to the fact that if the new file and old file are on different directories would make this routine unreliable? Because yes that's the case if we want to make both of them persistent, and I think we want to do so. Do you suggest to correct this comment to remove the mention to the old file's parent directory because we just care about having the new file as being persistent? Or do you suggest that we should actually extend this routine so as we fsync both the new and old file's parent directory if they differ? > Currently I'm inclined to apply this to master soon. But I think we > might want to wait a while with backpatching. The recent fsync upgrade > disaster kinda makes me a bit careful... There have not been actual field complaints about that yet. That's fine for me to wait a bit. -- Michael
pgsql-hackers by date: