Re: Postgres, fsync, and OSs (specifically linux) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Postgres, fsync, and OSs (specifically linux) |
Date | |
Msg-id | 20180430160945.5s5qfoqryhtmugxl@alap3.anarazel.de Whole thread Raw |
In response to | Re: Postgres, fsync, and OSs (specifically linux) (Craig Ringer <craig@2ndquadrant.com>) |
Responses |
Re: Postgres, fsync, and OSs (specifically linux)
|
List | pgsql-hackers |
On 2018-04-30 13:03:24 +0800, Craig Ringer wrote: > Hrm, something else that just came up. On 9.6+ we use sync_file_range. > It's surely going to eat errors: > > rc = sync_file_range(fd, offset, nbytes, > SYNC_FILE_RANGE_WRITE); > > /* don't error out, this is just a performance optimization */ > if (rc != 0) > { > ereport(WARNING, > (errcode_for_file_access(), > errmsg("could not flush dirty data: %m"))); > } It's not. Only SYNC_FILE_RANGE_WAIT_{BEFORE,AFTER} eat errors. Which seems sensible, because they could be considered data integrity operations. fs/sync.c: SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes, unsigned int, flags) { ... if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) { ret = file_fdatawait_range(f.file, offset, endbyte); if (ret < 0) goto out_put; } if (flags & SYNC_FILE_RANGE_WRITE) { ret = __filemap_fdatawrite_range(mapping, offset, endbyte, WB_SYNC_NONE); if (ret < 0) goto out_put; } if (flags & SYNC_FILE_RANGE_WAIT_AFTER) ret = file_fdatawait_range(f.file, offset, endbyte); where int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte) { struct address_space *mapping = file->f_mapping; __filemap_fdatawait_range(mapping, start_byte, end_byte); return file_check_and_advance_wb_err(file); } EXPORT_SYMBOL(file_fdatawait_range); the critical call is file_check_and_advance_wb_err(). That's the call that checks and clears errors. Would be good to add a kernel xfstest (gets used on all relevant FS these days), to make sure that doesn't change. > I'm very suspicious about the safety of the msync() path too. That seems justified however: SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) { ... error = vfs_fsync_range(file, fstart, fend, 1); .. */ int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync) { ... return file->f_op->fsync(file, start, end, datasync); } EXPORT_SYMBOL(vfs_fsync_range); int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) { ... ret = file_write_and_wait_range(file, start, end); if (ret) return ret; ... STATIC int xfs_file_fsync( struct file *file, loff_t start, loff_t end, int datasync) { ... error = file_write_and_wait_range(file, start, end); if (error) return error; int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend) { ... err2 = file_check_and_advance_wb_err(file); if (!err) err = err2; return err; } Greetings, Andres Freund
pgsql-hackers by date: