Re: [HACKERS] New wal_sync_method for Darwin? - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] New wal_sync_method for Darwin? |
Date | |
Msg-id | 200505172227.j4HMRRn27458@candle.pha.pa.us Whole thread Raw |
Responses |
Re: [HACKERS] New wal_sync_method for Darwin?
|
List | pgsql-patches |
Attached is an updated version of your Darwin Fsync-writethrough patch. I did much more restructuring because we do fsync in more places than just WAL (like at checkpoints), and we should follow the WAL write-through in those places too if that method is used for fsync of WAL, rather than the normal fsync without the force-writethrough. I restructured the code so now pg_fsync does writethrough or not based on the wal setting. My feeling is you need WAL writethrough, you need it for other fsyncs as well. Write-through was added in an 8.0.X subrelease, and because it was a subrelease, the patch did a minimal amount of code change. This change is the restructuring that makes sense for 8.1. --------------------------------------------------------------------------- Chris Campbell wrote: > Thanks, Bruce! Here's the patch (would you rather patches be attached > files, or inline?). > > Some explanation. In the current code, there is no "fsync" on Windows, > only "fsync_writethrough". But, internally, it behaves identically to > "fsync". The only special case was for the GUC strings. > > The patch changes it thusly: since "fsync" and "fsync_writethrough" > cannot be identical on Darwin, I added a new constant for > SYNC_METHOD_FSYNC_WRITE_THROUGH (3). A GUC value of > "fsync_writethrough" sets fsync_method to this new constant. > > On Windows, the fsync_writethrough case falls through to the fsync case > (which simply runs fsync). On Darwin, the fsync_writethrough case > executes fcntl(openLogFile, F_FULLFSYNC, 0), which does everything that > fsync does, and then flushes the disk's write cache. The code does not > need to call fsync directly, just fcntl. > > The performance of a script that executes 11,000 INSERT statements > (outside a transaction) is pretty slow, since it requires flushing the > disk write cache after EVERY query (transaction commit). I waited about > 15 minutes before I just killed it. But if you do all the inserts > inside a transaction, then I only pay the cache flush price once at > transaction commit, and it's quite usable. > > I asked around on IRC, and the only place where flushing the disk cache > is important is when syncing a WAL file. So I didn't change the > behavior of fsync() globally for the code...just for WAL syncs. > > I added a new #define called HAVE_FSYNC_WRITE_THROUGH, which is defined > only by the win32.h and darwin.h port headers. This controls the > availability of the "fsync_writethrough" GUC setting. Thanks to the > port headers, no configure test is needed. > > Thanks! Let me know what you think of the patch, and tell me what else > I can do. > > - Chris > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: doc/src/sgml/runtime.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v retrieving revision 1.319 diff -c -c -r1.319 runtime.sgml *** doc/src/sgml/runtime.sgml 15 May 2005 00:26:18 -0000 1.319 --- doc/src/sgml/runtime.sgml 17 May 2005 22:03:49 -0000 *************** *** 1595,1601 **** values are <literal>fsync</> (call <function>fsync()</> at each commit), <literal>fdatasync</> (call <function>fdatasync()</> at each commit), ! <literal>fsync_writethrough</> (call <function>_commit()</> at each commit on Windows), <literal>open_sync</> (write WAL files with <function>open()</> option <symbol>O_SYNC</>), and <literal>open_datasync</> (write WAL files with <function>open()</> option <symbol>O_DSYNC</>). Not all of these choices are available on all platforms. --- 1595,1601 ---- values are <literal>fsync</> (call <function>fsync()</> at each commit), <literal>fdatasync</> (call <function>fdatasync()</> at each commit), ! <literal>fsync_writethrough</> (force write-through of any disk write cache), <literal>open_sync</> (write WAL files with <function>open()</> option <symbol>O_SYNC</>), and <literal>open_datasync</> (write WAL files with <function>open()</> option <symbol>O_DSYNC</>). Not all of these choices are available on all platforms. Index: src/backend/access/transam/xlog.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v retrieving revision 1.191 diff -c -c -r1.191 xlog.c *** src/backend/access/transam/xlog.c 10 May 2005 22:27:29 -0000 1.191 --- src/backend/access/transam/xlog.c 17 May 2005 22:03:53 -0000 *************** *** 51,61 **** * default method. We assume that fsync() is always available, and that * configure determined whether fdatasync() is. */ - #define SYNC_METHOD_FSYNC 0 - #define SYNC_METHOD_FDATASYNC 1 - #define SYNC_METHOD_OPEN 2 /* used for both O_SYNC and - * O_DSYNC */ - #if defined(O_SYNC) #define OPEN_SYNC_FLAG O_SYNC #else --- 51,56 ---- *************** *** 79,99 **** #define DEFAULT_SYNC_METHOD_STR "open_datasync" #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG ! #else ! #if defined(HAVE_FDATASYNC) #define DEFAULT_SYNC_METHOD_STR "fdatasync" #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC #define DEFAULT_SYNC_FLAGBIT 0 ! #else ! #ifndef FSYNC_IS_WRITE_THROUGH #define DEFAULT_SYNC_METHOD_STR "fsync" #else #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough" ! #endif ! #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC #define DEFAULT_SYNC_FLAGBIT 0 #endif - #endif /* User-settable parameters */ --- 74,92 ---- #define DEFAULT_SYNC_METHOD_STR "open_datasync" #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG ! #elif defined(HAVE_FDATASYNC) #define DEFAULT_SYNC_METHOD_STR "fdatasync" #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC #define DEFAULT_SYNC_FLAGBIT 0 ! #elif !defined(HAVE_FSYNC_WRITETHROUGH_ONLY) #define DEFAULT_SYNC_METHOD_STR "fsync" + #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC + #define DEFAULT_SYNC_FLAGBIT 0 #else #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough" ! #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH #define DEFAULT_SYNC_FLAGBIT 0 #endif /* User-settable parameters */ *************** *** 122,128 **** /* these are derived from XLOG_sync_method by assign_xlog_sync_method */ ! static int sync_method = DEFAULT_SYNC_METHOD; static int open_sync_bit = DEFAULT_SYNC_FLAGBIT; #define XLOG_SYNC_BIT (enableFsync ? open_sync_bit : 0) --- 115,121 ---- /* these are derived from XLOG_sync_method by assign_xlog_sync_method */ ! int sync_method = DEFAULT_SYNC_METHOD; static int open_sync_bit = DEFAULT_SYNC_FLAGBIT; #define XLOG_SYNC_BIT (enableFsync ? open_sync_bit : 0) *************** *** 5249,5264 **** int new_sync_method; int new_sync_bit; - #ifndef FSYNC_IS_WRITE_THROUGH if (pg_strcasecmp(method, "fsync") == 0) - #else - /* Win32 fsync() == _commit(), which writes through a write cache */ - if (pg_strcasecmp(method, "fsync_writethrough") == 0) - #endif { new_sync_method = SYNC_METHOD_FSYNC; new_sync_bit = 0; } #ifdef HAVE_FDATASYNC else if (pg_strcasecmp(method, "fdatasync") == 0) { --- 5242,5259 ---- int new_sync_method; int new_sync_bit; if (pg_strcasecmp(method, "fsync") == 0) { new_sync_method = SYNC_METHOD_FSYNC; new_sync_bit = 0; } + #ifdef HAVE_FSYNC_WRITETHROUGH + else if (pg_strcasecmp(method, "fsync_writethrough") == 0) + { + new_sync_method = SYNC_METHOD_FSYNC_WRITETHROUGH; + new_sync_bit = 0; + } + #endif #ifdef HAVE_FDATASYNC else if (pg_strcasecmp(method, "fdatasync") == 0) { *************** *** 5328,5339 **** switch (sync_method) { case SYNC_METHOD_FSYNC: ! if (pg_fsync(openLogFile) != 0) ereport(PANIC, (errcode_for_file_access(), errmsg("could not fsync log file %u, segment %u: %m", openLogId, openLogSeg))); break; #ifdef HAVE_FDATASYNC case SYNC_METHOD_FDATASYNC: if (pg_fdatasync(openLogFile) != 0) --- 5323,5343 ---- switch (sync_method) { case SYNC_METHOD_FSYNC: ! if (pg_fsync_no_writethrough(openLogFile) != 0) ereport(PANIC, (errcode_for_file_access(), errmsg("could not fsync log file %u, segment %u: %m", openLogId, openLogSeg))); break; + #ifdef HAVE_FSYNC_WRITETHROUGH + case SYNC_METHOD_FSYNC_WRITETHROUGH: + if (pg_fsync_writethrough(openLogFile) != 0) + ereport(PANIC, + (errcode_for_file_access(), + errmsg("could not fsync write-through log file %u, segment %u: %m", + openLogId, openLogSeg))); + break; + #endif #ifdef HAVE_FDATASYNC case SYNC_METHOD_FDATASYNC: if (pg_fdatasync(openLogFile) != 0) Index: src/backend/storage/file/fd.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/storage/file/fd.c,v retrieving revision 1.115 diff -c -c -r1.115 fd.c *** src/backend/storage/file/fd.c 31 Dec 2004 22:00:51 -0000 1.115 --- src/backend/storage/file/fd.c 17 May 2005 22:03:54 -0000 *************** *** 232,242 **** /* ! * pg_fsync --- same as fsync except does nothing if enableFsync is off */ int pg_fsync(int fd) { if (enableFsync) return fsync(fd); else --- 232,258 ---- /* ! * pg_fsync --- do fsync with or without writethrough */ int pg_fsync(int fd) { + #ifndef HAVE_FSYNC_WRITETHROUGH_ONLY + if (sync_method != SYNC_METHOD_FSYNC_WRITETHROUGH) + return pg_fsync_no_writethrough(fd); + else + #endif + return pg_fsync_writethrough(fd); + } + + + /* + * pg_fsync_no_writethrough --- same as fsync except does nothing if + * enableFsync is off + */ + int + pg_fsync_no_writethrough(int fd) + { if (enableFsync) return fsync(fd); else *************** *** 244,249 **** --- 260,283 ---- } /* + * pg_fsync_writethrough + */ + int + pg_fsync_writethrough(int fd) + { + if (enableFsync) + #ifdef WIN32 + return _commit(fd); + #elif defined(__darwin__) + return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0; + #else + return -1; + #endif + else + return 0; + } + + /* * pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off * * Not all platforms have fdatasync; treat as fsync if not available. Index: src/include/access/xlog.h =================================================================== RCS file: /cvsroot/pgsql/src/include/access/xlog.h,v retrieving revision 1.60 diff -c -c -r1.60 xlog.h *** src/include/access/xlog.h 28 Apr 2005 21:47:17 -0000 1.60 --- src/include/access/xlog.h 17 May 2005 22:03:56 -0000 *************** *** 75,80 **** --- 75,87 ---- */ #define XLOG_NO_TRAN XLR_INFO_MASK + /* Sync methods */ + #define SYNC_METHOD_FSYNC 0 + #define SYNC_METHOD_FDATASYNC 1 + #define SYNC_METHOD_OPEN 2 /* for O_SYNC and O_DSYNC */ + #define SYNC_METHOD_FSYNC_WRITETHROUGH 3 + extern int sync_method; + /* * List of these structs is used to pass data to XLogInsert(). * Index: src/include/port/darwin.h =================================================================== RCS file: /cvsroot/pgsql/src/include/port/darwin.h,v retrieving revision 1.6 diff -c -c -r1.6 darwin.h *** src/include/port/darwin.h 23 Dec 2003 03:31:30 -0000 1.6 --- src/include/port/darwin.h 17 May 2005 22:03:57 -0000 *************** *** 1 **** --- 1,4 ---- #define __darwin__ 1 + + #define HAVE_FSYNC_WRITETHROUGH + Index: src/include/port/win32.h =================================================================== RCS file: /cvsroot/pgsql/src/include/port/win32.h,v retrieving revision 1.44 diff -c -c -r1.44 win32.h *** src/include/port/win32.h 24 Mar 2005 04:36:19 -0000 1.44 --- src/include/port/win32.h 17 May 2005 22:03:57 -0000 *************** *** 16,23 **** #define mkdir(a,b) mkdir(a) ! #define fsync(a) _commit(a) ! #define FSYNC_IS_WRITE_THROUGH #define ftruncate(a,b) chsize(a,b) #define USES_WINSOCK --- 16,23 ---- #define mkdir(a,b) mkdir(a) ! #define HAVE_FSYNC_WRITETHROUGH ! #define HAVE_FSYNC_WRITETHROUGH_ONLY #define ftruncate(a,b) chsize(a,b) #define USES_WINSOCK Index: src/include/storage/fd.h =================================================================== RCS file: /cvsroot/pgsql/src/include/storage/fd.h,v retrieving revision 1.50 diff -c -c -r1.50 fd.h *** src/include/storage/fd.h 31 Dec 2004 22:03:42 -0000 1.50 --- src/include/storage/fd.h 17 May 2005 22:03:57 -0000 *************** *** 89,94 **** --- 89,96 ---- SubTransactionId parentSubid); extern void RemovePgTempFiles(void); extern int pg_fsync(int fd); + extern int pg_fsync_no_writethrough(int fd); + extern int pg_fsync_writethrough(int fd); extern int pg_fdatasync(int fd); /* Filename components for OpenTemporaryFile */
pgsql-patches by date: