Re: Postgres, fsync, and OSs (specifically linux) - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Postgres, fsync, and OSs (specifically linux) |
Date | |
Msg-id | CAEepm=3VioiGiNaUNCPZoZB63GAKkdVN-LyHE0Os1Hh+mu5Psw@mail.gmail.com |
In response to | Re: Postgres, fsync, and OSs (specifically linux) (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Postgres, fsync, and OSs (specifically linux)
|
List | pgsql-hackers |
On Sun, Apr 29, 2018 at 10:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 28 April 2018 at 09:15, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-04-28 08:25:53 -0700, Simon Riggs wrote:
>>> The people I've spoken to so far have encouraged us to continue
>>> working with the filesystem layer, offering encouragement of our
>>> decision to use filesystems.
>>
>> There's a lot of people disagreeing with it too.
>
> Specific recent verbal feedback from OpenLDAP was that the project
> adopted DIO and found no benefit in doing so, with regret the other
> way from having tried.

I'm not sure if OpenLDAP is really comparable. The big three RDBMSs +
MySQL started like us and eventually switched to direct IO, I guess at
a time when direct IO support matured in OSs and their own IO
scheduling was thought to be superior. I'm pretty sure they did that
because they didn't like wasting RAM on double buffering and had
better ideas about IO scheduling. From some googling this morning:

DB2: The Linux/Unix/Windows edition changed its default to DIO ("NO
FILESYSTEM CACHING") in release 9.5 in 2007[1], but it can still do
buffered IO if you ask for it.

Oracle: Around the same time or earlier, in the Linux 2.4 era, Oracle
apparently supported direct IO ("FILESYSTEMIO_OPTIONS = DIRECTIO" (or
SETALL for DIRECTIO + ASYNCH)) on big iron Unix but didn't yet use it
on Linux[2]. There were some amusing emails from Linus Torvalds on
this topic[3]. I'm not sure what FILESYSTEMIO_OPTIONS's default value
is on each operating system today or when it changed, it's probably
SETALL everywhere by now? I wonder if they stuck with buffered IO for
a time on Linux despite the availability of direct IO because they
thought it was more reliable or more performant.

SQL Server: I couldn't find any evidence that they've even kept the
option to use buffered IO (which must have existed in the ancestral
code base). Can it? It's a different situation though, targeting a
reduced set of platforms.

MySQL: The default is still buffered ("innodb_flush_method = fsync" as
opposed to "O_DIRECT") but O_DIRECT is supported and widely
recommended, so it sounds like it's usually a win. Maybe not on
smaller systems though?

On MySQL, there are anecdotal reports of performance suffering on some
systems when you turn on O_DIRECT however. If that's true, it's
interesting to speculate about why that might be as it would probably
apply also to us in early versions (optimistic explanation: the
kernel's stretchy page cache allows people to get away with poorly
tuned buffer pool size? pessimistic explanation: the page reclamation
or IO scheduling (asynchronous write-back, write clustering,
read-ahead etc) is not as good as the OS's, but that effect is hidden
by suitably powerful disk subsystem with its own magic caching?) Note
that its O_DIRECT setting *also* calls fsync() to flush filesystem
meta-data (necessary if the file was extended); I wonder if that is
exposed to write-back error loss.

[1] https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0051304.html
[2] http://www.ixora.com.au/notes/direct_io.htm
[3] https://lkml.org/lkml/2002/5/11/58

--
Thomas Munro
http://www.enterprisedb.com
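
For readers unfamiliar with the pattern mentioned in the MySQL note above (a direct-IO write followed by fsync() because the file was extended), here is a minimal sketch in Python. It is an illustration under stated assumptions, not MySQL or PostgreSQL code: the names `append_block` and `BLOCK` are hypothetical, the 4096-byte alignment is assumed rather than queried from the device, and the fallback to buffered IO on EINVAL is a convenience for filesystems that reject O_DIRECT.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; real code should query the device's block size


def append_block(path: str, payload: bytes) -> bool:
    """Append one zero-padded BLOCK to path, then fsync().

    Returns True if the write went through O_DIRECT, False if this sketch
    fell back to ordinary buffered IO. Short writes are not handled.
    """
    if len(payload) > BLOCK:
        raise ValueError("payload larger than one block")

    # O_DIRECT generally requires an aligned buffer, offset, and length.
    # An anonymous mmap is page-aligned, which satisfies the buffer part.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(payload.ljust(BLOCK, b"\0"))

    base = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without O_DIRECT
    used_direct = False
    fd = -1
    try:
        if direct:
            try:
                fd = os.open(path, base | direct, 0o600)
                os.write(fd, buf)
                used_direct = True
            except OSError as e:
                # EINVAL here usually means the filesystem rejects O_DIRECT.
                if e.errno != errno.EINVAL:
                    raise
                if fd >= 0:
                    os.close(fd)
                    fd = -1
        if not used_direct:
            fd = os.open(path, base, 0o600)
            os.write(fd, buf)
        # The append extended the file, so its metadata (size, block
        # mapping) still has to be flushed. Errors from fsync() are
        # deliberately allowed to propagate rather than being ignored.
        os.fsync(fd)
    finally:
        if fd >= 0:
            os.close(fd)
        buf.close()
    return used_direct
```

Calling `append_block("datafile", b"hello")` grows the file by one block and reports which path was taken. Whether an error surfaced by that final fsync() reliably reflects earlier write-back failures is exactly the open question raised above, and this sketch does not answer it.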