Re: Postgres, fsync, and OSs (specifically linux) - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Postgres, fsync, and OSs (specifically linux) |
Date | |
Msg-id | CANP8+jJETivNC9++X6-pGbNnvx03ppZCLua+djdvOtZnsFsjiw@mail.gmail.com |
In response to | Postgres, fsync, and OSs (specifically linux) (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Postgres, fsync, and OSs (specifically linux)
Re: Postgres, fsync, and OSs (specifically linux) Re: Postgres, fsync, and OSs (specifically linux) |
List | pgsql-hackers |
On 27 April 2018 at 15:28, Andres Freund <andres@anarazel.de> wrote:

> - Add a pre-checkpoint hook that checks for filesystem errors *after*
>   fsyncing all the files, but *before* logging the checkpoint completion
>   record. Operating systems, filesystems, etc. all log the error format
>   differently, but for larger installations it'd not be too hard to
>   write code that checks their specific configuration.
>
>   While I'm a bit concerned adding user-code before a checkpoint, if
>   we'd do it as a shell command it seems pretty reasonable. And useful
>   even without concern for the fsync issue itself. Checking for IO
>   errors could e.g. also include checking for read errors - it'd not be
>   unreasonable to not want to complete a checkpoint if there'd been any
>   media errors.

It seems clear that we need to evaluate our compatibility not just with an OS, as we do now, but with an OS/filesystem combination. Although people have suggested some approaches, I'm more interested in discovering how we can be certain we got it right.

And the end result seems to be that PostgreSQL will be forced, in the short term, to declare certain combinations of OS/filesystem unsupported, with clear warnings sent out to users.

Adding a pre-checkpoint hook encourages people to fix this themselves without reporting issues, so I initially oppose this until we have a clearer argument as to why we need it. The answer is not to make this issue more obscure, but to make it more public.

> - Use direct IO. Due to architectural performance issues in PG and the
>   fact that it'd not be applicable for all installations I don't think
>   this is a reasonable fix for the issue presented here. Although it's
>   independently something we should work on. It might be worthwhile to
>   provide a configuration that allows to force DIO to be enabled for WAL
>   even if replication is turned on.

"Use DirectIO" is roughly the same suggestion as "don't trust Linux filesystems".

It would be a major admission of defeat for us to take that as our main route to a solution.

The people I've spoken to so far have encouraged us to continue working with the filesystem layer and have supported our decision to use filesystems.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
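
For readers following the thread, the kind of site-specific check the quoted pre-checkpoint hook proposal has in mind might look roughly like the sketch below. It is only an illustration. The hook exists in this thread only as a proposal, so how PostgreSQL would invoke such a command and interpret its exit status is an assumption. The names `ERROR_PATTERNS`, `find_io_errors` and `exit_status` are hypothetical, and the patterns are placeholders, since (as the quoted text notes) each OS and filesystem logs errors in its own format.

```python
import re

# Assumed, illustrative patterns only. Real kernel and filesystem messages
# differ between operating systems, filesystems and versions, so an actual
# installation would substitute patterns for its own configuration.
ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"EXT4-fs error"),
]


def find_io_errors(log_lines):
    """Return the log lines that match any of the assumed error patterns."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    ]


def exit_status(log_lines):
    """Map the scan result to a shell-style exit status.

    Assumption: a non-zero status would mean "do not log the checkpoint
    completion record"; the thread does not define this contract.
    """
    return 1 if find_io_errors(log_lines) else 0
```

A wrapper script could feed this function the lines of whatever log the installation treats as authoritative and exit with the returned status. Note that this is exactly the sort of local workaround the reply above is wary of, because it lets sites handle the problem privately instead of reporting it.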