On Tue, Feb 19, 2019 at 2:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> How can we achieve that, without writing our
> own NFS client?
<dons crash helmet>
You'll need it :)
Instead of writing our own NFS client, how about writing our own network storage protocol? Imagine a stripped-down postmaster process running on the NFS server that essentially acts as a block server. Through some sort of network chatter, it can exchange blocks with the real postmaster running someplace else. The mini-protocol would contain commands like READ_BLOCK, WRITE_BLOCK, EXTEND_RELATION, FSYNC_SEGMENT, etc. - basically whatever the relevant operations at the smgr layer are. And the user would see the remote server as a tablespace mapped to a special smgr.
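To make the shape of such a mini-protocol a bit more concrete, here is a rough sketch. Only the four command names come from the text above; the wire format (a 13-byte header carrying command, relation id, block number and payload length), the numeric command values, and the in-memory BlockServer are assumptions invented for illustration, not anything that exists in PostgreSQL. A real remote side would call into smgr-level code instead of a Python dict.

```python
import enum
import struct

BLOCK_SIZE = 8192  # PostgreSQL's default block size (BLCKSZ)


class Command(enum.IntEnum):
    # Names from the proposal; the numeric values are made up.
    READ_BLOCK = 1
    WRITE_BLOCK = 2
    EXTEND_RELATION = 3
    FSYNC_SEGMENT = 4


# Hypothetical header: command, relation id, block number, payload length,
# all in network byte order with no padding.
HEADER = struct.Struct("!BIII")


def encode_request(cmd, rel_id, block_no, payload=b""):
    """Serialize one request as header + payload."""
    return HEADER.pack(cmd, rel_id, block_no, len(payload)) + payload


def decode_request(data):
    """Parse one request; raises ValueError on an unknown command or short payload."""
    cmd, rel_id, block_no, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return Command(cmd), rel_id, block_no, payload


class BlockServer:
    """In-memory stand-in for the stripped-down remote process."""

    def __init__(self):
        self.relations = {}  # rel_id -> list of BLOCK_SIZE-byte blocks
        self.synced = set()  # rel_ids for which a sync was requested

    def handle(self, data):
        cmd, rel_id, block_no, payload = decode_request(data)
        blocks = self.relations.setdefault(rel_id, [])
        if cmd == Command.EXTEND_RELATION:
            # Append a zeroed block and report its block number.
            blocks.append(bytes(BLOCK_SIZE))
            return struct.pack("!I", len(blocks) - 1)
        if cmd == Command.READ_BLOCK:
            return blocks[block_no]  # IndexError if the block does not exist
        if cmd == Command.WRITE_BLOCK:
            if len(payload) != BLOCK_SIZE:
                raise ValueError("payload must be exactly one block")
            blocks[block_no] = payload
            return b""
        # FSYNC_SEGMENT: only recorded here; a real server would fsync a file.
        self.synced.add(rel_id)
        return b""
```

A client would extend the relation first, then write and read back a page, for example srv.handle(encode_request(Command.WRITE_BLOCK, rel_id, 0, page)). Error replies, segment numbering, and authentication are all left out of this sketch.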
As compared with your proposal, this has both advantages and disadvantages. The main advantage is that we aren't dependent on being able to make NFS behave in any particular way; indeed, this type of solution could be used not only to work around problems with NFS, but also problems with any other network filesystem. We get to reuse all of the work we've done to try to make local operation reliable; the remote server can run the same code that would be run locally whenever the master tells it to do so. And you can even imagine trying to push more work to the remote side in some future version of the protocol. The main disadvantage is that it doesn't help unless you can actually run software on the remote box. If the only access you have to the remote side is that it exposes an NFS interface, then this sort of thing is useless. And that's probably a pretty common scenario.
In my experience, that covers approximately 100% of the use cases.
In the only cases where I've run into people wanting to use postgres on NFS, the NFS server has been a big filer from NetApp or Hitachi or whomever. And you're not going to be able to run something like that on top of it.
There might be a use-case for the split that you mention, absolutely, but it's not going to solve the people-who-want-NFS situation. You'd solve more of that by having the middle layer speak "raw device" underneath and be able to sit on top of things like iSCSI (yes, really).