Re: Postgres, fsync, and OSs (specifically linux) - Mailing list pgsql-hackers
From:           Thomas Munro
Subject:        Re: Postgres, fsync, and OSs (specifically linux)
Msg-id:         CAEepm=0uAGf6FvmX7YqHc7hqqSHRCWK17BwrXZgE+YYOcyR4Gw@mail.gmail.com
In response to: Re: Postgres, fsync, and OSs (specifically linux) (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses:      Re: Postgres, fsync, and OSs (specifically linux)
List:           pgsql-hackers
On Thu, Jun 14, 2018 at 5:30 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, May 23, 2018 at 8:02 AM, Andres Freund <andres@anarazel.de> wrote:
>> [patches]
>
> A more interesting question is: how will you cap the number of file
> handles you send through that pipe? On that OS you call
> DuplicateHandle() to fling handles into another process's handle table
> directly. Then you send the handle number as plain old data to the
> other process via carrier pigeon, smoke signals, a pipe etc. That's
> interesting because the handle allocation is asynchronous from the
> point of view of the receiver. Unlike the Unix case, where the
> receiver can count handles and make sure there is space for one more
> before it reads a potentially-SCM-containing message, here the
> *senders* will somehow need to make sure they don't create too many in
> the receiving process. I guess that would involve a shared counter,
> and a strategy for what to do when the number is too high (probably
> just wait).
>
> Hmm. I wonder if that would be a safer approach on all operating
> systems.

As a way of poking this thread, here are some more thoughts. Buffer
stealing currently looks something like this:

Evicting backend:
  lseek(fd)
  write(fd)
  ...enqueue fsync request via shm...

Checkpointer:
  ...push into hash table...

With the patch it presumably looks something like this:

Evicting backend:
  lseek(fd)
  write(fd)
  sendmsg(fsync_socket)    /* passes fd */

Checkpointer:
  recvmsg(fsync_socket)    /* gets a copy of fd */
  ...push into hash table...
  close(fd)                /* for all but the first one received
                            * for the same file */

That takes us from 2 syscalls to 5 per evicted buffer. I suppose it's
possible that on some operating systems that might hurt a bit, given
that it's happening at the granularity of 1GB data files that could
have a lot of backends working in them concurrently. I have no idea
if it's really a problem on any particular OS.
Admittedly on Linux it's probably just a bunch of fast atomic ops and
RCU stuff... probably only the existing write() actually takes the
inode lock or anything that heavy, and that's probably lost in the
noise in an evict-heavy workload. I don't know, I guess it's probably
not a problem, but I thought I'd mention that.

Contention on the new fsync socket doesn't seem to be a new problem
per se, since it replaces a contention point we already had:
CheckpointerCommLock. If that was acceptable today then perhaps that
indicates that any in-kernel contention created by the new syscalls
is also OK.

My feeling so far is that I'd probably go for the sender-collapses
model (and it might even be necessary on Windows?) if doing this as a
new feature, but I fully understand your desire to do it in a much
simpler way that could be back-patched more easily. I'm just slightly
concerned about the unintended-consequence risk that comes with
exercising an operating system feature that not all operating system
authors probably intended to be used at high frequency. Nothing that
can't be assuaged by testing.

  * the queue is full and contains no duplicate entries.  In that case, we
  * let the backend know by returning false.
  */
 -bool
 -ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 +void
 +ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno,
 +                    File file)

Comment out of date.

-- 
Thomas Munro
http://www.enterprisedb.com