Re: POSIX shared memory redux - Mailing list pgsql-hackers
From | A.M. |
---|---|
Subject | Re: POSIX shared memory redux |
Date | |
Msg-id | BC618525-BB86-41BE-B8B4-D22419C99C45@themactionfaction.com |
In response to | Re: POSIX shared memory redux (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: POSIX shared memory redux
Re: POSIX shared memory redux |
List | pgsql-hackers |
On Apr 13, 2011, at 9:30 PM, Robert Haas wrote:

> On Wed, Apr 13, 2011 at 6:11 PM, A.M. <agentm@themactionfaction.com> wrote:
>>> I don't see why we need to get rid of SysV shared memory; needing less
>>> of it seems just as good.
>>
>> 1. As long as one keeps SysV shared memory around, the postgresql project has to maintain the annoying platform-specific document on how to configure the poorly named kernel parameters. If the SysV region is very small, that means I can run more postgresql instances within the same kernel limits, but one can still hit the limits. My patch allows the postgresql project to delete that page and the hassles with it.
>>
>> 2. My patch proves that SysV is wholly unnecessary. Are you attached to it? (Pun intended.)
>
> With all due respect, I think this is an unproductive conversation.
> Your patch proves that SysV is wholly unnecessary only if we also
> agree that fcntl() locking is just as reliable as the nattch
> interlock, and Tom and I are trying to explain why we don't believe
> that's the case. Saying that we're just wrong without responding to
> our points substantively doesn't move the conversation forward.

Sorry- it wasn't meant to be an attack- just a dumb pun. I am trying to argue that, even if the fcntl is unreliable, the startup procedure is just as reliable as it is now. The reasons being:

1) the SysV nattch method's primary purpose is to protect the shmem region. This is no longer necessary in my patch because the shared memory is unlinked immediately after creation, so only the initial postmaster and its children have access.
2) the standard postgresql lock file remains the same

Furthermore, there is indeed a case that the SysV nattch check cannot catch but fcntl locking can: if two separate machines have a postgresql data directory mounted over NFS, postgresql will currently allow both machines to start a postmaster in that directory because the SysV nattch check fails and then the pid in the lock file is the pid on the first machine, so postgresql will say "starting anyway". With fcntl locking, this can be fixed. SysV only has presence on one kernel.

>
> In case it's not clear, here again is what we're concerned about: A
> System V shm *cannot* be removed until nobody is attached to it. A
> lock file can be removed, or the lock can be accidentally released by
> the apparently innocuous operation of closing a file descriptor.
>
>> Both you and Tom have somehow assumed that the patch alters current postgresql behavior. In fact, the opposite is true. I haven't changed any of the existing behavior. The "robust" behavior remains. I merely added fcntl interlocking on top of the lock file to replace the SysV shmem check.
>
> This seems contradictory. If you replaced the SysV shmem check, then
> it's not there, which means you altered the behavior.

From what I understood, the primary purpose of the SysV check was to protect the shared memory from multiple stompers. The interlock was a neat side-effect.

The lock file contents are currently important to get the pid of a potential, conflicting postmaster. With the fcntl API, we can return a live conflicting PID (whether a postmaster or a stuck child), so that's an improvement. This could be used, for example, for STONITH, to reliably kill a dying replication clone- just loop on the pids returned from the lock.

Even if the fcntl check passes, the pid in the lock file is checked, so the lock file behavior remains the same.

If you were to implement a daemon with a shared data directory but no shared memory, how would you implement the interlock?
Would you still insist on SysV shmem? Unix daemons generally rely on lock files alone. Perhaps there is a different API on which we can agree.

Cheers,
M
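
The following is not part of the original message and is not the patch's code. It is a minimal Python sketch, under stated assumptions, of the two fcntl behaviors the thread is arguing about: an exclusive advisory lock on a lock file acting as an interlock, and the hazard Robert describes, where closing any descriptor on the locked file drops the lock. The names `try_lock`, `locked_by_other_process`, `demo`, and the file name `postmaster.pid` in a temporary directory are hypothetical and chosen only for illustration. It assumes a POSIX system where Python's `fcntl.lockf` maps to fcntl() record locks.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking fcntl lock on path.

    Returns the open file descriptor on success, or None if another
    process already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def locked_by_other_process(path):
    """Fork a child that attempts the lock; True if the child was refused.

    A separate process is needed because fcntl locks are per-process:
    the lock holder itself would never see a conflict.
    """
    pid = os.fork()
    if pid == 0:
        fd = try_lock(path)
        os._exit(1 if fd is None else 0)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


def demo():
    """Return (held_before, held_after) around an 'innocuous' close()."""
    with tempfile.TemporaryDirectory() as data_dir:
        lock_path = os.path.join(data_dir, "postmaster.pid")  # hypothetical name

        holder = try_lock(lock_path)  # the "postmaster" takes the interlock
        held_before = locked_by_other_process(lock_path)

        # Open and close an unrelated descriptor on the same file.
        # Under POSIX semantics this releases every fcntl lock the
        # process holds on that file, even though `holder` stays open.
        extra = os.open(lock_path, os.O_RDONLY)
        os.close(extra)
        held_after = locked_by_other_process(lock_path)

        os.close(holder)
        return held_before, held_after


if __name__ == "__main__":
    print(demo())
```

On a system with POSIX record-lock semantics, one would expect the first probe to be refused and the second to succeed, which is the accidental-release case raised above. The sketch says nothing about NFS behavior or about how the lock-file PID check interacts with the fcntl lock in the actual patch.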