Thread: semaphore usage "port based"?
I've got an odd issue that I'm not sure how to fix ... or, if fixing is even possible ...

I just put into place a FreeBSD 6.x server ... it has 2 jails running on it, and inside of each, I'm trying to run a PostgreSQL 7.4.12 server (OpenACS requirement, no choice there) ...

Now, on my older FreeBSD 4.x servers, I have about 17 PostgreSQL servers (some 7.2, some 7.4, some 8.x) ... and they all run fine, and they all run on port 5432 ...

Now, something in FreeBSD has changed since 4.x that, if you start up a second PostgreSQL server on port 5432, the first one starts to generate "semctl: Invalid argument" errors ...

If I move one to port 5433, both run great ...

Now, since this *did* work fine with 4.x, the FreeBSD developers have obviously changed something that is causing it not to work ... but, since 'changing port' appears to fix it, I'm wondering if there is something in our Semaphore creation code that can be tweaked so that the semaphore side of things *thinks* it's running on a different port, but it still responds to port 5432?

Or, more simply, I think ... is there somewhere in the Semaphore code that is using the port # as a 'seed'?

I'm trying to attack things from the FreeBSD side too, to find out what has changed, and how to fix it, but figured I might be able to come up with a quicker fix from this group ...

Thx ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org     Yahoo!: yscrappy     ICQ: 7615664
'k, an excerpt from a thread on the freebsd lists ... I'm not sure how to answer:

----
On Sun, Apr 02, 2006 at 05:24:10PM -0300, Marc G. Fournier wrote:
> On Sun, 2 Apr 2006, Kris Kennaway wrote:
>
> >> Right, but why are they doing it *consistently* in FreeBSD 6.x, when they
> >> never did it in FreeBSD 4.x? I have postmaster processes running on the
> >> FreeBSD box as far back as November 27th, 2005 ... and have *never*
> >> experienced this problem ... so it isn't PostgreSQL that has changed,
> >> something in FreeBSD has changed :(
> >
> > You'll need to do some debugging to find out which of the two causes
> > of EINVAL are true here (or some undocumented cause).
>
> 'k, right now, the checks in PostgreSQL are just seeing if the result of
> semctl < 0 ... I see from the man page what 'two values' of EINVAL you are
> referring to ... but, if they both return the same ERRNO, how do I
> determine which of the two is the cause of the problem? :(

Evaluate context: what other semaphore operations have been performed
previously?

Kris
------

is there any easy way to answer this? I'm getting the Invalid Argument error for SETVAL and IPC_RMID ...
"Marc G. Fournier" <scrappy@postgresql.org> writes: > Or, more simply, I think ... is there somewhere in the Semaphore code that > is using the port # as a 'seed'? We use the port number as a basis for selecting the semaphore key (see semget(2)). There is code in there to pick a different key value if the one we first selected appears to be in use; that has to work correctly if you're going to run multi postmasters on the same port number. It sounds like FBSD 6 has done something that broke the key-in-use check. Look at IpcSemaphoreCreate and InternalIpcSemaphoreCreate in src/backend/port/sysv_sema.c. It may be worth stepping through them with gdb to see what the semget calls are returning. regards, tom lane
I wrote: > Look at IpcSemaphoreCreate and InternalIpcSemaphoreCreate in > src/backend/port/sysv_sema.c. It may be worth stepping through them > with gdb to see what the semget calls are returning. BTW, even before doing that, you should look at "ipcs -s" output to try to get a clue what's going on. The EINVAL failures may be because the second postmaster to start deletes the semaphores created by the first one. You could easily see this happening in before-and-after ipcs data if so. strace'ing startup of the second postmaster is another approach that might be easier than gdb'ing. regards, tom lane
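The key selection described above (start from a port-derived key, move on if that key appears to be in use) can be sketched roughly as follows. This is a minimal illustration, not the code in sysv_sema.c: the `port * 1000 + 1` starting point is an assumption inferred from the keys that appear in the ipcs listings elsewhere in this thread (5432001 and up for port 5432), and `key_in_use_fn`, `first_sema_key`, `pick_sema_key` and the two demo predicates are hypothetical names standing in for the real semget()-based check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: true means "this key already belongs to a live set". */
typedef bool (*key_in_use_fn)(long key);

/* Assumption inferred from the ipcs listings in this thread:
 * port 5432 leads to a first candidate key of 5432001. */
long first_sema_key(int port)
{
    return (long) port * 1000 + 1;
}

/* Try consecutive keys until one is reported free.
 * Returns the chosen key, or -1 if none is free within max_tries. */
long pick_sema_key(int port, key_in_use_fn in_use, int max_tries)
{
    assert(in_use != NULL && max_tries > 0);

    long key = first_sema_key(port);
    for (int i = 0; i < max_tries; i++, key++) {
        if (!in_use(key))
            return key;
    }
    return -1;
}

/* Demo predicates standing in for the real key-in-use check. */
bool demo_none_taken(long key) { (void) key; return false; }
bool demo_first_two_taken(long key) { return key <= 5432002; }
```

If the in-use check wrongly reports a key as free, a second postmaster on the same port ends up on the first postmaster's keys, which is the failure mode being chased here.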
On Sun, 2 Apr 2006, Tom Lane wrote:

> BTW, even before doing that, you should look at "ipcs -s" output to try
> to get a clue what's going on. The EINVAL failures may be because the
> second postmaster to start deletes the semaphores created by the first
> one. You could easily see this happening in before-and-after ipcs data
> if so.

You are right ...

Before:

Semaphores:
T       ID      KEY MODE        OWNER GROUP CREATOR CGROUP NSEMS    OTIME    CTIME
s   524288  5432001 --rw-------    70    70      70     70    17 14:44:19 14:44:19
s   524289  5432002 --rw-------    70    70      70     70    17 14:44:19 14:44:19
s   524290  5432003 --rw-------    70    70      70     70    17 14:44:19 14:44:19
s   524291  5432004 --rw-------    70    70      70     70    17 14:44:19 14:44:19
s   524292  5432005 --rw-------    70    70      70     70    17 14:44:19 14:44:19
s   524293  5432006 --rw-------    70    70      70     70    17 20:23:56 14:44:19
s   524294  5432007 --rw-------    70    70      70     70    17 20:23:58 14:44:19

After:

Semaphores:
T       ID      KEY MODE        OWNER GROUP CREATOR CGROUP NSEMS    OTIME    CTIME
s   589824  5432001 --rw-------    70    70      70     70    17 21:38:03 21:38:03
s   589825  5432002 --rw-------    70    70      70     70    17 21:38:03 21:38:03
s   589826  5432003 --rw-------    70    70      70     70    17 21:38:03 21:38:03
s   589827  5432004 --rw-------    70    70      70     70    17 21:38:03 21:38:03
s   589828  5432005 --rw-------    70    70      70     70    17 21:38:03 21:38:03
s   589829  5432006 --rw-------    70    70      70     70    17 21:38:03 21:38:03
s   589830  5432007 --rw-------    70    70      70     70    17 21:38:03 21:38:03

So, our semget() is trying to acquire 5432001, FreeBSD's semget is reporting back that it's not in use, so the second instance is basically 'punting' the original one off of it ...

Kris, from the PostgreSQL sources, here is where we try and set the semId to use ... is there something we are doing wrong with our code as far as FreeBSD 6.x is concerned, such that semget is not returning a negative value when the key is already in use? Or is there a problem with semget() in a jail such that it is allowing for the KEY to be reused, instead of returning a negative value?

========
static IpcSemaphoreId
InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems)
{
    int         semId;

    semId = semget(semKey, numSems, IPC_CREAT | IPC_EXCL | IPCProtection);

    if (semId < 0)
    {
        /*
         * Fail quietly if error indicates a collision with existing set.
         * One would expect EEXIST, given that we said IPC_EXCL, but
         * perhaps we could get a permission violation instead?  Also,
         * EIDRM might occur if an old set is slated for destruction but
         * not gone yet.
         */
        if (errno == EEXIST || errno == EACCES
#ifdef EIDRM
            || errno == EIDRM
#endif
            )
            return -1;

        /*
         * Else complain and abort
         */
        ereport(FATAL,
                (errmsg("could not create semaphores: %m"),
                 errdetail("Failed system call was semget(%d, %d, 0%o).",
                           (int) semKey, numSems,
                           IPC_CREAT | IPC_EXCL | IPCProtection),
                 (errno == ENOSPC) ?
                 errhint("This error does *not* mean that you have run out of disk space.\n"
                         "It occurs when either the system limit for the maximum number of "
                         "semaphore sets (SEMMNI), or the system wide maximum number of "
                         "semaphores (SEMMNS), would be exceeded.  You need to raise the "
                         "respective kernel parameter.  Alternatively, reduce PostgreSQL's "
                         "consumption of semaphores by reducing its max_connections parameter "
                         "(currently %d).\n"
                         "The PostgreSQL documentation contains more information about "
                         "configuring your system for PostgreSQL.",
                         MaxBackends) : 0));
    }

    return semId;
}
========
"Marc G. Fournier" <scrappy@postgresql.org> writes: > On Sun, 2 Apr 2006, Tom Lane wrote: >> BTW, even before doing that, you should look at "ipcs -s" output to try >> to get a clue what's going on. The EINVAL failures may be because the >> second postmaster to start deletes the semaphores created by the first >> one. You could easily see this happening in before-and-after ipcs data >> if so. > You are right ... OK, could we see strace (or whatever BSD calls it) output for the second postmaster? I'd like to see exactly what results it's getting for the kernel calls it makes during IpcSemaphoreCreate. regards, tom lane
On Sun, 2 Apr 2006, Tom Lane wrote:

> OK, could we see strace (or whatever BSD calls it) output for the second
> postmaster? I'd like to see exactly what results it's getting for the
> kernel calls it makes during IpcSemaphoreCreate.

'k, don't know what strace is ... we have ktrace and truss ... truss is what I usually use, and is:

DESCRIPTION
     The truss utility traces the system calls called by the specified process
     or program. Output is to the specified output file, or standard error by
     default. It does this by stopping and restarting the process being
     monitored via procfs(5).

And shows output like:

# truss ls
ioctl(1,TIOCGETA,0x7fbff514)                     = 0 (0x0)
ioctl(1,TIOCGWINSZ,0x7fbff588)                   = 0 (0x0)
getuid()                                         = 0 (0x0)
readlink("/etc/malloc.conf",0x7fbff470,63)       ERR#2 'No such file or directory'
mmap(0x0,4096,0x3,0x1002,-1,0x0)                 = 671666176 (0x2808d000)
break(0x809b000)                                 = 0 (0x0)
break(0x809c000)                                 = 0 (0x0)
break(0x809d000)                                 = 0 (0x0)
break(0x809e000)                                 = 0 (0x0)
stat(".",0x7fbff470)                             = 0 (0x0)
open(".",0x0,00)                                 = 3 (0x3)
fchdir(0x3)                                      = 0 (0x0)
open(".",0x0,00)                                 = 4 (0x4)
stat(".",0x7fbff430)                             = 0 (0x0)
open(".",0x4,00)                                 = 5 (0x5)
fstat(5,0x7fbff430)                              = 0 (0x0)
fcntl(0x5,0x2,0x1)                               = 0 (0x0)
__sysctl(0x7fbff2e8,0x2,0x8098760,0x7fbff2e4,0x0,0x0) = 0 (0x0)
fstatfs(0x5,0x7fbff330)                          = 0 (0x0)
break(0x809f000)                                 = 0 (0x0)
getdirentries(0x5,0x809e000,0x1000,0x809a0b4)    = 512 (0x200)
getdirentries(0x5,0x809e000,0x1000,0x809a0b4)    = 0 (0x0)
lseek(5,0x0,0)                                   = 0 (0x0)
close(5)                                         = 0 (0x0)
fchdir(0x4)                                      = 0 (0x0)
close(4)                                         = 0 (0x0)
fstat(1,0x7fbff270)                              = 0 (0x0)
break(0x80a0000)                                 = 0 (0x0)
ioctl(1,TIOCGETA,0x7fbff2a4)                     = 0 (0x0)
.cshrc  .cvspass  .history  .login  .psql_history  .ssh
write(1,0x809f000,53)                            = 53 (0x35)
.cshrc~  .emacs.d  .klogin  .profile  .rnd  ktrace.out
write(1,0x809f000,53)                            = 53 (0x35)
exit(0x0)                                        process exit, rval = 0

ktrace is:

DESCRIPTION
     The ktrace utility enables kernel trace logging for the specified
     processes. Kernel trace data is logged to the file ktrace.out. The kernel
     operations that are traced include system calls, namei translations,
     signal processing, and I/O.

And shows output like:

 86523 ls       RET   __sysctl 0
 86523 ls       CALL  fstatfs(0x5,0x7fbff330)
 86523 ls       RET   fstatfs 0
 86523 ls       CALL  break(0x809f000)
 86523 ls       RET   break 0
 86523 ls       CALL  getdirentries(0x5,0x809e000,0x1000,0x809a0b4)
 86523 ls       RET   getdirentries 512/0x200
 86523 ls       CALL  getdirentries(0x5,0x809e000,0x1000,0x809a0b4)
 86523 ls       RET   getdirentries 0
 86523 ls       CALL  lseek(0x5,0,0,0,0)
"Marc G. Fournier" <scrappy@postgresql.org> writes: > On Sun, 2 Apr 2006, Tom Lane wrote: >> OK, could we see strace (or whatever BSD calls it) output for the second >> postmaster? I'd like to see exactly what results it's getting for the >> kernel calls it makes during IpcSemaphoreCreate. > 'k, dont' know what strace is ... we have ktrace and truss ... truss is > what I usually use, and is: truss seems to have an output format closer to what I'm used to, but either will do. regards, tom lane
Sent offlist ...

On Sun, 2 Apr 2006, Tom Lane wrote:

> truss seems to have an output format closer to what I'm used to, but
> either will do.

"Marc G. Fournier" <scrappy@postgresql.org> writes:
> 'k, try this one ... looks better, actually has semget() calls in it :)

OK, here's our problem:

84250: semget(0x52e2c1,0x11,0x780)               ERR#17 'File exists'

This is InternalIpcSemaphoreCreate failing because of key collision. As it should.

84250: semget(0x52e2c1,0x11,0x0)                 = 1114112 (0x110000)

This is IpcSemaphoreCreate trying to see what's up. OK.

84250: __semctl(0x110000,0x10,0x5,0x0)           = 537 (0x219)

IpcSemaphoreGetValue indicates it has the right "magic number" to be a Postgres semaphore set. Still expected.

84250: __semctl(0x110000,0x10,0x4,0x0)           = 83699 (0x146f3)

IpcSemaphoreGetLastPID says the sema set is last touched by pid 83699. Looks reasonable (but do you want to double check that that matched the first postmaster's PID?)

84250: getpid()                                  = 84250 (0x1491a)

our pid ... as expected ...

84250: kill(0x146f3,0x0)                         ERR#3 'No such process'

Oops. Here is the problem: kill() is lying by claiming there is no such process as 83699. It looks to me like there in fact is such a process, but it's in a different jail.

I venture that FBSD 6 has decided to return ESRCH (no such process) where FBSD 4 returned some other error that acknowledged that the process did exist (EPERM would be a reasonable guess).

If this is the story, then FBSD have broken their system and must revert their change. They do not have kernel behavior that totally hides the existence of the other process, and therefore having some calls that pretend it's not there is simply inconsistent.

regards, tom lane
Kris Kennaway <kris@obsecurity.org> writes: > On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote: >> If this is the story, then FBSD have broken their system and must revert >> their change. They do not have kernel behavior that totally hides the >> existence of the other process, and therefore having some calls that >> pretend it's not there is simply inconsistent. > I'm guessing it's a deliberate change to prevent the information > leakage between jails. I have no objection to doing that, so long as you are actually doing it correctly. This example shows that each jail must have its own SysV semaphore key space, else information leaks anyway. The current situation breaks Postgres, and therefore I suggest reverting the errno change until you are prepared to fix the SysV IPC stuff to be per-jail. regards, tom lane
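The kill(pid, 0) probe this exchange turns on can be sketched as below. It is a minimal illustration using only the standard POSIX meaning of kill() with signal 0 (no signal is sent, only the existence and permission checks run); the names `pid_status`, `classify_kill_result` and `probe_pid` are hypothetical and are not taken from the PostgreSQL sources. How a jailed FreeBSD 6 kernel answers is taken from the trace in this thread, not re-verified here.

```c
#define _POSIX_C_SOURCE 200809L  /* make kill() visible under strict C modes */

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

typedef enum {
    PID_ALIVE,           /* kill() succeeded: process exists, we may signal it */
    PID_ALIVE_NOT_OURS,  /* EPERM: process exists but belongs to someone else */
    PID_GONE,            /* ESRCH: kernel says there is no such process */
    PID_UNKNOWN          /* any other errno */
} pid_status;

/* Map the result of kill(pid, 0) onto a status.  A kernel that answers
 * ESRCH for a process in another jail lands in PID_GONE here, even though
 * the process is running. */
pid_status classify_kill_result(int rc, int err)
{
    if (rc == 0)
        return PID_ALIVE;
    if (err == EPERM)
        return PID_ALIVE_NOT_OURS;
    if (err == ESRCH)
        return PID_GONE;
    return PID_UNKNOWN;
}

/* Probe a PID without delivering a signal. */
pid_status probe_pid(pid_t pid)
{
    assert(pid > 0);

    errno = 0;
    int rc = kill(pid, 0);
    int saved_errno = errno;   /* capture before anything else can clobber it */

    return classify_kill_result(rc, saved_errno);
}
```

The difference between the EPERM and ESRCH branches is the whole dispute: the first tells the caller "someone is there", the second tells it "nobody is there".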
Kris Kennaway <kris@obsecurity.org> writes: > On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote: >> I have no objection to doing that, so long as you are actually doing it >> correctly. This example shows that each jail must have its own SysV >> semaphore key space, else information leaks anyway. > By default SysV shared memory is disallowed in jails. Hm, the present problem seems to be about semaphores not shared memory ... although I'd not be surprised to find that there's a similar issue around shared memory. Anyway, if FBSD's position is that they are uninterested in supporting SysV IPC in connection with jails, then I think the Postgres project position has to be that we are uninterested in supporting Postgres inside FBSD jails. Sorry Marc :-( regards, tom lane
On Sun, 2 Apr 2006, Kris Kennaway wrote:

> On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote:
>> I have no objection to doing that, so long as you are actually doing it
>> correctly. This example shows that each jail must have its own SysV
>> semaphore key space, else information leaks anyway.
>
> By default SysV shared memory is disallowed in jails.

'k, but how do I fix kill so that it has the proper behaviour if SysV is enabled? Maybe a mount option for procfs that allows for pre-5.x behaviour? I'm not the first one to point out that this is a problem, just the first to follow it through to the cause ;( And I believe there is more than just PostgreSQL that is affected by shared memory (ie. apache2 needs SysV IPC enabled, so anyone doing that in a jail has it enabled also) ...

Basically, I don't care if 'default' is ultra-secure ... but some means to bring it down a notch would be nice ... :(
On Sun, 2 Apr 2006, Kris Kennaway wrote:

> No-one is taking a position of being "uninterested", so please don't
> be hasty to reciprocate.

I just posted it off the -hackers list, but there is an ancient patch in the FreeBSD queue for implementing Private IPCs for Jails ... not sure why it was never committed, or what is involved in bringing it up to speed with the current 6.x and / or -current kernels though ... but, as I mentioned in another thread, I know that *at least* Apache2 makes use of shared memory for some of its stuff ...
Thanks all ... have moved this to just the freebsd-stable list, since I don't imagine most here are interested in FreeBSD :(

On Mon, 3 Apr 2006, Andrew Thompson wrote:

> On Sun, Apr 02, 2006 at 11:41:01PM -0400, Kris Kennaway wrote:
>> On Mon, Apr 03, 2006 at 12:30:58AM -0300, Marc G. Fournier wrote:
>>> 'k, but how do I fix kill so that it has the proper behaviour if SysV is
>>> enabled?
>>
>> Check the source, perhaps there's already a way. If not, talk to
>> whoever made the change.
>>
>>> Maybe a mount option for procfs that allows for pre-5.x
>>> behaviour?
>>
>> procfs has nothing to do with this though.
>>
>>> I'm not the first one to point out that this is a problem, just
>>> the first to follow it through to the cause ;( And I believe there is
>>> more than just PostgreSQL that is affected by shared memory (ie. apache2
>>> needs SysV IPC enabled, so anyone doing that in a jail has it enabled
>>> also) ...
>>
>> Also note that SysV IPC is not the problem here, it's the change in
>> the behaviour of kill() that is causing postgresql to become confused.
>> That's what you should investigate.
>
> The ESRCH error is being returned from prison_check(), that would be a
> good starting place.
>
> Andrew
Robert Watson <rwatson@FreeBSD.org> writes:
> However, pid's in general uniquely identify a process only at the time they
> are recorded. So any pid returned here is necessarily stale -- even if there
> is another process with the pid returned by GETPID, it may actually be a
> different process that has ended up with the same pid. The longer the gap
> since the last semaphore operation, the more likely (presumably) it is that
> the pid has been recycled. And on modern systems with thousands of processes
> and high process turn-over (i.e., systems with CGI and other sorts of
> scripting), pid reuse can happen quickly. Is your use of the pid here
> consistent with the fact that pid's are reused quickly after process exit?

That's a fair question, but in the context of the code I believe we are behaving reasonably. The reason this code exists is to provide some insurance against leaking semaphores when a postmaster process is terminated unexpectedly (ye olde often-recommended-against "kill -9 postmaster", for instance). If the PID returned by GETPID is nonexistent or belongs to a process not owned by the postgres userid then we assume that the semaphore set can be recycled. We could get fooled by PID recycling if the PID returned by GETPID belongs to a postgres-owned process that isn't actually the original owner, but the penalty is just that we'll fail to recycle semaphores that could be released. Not very harmful, and not very probable either, unless you're running postgres under a userid that's used for a lot of other stuff too.

There is not much risk of long-term leakage of many semaphore sets, even if you've got lots of postmaster crashes going on (which I sure hope you don't). The code is designed to retry the same semaphore keys on each cycle of life, so you'd have to get fooled by chance coincidence of existing PIDs every time over many cycles to have a severe resource-leakage problem. (BTW, Marc, that's the reason for *not* randomizing the key selection as you suggested.)

So I think the code is pretty bulletproof as long as it's in a system that is behaving per SysV spec. The problem in the current FBSD situation is that the jail mechanism is exposing semaphore sets across jails, but not exposing the existence of the owning processes. That behavior is inconsistent: if process A can affect the state of a sema set that process B can see, it's surely unreasonable to pretend that A doesn't exist.

regards, tom lane
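The recycling rule stated above can be written down as a small decision function. This is a sketch of the rule as described in this message only, not of the actual PostgreSQL code: `may_recycle_sema_set` and its parameters are hypothetical, how the caller learns whether the owner exists or which userid it runs under is deliberately left out, and the "PID is our own" shortcut is taken from a later message in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Decide whether a leftover semaphore set may be reclaimed.
 *
 * last_pid              PID reported by GETPID for the existing set
 * my_pid                PID of the postmaster doing the check
 * owner_exists          does the kernel admit that last_pid exists?
 * owner_is_postgres_uid is that process owned by the postgres userid?
 */
bool may_recycle_sema_set(pid_t last_pid, pid_t my_pid,
                          bool owner_exists, bool owner_is_postgres_uid)
{
    assert(last_pid > 0 && my_pid > 0);

    if (last_pid == my_pid)
        return true;    /* our own set from an earlier cycle of life */
    if (!owner_exists)
        return true;    /* previous owner is gone: leftover from a crash */
    if (!owner_is_postgres_uid)
        return true;    /* PID was reused by an unrelated process */

    return false;       /* plausibly a live postmaster: leave it alone */
}
```

With the PIDs from the truss trace, a kernel that hides pid 83699 from pid 84250 makes `owner_exists` false, so the set is reclaimed even though its owner is still running in the other jail.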
Robert Watson <rwatson@FreeBSD.org> writes:
> Maybe I've misunderstood the problem here -- is the use of the GETPID
> operation occurring within a coordinated set of server processes, or does it
> also occur between client and server processes? I think it's quite reasonable
> to argue that a coordinated set of server processes should be able to see each
> other, especially if they're running as the same user, in the same jail,
> started at the same time.

We use the semaphore sets only within postgres server processes; we could hardly expect client processes to be able to get at them, since in general clients aren't on the same machine.

The issue here, though, is that Marc is trying to start multiple postgres servers in different jails, and in that context the different postgres servers aren't "coordinated" in any real sense. We'd prefer that they didn't interact at all, but they are interacting because the SysV code isn't restricting IPC to occur only within a jail.

BTW, Marc, it occurs to me that a workaround for you would be to create a separate userid for postgres to run under in each jail; then the regular protection mechanisms would prevent the different postmasters from interfering with each others' semaphore sets. But I think that workaround just makes it even clearer that the jail mechanism isn't behaving very sanely.

> I would, in general, consider the use of System
> V IPC across jails (as opposed to in a single jail) unsupported, since it's
> not consistent with the security model.

That'd be fine with me --- the problem here is that we've got unwanted communication across jails. If, say, the jail ID were considered part of semaphore keys, we'd be in fine shape.

regards, tom lane
Robert Watson <rwatson@FreeBSD.org> writes: > Any multi-instance application that uses unvirtualized System V IPC must know > how to distinguish between those instances. Sure. > How is PostgreSQL deciding what semaphores to use? Can it be instructed to > use non-colliding ones by specifying an alternative argument to pass to > ftok(), or ID to use directly? The problem here is not that we don't know how to avoid a collision. The problem is stemming from code that we added to prevent semaphore leakage during failure recoveries. The code believes that it is deleting a semaphore set left over from a crashed previous instance of the same postmaster. We don't use ftok() to determine the keys, and I'm disinclined to think that doing so would improve the situation: you could still have key collisions, they'd just be unpredictable and there'd be no convenient mechanism for escaping one if you hit it. > However, if applications behave incorrectly when treading over each other > because either they aren't written to support specifying how not to walk over > each other, or if they are not configured to use that support, then they're > not going to behave well :-). Postgres is absolutely designed not to walk all over itself. It is, however, designed to clean up after itself, and I don't consider that a bug. The problem here is that by redefining the behavior of kill, you've prevented the code from detecting the existence of the other postmaster, and thereby triggered the cleanup behavior. I don't exactly see why it's considered such a critical security feature that kill return ESRCH rather than, say, EPERM for processes in another jail. kill won't tell you what that process is or what it's doing, so the amount of information leaked is certainly pretty trivial. It'd be fine if FBSD actually had a jail implementation that leaked zero information, but you don't --- in fact, you're saying it's a feature that you don't. 
Perhaps a reasonable compromise would be to have the SysV-IPC-allowed-in-jails switch also restore the normal return value of kill(). This seems sensible to me because the GETPID feature makes PIDs be part of the API that is exposed across jails. regards, tom lane
On Apr 3, 2006, at 12:37 PM, Tom Lane wrote:

> semaphore keys on each cycle of life, so you'd have to get fooled by
> chance coincidence of existing PIDs every time over many cycles to
> have a severe resource-leakage problem. (BTW, Marc, that's the reason
> for *not* randomizing the key selection as you suggested.)

Seems to me the way around this with minimal fuss is to add a flag to postgres to have it start at different points in the ID sequence. So pg#1 would start at first position, pg#2 second ID position, etc. Then just hard-code an "instance ID" into the startup script for each pg. No randomization makes it easier to debug, and unique IDs avoid clashes under normal cases.
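A rough sketch of what such an "instance ID" offset might look like is below. Everything in it is hypothetical: no such flag exists in the thread, `first_key_for_instance` is an invented name, the `port * 1000` base is inferred from the ipcs keys shown earlier, and spacing instances by `sets_per_instance` (seven sets per postmaster appear in the ipcs listing) is an added assumption so that one instance's consecutive keys do not run into the next instance's.

```c
#include <assert.h>

/* Assumption: keys for one port occupy port*1000 .. port*1000 + 999. */
#define KEYS_PER_PORT 1000

/* First semaphore key for a given instance on a port.
 * Returns -1 if the instance's keys would spill into the next port's range. */
long first_key_for_instance(int port, int instance_id, int sets_per_instance)
{
    assert(instance_id >= 0 && sets_per_instance > 0);

    /* Instance 0 starts at offset 1, instance 1 right after instance 0's sets. */
    long offset = (long) instance_id * sets_per_instance + 1;

    if (offset + sets_per_instance - 1 >= KEYS_PER_PORT)
        return -1;

    return (long) port * KEYS_PER_PORT + offset;
}
```

The starting points stay deterministic, which keeps the "retry the same keys on each cycle of life" property, while two instances on port 5432 no longer begin at the same key.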
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> That's a fair question, but in the context of the code I believe we are
> behaving reasonably. The reason this code exists is to provide some
> insurance against leaking semaphores when a postmaster process is
> terminated unexpectedly (ye olde often-recommended-against "kill -9
> postmaster", for instance). If the PID returned by GETPID is

Could this be handled sensibly by using SEM_UNDO? Just a thought.

> So I think the code is pretty bulletproof as long as it's in a system
> that is behaving per SysV spec. The problem in the current FBSD
> situation is that the jail mechanism is exposing semaphore sets across
> jails, but not exposing the existence of the owning processes. That
> behavior is inconsistent: if process A can affect the state of a sema
> set that process B can see, it's surely unreasonable to pretend that A
> doesn't exist.

This is certainly a problem with FBSD jails... Not only the inconsistency, but what happens if someone manages to get access to the appropriate uid under one jail and starts sniffing or messing with the semaphores or shared memory segments from other jails? If that's possible then that's a rather glaring security problem...

Thanks,

Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > BTW, Marc, it occurs to me that a workaround for you would be to create > a separate userid for postgres to run under in each jail; then the > regular protection mechanisms would prevent the different postmasters > from interfering with each others' semaphore sets. But I think that > workaround just makes it even clearer that the jail mechanism isn't > behaving very sanely. Just to toss it in there, I do this on some systems where we use Linux VServers. It's just so that when I'm looking at a process list across the whole system it's easy to tell which processes are inside which vservers (since the only thing which should be running in a given vserver is a single Postgres instance which should only be running with the uid/gid corresponding to that vserver, and that uid/gid is recorded in the host passwd file with a name associated with it since that's the passwd file used when looking at all pids). I also just double-checked with the Linux VServer folks and they confirm that IPC inside the vserver are isolated from all the other IPCs on the system. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > Could this be handled sensibly by using SEM_UNDO? Just a thought. Interesting thought, but I think it doesn't work for the special case where the semaphore's "previous owner" is actually our same PID --- which is actually the more commonly exercised path, since true postmaster crashes are pretty rare. More commonly we're trying to detach from and recreate our own shmem and semas following a backend crash. We can special-case that pretty easily with the GETPID solution (pid == ours is obviously not some other process's sema), but with SEM_UNDO it wouldn't work right. I'm also concerned about the portability risks of depending on SEM_UNDO. I think a lot of systems set the semaphore-undo limits pretty small, maybe even zero. BTW, as long as we're annoying the freebsd-stable list with discussions of workarounds, I'm wondering whether our shared memory code might have similar risks. Does FBSD 6 also lie about the existence of other-jail processes connected to a shared memory segment --- ie, in shmctl(IPC_STAT)'s result, does shm_nattch count only processes in our own jail? regards, tom lane
* Robert Watson (rwatson@FreeBSD.org) wrote: > On Mon, 3 Apr 2006, Stephen Frost wrote: > >This is certainly a problem with FBSD jails... Not only the > >inconsistancy, but what happens if someone manages to get access to the > >appropriate uid under one jail and starts sniffing or messing with the > >semaphores or shared memory segments from other jails? If that's possible > >then that's a rather glaring security problem... > > This is why it's disabled by default, and the jail documentation > specifically advises of this possibility. Excerpt below. Ah, I see, glad to see it's accurately documented. Given the rather significant use of shared memory by Postgres it seems to me that jail'ing it under FBSD is unlikely to get you the kind of isolation between instances that you want (the assumption being that you want to avoid the possibility of a user under one jail impacting a user in another jail). As such, I'd suggest finding something else if you truely need that isolation for Postgres or dropping the jails entirely. Running the Postgres instances under different uids (as you'd probably expect to do anyway if not using the jails) is probably the right approach. Doing that and using jails would probably work, just don't delude yourself into thinking that you're safe from a malicious user in one jail. Thanks, Stephen
On Mon, 3 Apr 2006, Stephen Frost wrote: > * Robert Watson (rwatson@FreeBSD.org) wrote: >> On Mon, 3 Apr 2006, Stephen Frost wrote: >>> This is certainly a problem with FBSD jails... Not only the >>> inconsistancy, but what happens if someone manages to get access to the >>> appropriate uid under one jail and starts sniffing or messing with the >>> semaphores or shared memory segments from other jails? If that's possible >>> then that's a rather glaring security problem... >> >> This is why it's disabled by default, and the jail documentation >> specifically advises of this possibility. Excerpt below. > > Ah, I see, glad to see it's accurately documented. Given the rather > significant use of shared memory by Postgres it seems to me that > jail'ing it under FBSD is unlikely to get you the kind of isolation > between instances that you want (the assumption being that you want to > avoid the possibility of a user under one jail impacting a user in > another jail). As such, I'd suggest finding something else if you > truely need that isolation for Postgres or dropping the jails entirely. > > Running the Postgres instances under different uids (as you'd probably > expect to do anyway if not using the jails) is probably the right > approach. Doing that and using jails would probably work, just don't > delude yourself into thinking that you're safe from a malicious user in > one jail. We don't ... we put all our databases on a central database server, even private ones, that nobody has shell access to ... we keep them isolated ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
* Marc G. Fournier (scrappy@postgresql.org) wrote: > On Mon, 3 Apr 2006, Stephen Frost wrote: > >Running the Postgres instances under different uids (as you'd probably > >expect to do anyway if not using the jails) is probably the right > >approach. Doing that and using jails would probably work, just don't > >delude yourself into thinking that you're safe from a malicious user in > >one jail. > > We don't ... we put all our databases on a central database server, even > private ones, that nobody has shell access to ... we keep them isolated > ... I guess what I was trying to get at is this: Running 2 Postgres instances under FreeBSD with (or without really, but I guess that's more obvious) jails but with the same UID is a bad idea. Even if Postgres could be modified to allow this to work you're going to be in a position where the jail isn't really helping much except to give a somewhat false (in this case) sense of security. We probably shouldn't encourage it and in fact it's something of a nice feature that it breaks. The reasoning is pretty simple: if someone manages to get control of one of the Postgres instances they're going to be able to wreak havoc on the other. With different UIDs, with or without jails, this would be much more difficult (need to get root first). Running 2 Postgres instances under FreeBSD with jails *and* different UIDs is *probably* better than w/o jails but since you have to enable the single-instance IPC system it might not be that great of a benefit over a simple chroot or similar. Hope that helps... Thanks, Stephen
On Sun, Apr 02, 2006 at 11:26:52PM -0400, Tom Lane wrote: > Kris Kennaway <kris@obsecurity.org> writes: > > On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote: > >> I have no objection to doing that, so long as you are actually doing it > >> correctly. This example shows that each jail must have its own SysV > >> semaphore key space, else information leaks anyway. > > > By default SysV shared memory is disallowed in jails. > > Hm, the present problem seems to be about semaphores not shared memory Sorry, I meant IPC. > ... although I'd not be surprised to find that there's a similar issue > around shared memory. Anyway, if FBSD's position is that they are > uninterested in supporting SysV IPC in connection with jails, then I > think the Postgres project position has to be that we are uninterested > in supporting Postgres inside FBSD jails. No-one is taking a position of being "uninterested", so please don't be hasty to reciprocate. Kris
On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote: > Kris Kennaway <kris@obsecurity.org> writes: > > On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote: > >> If this is the story, then FBSD have broken their system and must revert > >> their change. They do not have kernel behavior that totally hides the > >> existence of the other process, and therefore having some calls that > >> pretend it's not there is simply inconsistent. > > > I'm guessing it's a deliberate change to prevent the information > > leakage between jails. > > I have no objection to doing that, so long as you are actually doing it > correctly. This example shows that each jail must have its own SysV > semaphore key space, else information leaks anyway. By default SysV shared memory is disallowed in jails. Kris
On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote: > I venture that FBSD 6 has decided to return ESRCH (no such process) > where FBSD 4 returned some other error that acknowledged that the > process did exist (EPERM would be a reasonable guess). > > If this is the story, then FBSD have broken their system and must revert > their change. They do not have kernel behavior that totally hides the > existence of the other process, and therefore having some calls that > pretend it's not there is simply inconsistent. I'm guessing it's a deliberate change to prevent the information leakage between jails. Kris
On Sun, Apr 02, 2006 at 11:41:01PM -0400, Kris Kennaway wrote: > On Mon, Apr 03, 2006 at 12:30:58AM -0300, Marc G. Fournier wrote: > > 'k, but how do I fix kill so that it has the proper behaviour if SysV is > > enabled? > > Check the source, perhaps there's already a way. If not, talk to > whoever made the change. > > > Maybe a mount option for procfs that allows for pre-5.x > > behaviour? > > procfs has nothing to do with this though. > > > I'm not the first one to point out that this is a problem, just > > the first to follow it through to the cause ;( And I believe there is > > more then just PostgreSQL that is affected by shared memory (ie. apache2 > > needs SysV IPC enabled, so anyone doing that in a jail has it enabled > > also) ... > > Also note that SysV IPC is not the problem here, it's the change in > the behaviour of kill() that is causing postgresql to become confused. > That's what you should investigate. The ESRCH error is being returned from prison_check(), that would be a good starting place. Andrew
On Mon, Apr 03, 2006 at 12:30:58AM -0300, Marc G. Fournier wrote: > On Sun, 2 Apr 2006, Kris Kennaway wrote: > > >On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote: > >>Kris Kennaway <kris@obsecurity.org> writes: > >>>On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote: > >>>>If this is the story, then FBSD have broken their system and must revert > >>>>their change. They do not have kernel behavior that totally hides the > >>>>existence of the other process, and therefore having some calls that > >>>>pretend it's not there is simply inconsistent. > >> > >>>I'm guessing it's a deliberate change to prevent the information > >>>leakage between jails. > >> > >>I have no objection to doing that, so long as you are actually doing it > >>correctly. This example shows that each jail must have its own SysV > >>semaphore key space, else information leaks anyway. > > > >By default SysV shared memory is disallowed in jails. > > 'k, but how do I fix kill so that it has the proper behaviour if SysV is > enabled? Check the source, perhaps there's already a way. If not, talk to whoever made the change. > Maybe a mount option for procfs that allows for pre-5.x > behaviour? procfs has nothing to do with this though. > I'm not the first one to point out that this is a problem, just > the first to follow it through to the cause ;( And I believe there is > more then just PostgreSQL that is affected by shared memory (ie. apache2 > needs SysV IPC enabled, so anyone doing that in a jail has it enabled > also) ... Also note that SysV IPC is not the problem here, it's the change in the behaviour of kill() that is causing postgresql to become confused. That's what you should investigate. Kris
On Sun, 2 Apr 2006, Tom Lane wrote: > Oops. Here is the problem: kill() is lying by claiming there is no such > process as 83699. It looks to me like there in fact is such a process, but > it's in a different jail. > > I venture that FBSD 6 has decided to return ESRCH (no such process) where > FBSD 4 returned some other error that acknowledged that the process did > exist (EPERM would be a reasonable guess). > > If this is the story, then FBSD have broken their system and must revert > their change. They do not have kernel behavior that totally hides the > existence of the other process, and therefore having some calls that pretend > it's not there is simply inconsistent. FreeBSD's mandatory access control models, such as multi-level security, Biba integrity, and type enforcement, will generally provide consistent protection under the circumstances you describe: specifically, that information flow invariants across IPC types, including System V IPC and inter-process signalling, will allow flow only in keeping with the policy. However, I guess I would counter with the following concern: the PID returned by semctl() has the following definition: GETPID Return the pid of the last process to perform an operation on semaphore number semnum. However, pid's in general uniquely identify a process only at the time they are recorded. So any pid returned here is necessarily stale -- even if there is another process with the pid returned by GETPID, it may actually be a different process that has ended up with the same pid. The longer the gap since the last semaphore operation, the more likely (presumably) it is that the pid has been recycled. And on modern systems with thousands of processes and high process turn-over (e.g., systems with CGI and other sorts of scripting), pid reuse can happen quickly. Is your use of the pid here consistent with the fact that pid's are reused quickly after process exit?
Use of pid's in UNIX is often unreliable, and must be combined with other synchronization, such as file locking on a pidfile, to ensure that the pid read is valid. Even then, you can't implement atomic check-pid-and-signal using current UNIX APIs, which would require a notion of a process handle (or, in the parlance of Mach, a task port). Another thought along these lines: especially with the proliferation of fine-grained access control systems, such as Type Enforcement in SELinux, I would be cautious about assuming that two processes being able to manipulate the same semaphore implies the ability to exchange signals using the signal facility. Robert N M Watson
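The pidfile-locking idea mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not anything PostgreSQL or FreeBSD is described as doing in this thread: the helper name pidfile_owner_alive is hypothetical, and it assumes the owning process keeps an exclusive flock() on its pidfile for its whole lifetime.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Assumes the owner holds LOCK_EX on the pidfile while it runs.
 * Returns 1 if some other open file description holds the lock
 * (owner presumed alive), 0 if we could take the lock ourselves
 * (owner presumed gone; the lock is released again), -1 on error.
 */
int pidfile_owner_alive(const char *path)
{
    int fd = open(path, O_RDWR);
    int saved_errno;

    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        /* Nobody holds the lock: any pid stored in the file is stale. */
        flock(fd, LOCK_UN);
        close(fd);
        return 0;
    }

    saved_errno = errno;
    close(fd);
    return (saved_errno == EWOULDBLOCK) ? 1 : -1;
}
```

The kernel drops the lock when the owner exits, so the answer does not depend on the stored pid still naming the same process. As noted above, this still does not make check-and-signal atomic.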
On Mon, 3 Apr 2006, Tom Lane wrote: > BTW, as long as we're annoying the freebsd-stable list with discussions of > workarounds, I'm wondering whether our shared memory code might have similar > risks. Does FBSD 6 also lie about the existence of other-jail processes > connected to a shared memory segment --- ie, in shmctl(IPC_STAT)'s result, > does shm_nattch count only processes in our own jail? People are, of course, welcome to read the Jail documentation in order to read the warning about not enabling the System V IPC support in Jails, and what the possible results of doing so are. Robert N M Watson
On Mon, 3 Apr 2006, Stephen Frost wrote: >> This is why it's disabled by default, and the jail documentation >> specifically advises of this possibility. Excerpt below. > > Ah, I see, glad to see it's accurately documented. As it has been for the last five years, I believe since introduction of the setting to allow System V IPC to be used with documented limitations. > Given the rather significant use of shared memory by Postgres it seems to me > that jail'ing it under FBSD is unlikely to get you the kind of isolation > between instances that you want (the assumption being that you want to avoid > the possibility of a user under one jail impacting a user in another jail). > As such, I'd suggest finding something else if you truely need that > isolation for Postgres or dropping the jails entirely. > > Running the Postgres instances under different uids (as you'd probably > expect to do anyway if not using the jails) is probably the right approach. > Doing that and using jails would probably work, just don't delude yourself > into thinking that you're safe from a malicious user in one jail. Yes, there seems to be an awful lot of noise being made about the fact that the system does, in fact, work exactly as documented, and that the configuration being complained about is one that is specifically documented as being unsupported and undesirable. As commented elsewhere in this thread, currently, there is no virtualization support for System V IPC in the FreeBSD Jail implementation. That may change if/when someone implements it. Until it's implemented, it isn't going to be there, and the system won't behave as though it's there no matter how much jumping up and down is done. Robert N M Watson
On Mon, 3 Apr 2006, Tom Lane wrote: > Robert Watson <rwatson@FreeBSD.org> writes: >> Any multi-instance application that uses unvirtualized System V IPC must know >> how to distinguish between those instances. > > Sure. > >> How is PostgreSQL deciding what semaphores to use? Can it be instructed to >> use non-colliding ones by specifying an alternative argument to pass to >> ftok(), or ID to use directly? > > The problem here is not that we don't know how to avoid a collision. The > problem is stemming from code that we added to prevent semaphore leakage > during failure recoveries. The code believes that it is deleting a > semaphore set left over from a crashed previous instance of the same > postmaster. > > We don't use ftok() to determine the keys, and I'm disinclined to think that > doing so would improve the situation: you could still have key collisions, > they'd just be unpredictable and there'd be no convenient mechanism for > escaping one if you hit it. I guess what I'm saying is this: by turning on system V IPC in a jail, administrators accept that they are using an unsupported configuration, in which the security features of jail, which include hiding the process state of other jails, are known to conflict with the System V IPC services. We specifically disable System V IPC in jails because it is known to have undesirable properties. When configuring systems in that state, the responsibility falls on the administrator to disambiguate the configuration by specifying which resources must be used in order to prevent a conflict, because software operating in that environment will not be able to do so properly. The goal of the switch to enable System V IPC is to allow IPC to be enabled for a single jail at a time, where it can be used to its full capabilities, without violating the security model. If it is turned on for more than one jail, then isolation is not provided for System V IPC. 
So my recommendation is, if people want to run Postgres in more than one jail at a time, they be provided with a configuration option to disambiguate which semaphore to use: they must hard-code that it will not use the same semaphore already in use by another Postgres instance in another Jail. This is no different from requiring that multiple Apaches running on a single system run on different port/IP combinations. If they aren't configured to do so, one of them will encounter an error when running, because the resource is already in use, and you may get unpredictable results if the two Apaches are started at the same time, restarted, etc, as they will race to acquire the resource. Whether you pull the resource ID out of a hat, use ftok(), or whatever, I really don't mind, and have no strong opinion. The name space of System V IPC is one of the known problems with the IPC model, and sadly, one accepts those problems by using those IPC mechanisms. Robert N M Watson
On Mon, 3 Apr 2006, Stephen Frost wrote: >> So I think the code is pretty bulletproof as long as it's in a system that >> is behaving per SysV spec. The problem in the current FBSD situation is >> that the jail mechanism is exposing semaphore sets across jails, but not >> exposing the existence of the owning processes. That behavior is >> inconsistent: if process A can affect the state of a sema set that process >> B can see, it's surely unreasonable to pretend that A doesn't exist. > > This is certainly a problem with FBSD jails... Not only the inconsistancy, > but what happens if someone manages to get access to the appropriate uid > under one jail and starts sniffing or messing with the semaphores or shared > memory segments from other jails? If that's possible then that's a rather > glaring security problem... This is why it's disabled by default, and the jail documentation specifically advises of this possibility. Excerpt below. Robert N M Watson security.jail.sysvipc_allowed This MIB entry determines whether or not processes within a jail have access to System V IPC primitives. In the current jail implementation, System V primitives share a single namespace across the host and jail environments, meaning that processes within a jail would be able to communicate with (and potentially interfere with) processes outside of the jail, and in other jails. As such, this functionality is disabled by default, but can be enabled by setting this MIB entry to 1.
On Mon, Apr 03, 2006 at 06:51:45PM -0400, Stephen Frost wrote: > * Robert Watson (rwatson@FreeBSD.org) wrote: > > On Mon, 3 Apr 2006, Stephen Frost wrote: > > >This is certainly a problem with FBSD jails... Not only the > > >inconsistancy, but what happens if someone manages to get access to the > > >appropriate uid under one jail and starts sniffing or messing with the > > >semaphores or shared memory segments from other jails? If that's possible > > >then that's a rather glaring security problem... > > > > This is why it's disabled by default, and the jail documentation > > specifically advises of this possibility. Excerpt below. > > Ah, I see, glad to see it's accurately documented. Given the rather > significant use of shared memory by Postgres it seems to me that > jail'ing it under FBSD is unlikely to get you the kind of isolation > between instances that you want (the assumption being that you want to > avoid the possibility of a user under one jail impacting a user in > another jail). As such, I'd suggest finding something else if you > truely need that isolation for Postgres or dropping the jails entirely. > > Running the Postgres instances under different uids (as you'd probably > expect to do anyway if not using the jails) is probably the right > approach. Doing that and using jails would probably work, just don't > delude yourself into thinking that you're safe from a malicious user in > one jail. Yes; however jails are still useful for administrative compartmentalization even when you have to weaken their security properties, such as here. Kris
On Mon, 3 Apr 2006, Tom Lane wrote: > Robert Watson <rwatson@FreeBSD.org> writes: >> Maybe I've misunderstood the problem here -- is the use of the GETPID >> operation occuring within a coordinated set of server processes, or does it >> also occur between client and server processes? I think it's quite reasonable >> to argue that a coordinated set of server processes should be able to see each >> other, especially if they're running as the same user, in the same jail, >> started at the same time. > > We use the semaphore sets only within postgres server processes; we could > hardly expect client processes to be able to get at them, since in general > clients aren't on the same machine. The issue here, though, is that Marc is > trying to start multiple postgres servers in different jails, and in that > context the different postgres servers aren't "coordinated" in any real > sense. We'd prefer that they didn't interact at all, but they are > interacting because the SysV code isn't restricting IPC to occur only within > a jail. > > BTW, Marc, it occurs to me that a workaround for you would be to create a > separate userid for postgres to run under in each jail; then the regular > protection mechanisms would prevent the different postmasters from > interfering with each others' semaphore sets. But I think that workaround > just makes it even clearer that the jail mechanism isn't behaving very > sanely. Any multi-instance application that uses unvirtualized System V IPC must know how to distinguish between those instances. This is true of any potential communication mechanism used by multi-instance applications -- be it a command line argument to specify an alternative configuration file, or a configuration file that specifies alternative ports, working directories, mail spool directories, etc. 
If you install two instances of sendmail, it requires some configuration to teach them not to step all over each other, and this is not an accident: if they try to use the same mail spools, ports, etc, things will go badly. I can't imagine that PostgreSQL should be any different -- it has to be pointed at what resources to use and how to use them -- some of that will be a property of how it's written, and some how it's configured. Presumably, running multiple instances of PostgreSQL in jails should not be all that different from running multiple instances on any UNIX machine: they must not overlap where shared resources are concerned. How is PostgreSQL deciding what semaphores to use? Can it be instructed to use non-colliding ones by specifying an alternative argument to pass to ftok(), or ID to use directly? >> I would, in general, consider the use of System V IPC across jails (as >> opposed to in a single jail) unsupported, since it's not consistent with >> the security model. > > That'd be fine with me --- the problem here is that we've got unwanted > communication across jails. If, say, the jail ID were considered part of > semaphore keys, we'd be in fine shape. Well, I think it's definitely unwanted communications, but until such time as FreeBSD supports virtualizing the System V IPC name spaces, the fact that you can communicate between jails when System V IPC support is turned on for the jail shouldn't be a surprise, and should in fact be considered a feature. However, if applications behave incorrectly when treading over each other because either they aren't written to support specifying how not to walk over each other, or if they are not configured to use that support, then they're not going to behave well :-). Robert N M Watson
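For readers unfamiliar with ftok(), here is a minimal sketch of deriving a per-instance System V IPC key from a path plus a small instance number. The helper name instance_ipc_key and the idea of keying on a data directory are assumptions made for illustration. Elsewhere in the thread Tom Lane states that PostgreSQL does not use ftok(), and that ftok() keys can still collide.

```c
#include <sys/types.h>
#include <sys/ipc.h>

/*
 * Hypothetical helper for illustration only.
 * Derives an IPC key from an existing path and an instance number.
 * ftok() uses only the low 8 bits of the project id, and they must
 * be nonzero, so the instance number is limited to 1..255 here.
 * Returns (key_t) -1 on failure.
 */
key_t instance_ipc_key(const char *datadir, int instance)
{
    if (instance < 1 || instance > 255)
        return (key_t) -1;
    return ftok(datadir, instance);
}
```

Two instances configured with different instance numbers on the same path get different keys, but nothing stops an unrelated application from arriving at the same key, which is the name-space weakness both sides of this exchange acknowledge.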
On Mon, Apr 03, 2006 at 03:42:51PM -0400, Stephen Frost wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > That's a fair question, but in the context of the code I believe we are > > behaving reasonably. The reason this code exists is to provide some > > insurance against leaking semaphores when a postmaster process is > > terminated unexpectedly (ye olde often-recommended-against "kill -9 > > postmaster", for instance). If the PID returned by GETPID is > > Could this be handled sensibly by using SEM_UNDO? Just a thought. > > > So I think the code is pretty bulletproof as long as it's in a system > > that is behaving per SysV spec. The problem in the current FBSD > > situation is that the jail mechanism is exposing semaphore sets across > > jails, but not exposing the existence of the owning processes. That > > behavior is inconsistent: if process A can affect the state of a sema > > set that process B can see, it's surely unreasonable to pretend that A > > doesn't exist. > > This is certainly a problem with FBSD jails... Not only the > inconsistancy, but what happens if someone manages to get access to the > appropriate uid under one jail and starts sniffing or messing with the > semaphores or shared memory segments from other jails? If that's > possible then that's a rather glaring security problem... This was stated already upthread, but sysv IPC is disabled by default in jails for precisely this reason. So yes, when you turn it on it's a potential security problem if your jails are supposed to be compartmentalized. Kris
On Mon, 3 Apr 2006, Tom Lane wrote: > That's a fair question, but in the context of the code I believe we are > behaving reasonably. The reason this code exists is to provide some > insurance against leaking semaphores when a postmaster process is terminated > unexpectedly (ye olde often-recommended-against "kill -9 postmaster", for > instance). If the PID returned by GETPID is nonexistent or belongs to a > process not owned by the postgres userid then we assume that the semaphore > set can be recycled. We could get fooled by PID recycling if the PID > returned by GETPID belongs to a postgres-owned process that isn't actually > the original owner, but the penalty is just that we'll fail to recycle > semaphores that could be released. Not very harmful, and not very probable > either, unless you're running postgres under a userid that's used for a lot > of other stuff too. There is not much risk of long-term leakage of many > semaphore sets, even if you've got lots of postmaster crashes going on > (which I sure hope you don't). The code is designed to retry the same > semaphore keys on each cycle of life, so you'd have to get fooled by chance > coincidence of existing PIDs every time over many cycles to have a severe > resource-leakage problem. (BTW, Marc, that's the reason for *not* > randomizing the key selection as you suggested.) > > So I think the code is pretty bulletproof as long as it's in a system that > is behaving per SysV spec. The problem in the current FBSD situation is > that the jail mechanism is exposing semaphore sets across jails, but not > exposing the existence of the owning processes. That behavior is > inconsistent: if process A can affect the state of a sema set that process B > can see, it's surely unreasonable to pretend that A doesn't exist. Maybe I've misunderstood the problem here -- is the use of the GETPID operation occuring within a coordinated set of server processes, or does it also occur between client and server processes? 
I think it's quite reasonable to argue that a coordinated set of server processes should be able to see each other, especially if they're running as the same user, in the same jail, started at the same time. After all, coordinated server applications frequently use signals to manage resources and perform asynchronous notification (i.e., SIGCHLD, SIGHUP, etc). If we're talking about clients and servers coordinating using the same System V IPC name space, I find myself less sympathetic to the idea that otherwise unrelated processes on either side of the IPC mechanism should be using out-of-band process operations to test for mutual presence. There has been occasional investigation of virtualizing the System V IPC name space, but as you are no doubt aware, the name space doesn't lend itself to virtualization, as it fails to be conveniently hierarchical, etc. This is just another of the ways in which System V IPC offers quite useful IPC services in less useful ways. I would, in general, consider the use of System V IPC across jails (as opposed to in a single jail) unsupported, since it's not consistent with the security model. However, I have doubts about the behavioral dependency we're talking about above. Robert N M Watson
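The recycling check Tom describes in the quoted text (ask the semaphore for its last-operating pid, then probe that pid) can be sketched as below. This is a reading of the description in this thread, not PostgreSQL's actual source; the enum and function names are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

/* Hypothetical classification used only by this sketch. */
typedef enum {
    SEMA_OWNER_UNKNOWN,     /* could not determine anything */
    SEMA_OWNER_SELF,        /* last operation came from this very process */
    SEMA_OWNER_ALIVE,       /* pid exists and we are allowed to signal it */
    SEMA_OWNER_OTHER_USER,  /* pid exists but belongs to another userid (EPERM) */
    SEMA_OWNER_GONE         /* kill() reported ESRCH */
} sema_owner;

/* Probe a pid with signal 0, which checks existence without signalling. */
sema_owner classify_pid(pid_t pid)
{
    if (pid <= 0)
        return SEMA_OWNER_UNKNOWN;  /* 0 and negatives mean groups to kill() */
    if (pid == getpid())
        return SEMA_OWNER_SELF;
    if (kill(pid, 0) == 0)
        return SEMA_OWNER_ALIVE;
    if (errno == ESRCH)
        return SEMA_OWNER_GONE;
    if (errno == EPERM)
        return SEMA_OWNER_OTHER_USER;
    return SEMA_OWNER_UNKNOWN;
}

/* Look up the last pid to operate on semaphore semnum, then classify it. */
sema_owner classify_sema_owner(int semid, int semnum)
{
    int pid = semctl(semid, semnum, GETPID);

    if (pid < 0)
        return SEMA_OWNER_UNKNOWN;
    return classify_pid((pid_t) pid);
}
```

Under the policy described above, a set whose owner classifies as gone or as belonging to another userid is treated as recyclable. The failure reported in this thread corresponds to kill() returning ESRCH for a live process in a different jail, so that process's semaphore set gets classified as gone.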
[ FreeBSD email list removed.] I totally agree, and have added the attached documentation patch to recommend using different users in FreeBSD jails. --------------------------------------------------------------------------- Stephen Frost wrote: -- Start of PGP signed section. > * Marc G. Fournier (scrappy@postgresql.org) wrote: > > On Mon, 3 Apr 2006, Stephen Frost wrote: > > >Running the Postgres instances under different uids (as you'd probably > > >expect to do anyway if not using the jails) is probably the right > > >approach. Doing that and using jails would probably work, just don't > > >delude yourself into thinking that you're safe from a malicious user in > > >one jail. > > > > We don't ... we put all our databases on a central database server, even > > private ones, that nobody has shell access to ... we keep them isolated > > ... > > I guess what I was trying to get at is this: > > Running 2 Postgres instances under FreeBSD with (or without really, but > I guess that's more obvious) jails but with the same UID is a bad idea. > Even if Postgres could be modified to allow this to work you're going to > be in a position where the jail isn't really helping much except to give > a somewhat false (in this case) sense of security. We probably > shouldn't encourage it and in fact it's something of a nice feature that > it breaks. > > The reasoning is pretty simple: if someone manages to get control of > one of the Postgres instances they're going to be able to wreck havoc on > the other. With different UIDs, with or without jails, this would be > much more difficult (need to get root first). > > Running 2 Postgres instances under FreeBSD with jails *and* different > UIDs is *probably* better than w/o jails but since you have to enable > the single-instance IPC system it might not be that great of a benefit > over a simple chroot or similar. > > Hope that helps... > > Thanks, > > Stephen -- End of PGP section, PGP failed! 
--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.366
diff -c -c -r1.366 runtime.sgml
*** doc/src/sgml/runtime.sgml 3 Apr 2006 23:35:02 -0000 1.366
--- doc/src/sgml/runtime.sgml 11 Apr 2006 19:23:27 -0000
***************
*** 764,769 ****
--- 764,781 ----
  </para>
  
  <para>
+ If running in FreeBSD jails by enabling <application>sysconf</>'s
+ <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
+ running in different jails should be run by different operating system
+ users. This improves security because it prevents one jail from
+ interfering with shared memory or semaphores in another, and it
+ allows the PostgreSQL IPC cleanup code to function properly.
+ (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
+ processes in other jails, preventing the running of postmasters on the
+ same port in different jails.)
+ </para>
+ 
+ <para>
  <systemitem class="osname">FreeBSD</> versions before 4.0 work like
  <systemitem class="osname">NetBSD</> and <systemitem class="osname">
  OpenBSD</> (see below).
* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
>        <para>
> +       If running in FreeBSD jails by enabling <application>sysconf</>'s
> +       <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
> +       running in different jails should be run by different operating system
> +       users.  This improves security because it prevents one jail from
> +       interfering with shared memory or semaphores in another, and it
> +       allows the PostgreSQL IPC cleanup code to function properly.
> +       (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
> +       processes in other jails, preventing the running of postmasters on the
> +       same port in different jails.)
> +      </para>

This looks good, my only comment would be that we don't want people to
believe that using different users somehow makes the sysv spaces
separate between the jails.  It doesn't.  Even when using different
uids, a user who gets root in one jail would be able to mess with the
Postgres instance in the other jail through IPC.

Perhaps change:

"This improves security because it prevents one jail from
interfering with shared memory or semaphores in another"

to:

"This improves security because it prevents the postgres user in one
jail from interfering with shared memory or semaphores owned by a
different user in another jail (with BSD jails, root, or the same
UID, in any jail can see and interfere with the shared memory and
semaphores in any other jail of the same UID, or all if root)"

That's still not great but I think it's a little better...

	Thanks,

		Stephen
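The distinction being drawn here (different UIDs help, but jails by themselves do not) can be written down as a toy access check. This is only a sketch, not FreeBSD code: it assumes a single System V IPC namespace shared by all jails and owner-only permissions on the objects, and the names `Process`, `IpcObject`, `can_interfere` and the example UIDs are made up for illustration.

```python
from dataclasses import dataclass

ROOT_UID = 0


@dataclass(frozen=True)
class Process:
    jail: str  # jail the process runs in
    uid: int   # effective user ID


@dataclass(frozen=True)
class IpcObject:
    owner_uid: int
    creator_jail: str  # kept for readability; the check below never reads it


def can_interfere(proc: Process, obj: IpcObject) -> bool:
    """Assumed rule: root or the owning UID may touch the object.

    The jail of either side is deliberately ignored, because the model
    assumes one IPC namespace shared by every jail.
    """
    return proc.uid == ROOT_UID or proc.uid == obj.owner_uid


# A semaphore set created by UID 70 inside jail_a (values are illustrative).
sem = IpcObject(owner_uid=70, creator_jail="jail_a")

cases = {
    "same uid, other jail": Process("jail_b", 70),
    "different uid, other jail": Process("jail_b", 71),
    "root, other jail": Process("jail_b", ROOT_UID),
}
for label, proc in cases.items():
    print(f"{label}: {can_interfere(proc, sem)}")
```

Under these assumptions only the different-UID, non-root case is refused, which is the limit of what the proposed wording can promise.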
Stephen Frost wrote:
> * Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> >        <para>
> > +       If running in FreeBSD jails by enabling <application>sysconf</>'s
> > +       <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
> > +       running in different jails should be run by different operating system
> > +       users.  This improves security because it prevents one jail from
> > +       interfering with shared memory or semaphores in another, and it
> > +       allows the PostgreSQL IPC cleanup code to function properly.
> > +       (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
> > +       processes in other jails, preventing the running of postmasters on the
> > +       same port in different jails.)
> > +      </para>
>
> This looks good, my only comment would be that we don't want people to
> believe that using different users somehow makes the sysv spaces
> separate between the jails.  It doesn't.  Even when using different
> uids, a user who gets root in one jail would be able to mess with the
> Postgres instance in the other jail through IPC.
>
> Perhaps change:
>
> "This improves security because it prevents one jail from
> interfering with shared memory or semaphores in another"
>
> to:
>
> "This improves security because it prevents the postgres user in one
> jail from interfering with shared memory or semaphores owned by a
> different user in another jail (with BSD jails, root, or the same
> UID, in any jail can see and interfere with the shared memory and
> semaphores in any other jail of the same UID, or all if root)"
>
> That's still not great but I think it's a little better...

I updated the wording to say 'non-root users':

        If running in FreeBSD jails by enabling <application>sysconf</>'s
        <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
        running in different jails should be run by different operating system
        users.  This improves security because it prevents non-root users
        from interfering with shared memory or semaphores in a different jail,
        and it allows the PostgreSQL IPC cleanup code to function properly.
        (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
        processes in other jails, preventing the running of postmasters on the
        same port in different jails.)

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
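For readers trying to picture the parenthetical about the IPC cleanup code, here is a toy simulation of the failure. It is a sketch under stated assumptions, not PostgreSQL or FreeBSD source: it assumes semaphore keys start from a value derived from the port (the multiplier 1000 is illustrative), that a jailed process cannot see PIDs in other jails, and that a set owned by the same UID whose creator looks dead is treated as stale and recycled. `Kernel`, `SemSet` and `start_postmaster` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SemSet:
    owner_uid: int
    creator_pid: int
    creator_jail: str


class Kernel:
    """One System V semaphore namespace shared by every jail (assumed)."""

    def __init__(self):
        self.sets = {}  # key -> SemSet
        self.live = {}  # pid -> jail the process runs in

    def pid_visible(self, pid, from_jail):
        # Assumed rule: a jailed process only sees PIDs in its own jail.
        return self.live.get(pid) == from_jail


def start_postmaster(kernel, jail, uid, pid, port):
    """Return (key_used, jail_whose_set_was_recycled_or_None)."""
    kernel.live[pid] = jail
    key = port * 1000  # illustrative port-derived starting key
    while True:
        existing = kernel.sets.get(key)
        if existing is None:
            kernel.sets[key] = SemSet(uid, pid, jail)
            return key, None
        if existing.owner_uid != uid:
            key += 1  # not ours: permission denied, try the next key
            continue
        if kernel.pid_visible(existing.creator_pid, jail):
            key += 1  # creator is alive, so the set is in use
            continue
        # Same UID and the creator looks dead: recycle the "stale" set.
        victim = existing.creator_jail
        kernel.sets[key] = SemSet(uid, pid, jail)
        return key, victim


same_uid = Kernel()
print(start_postmaster(same_uid, "jail_a", 70, 100, 5432))
print(start_postmaster(same_uid, "jail_b", 70, 200, 5432))

diff_uid = Kernel()
print(start_postmaster(diff_uid, "jail_a", 70, 100, 5432))
print(start_postmaster(diff_uid, "jail_b", 71, 200, 5432))
```

In the same-UID run the second postmaster recycles the set that the first one is still using, which in this model is what would leave the first instance reporting invalid-argument errors. In the different-UID run the second postmaster is refused and simply moves to the next key, so both keep their own sets.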
* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> I updated the wording to say 'non-root users':
>
>         If running in FreeBSD jails by enabling <application>sysconf</>'s
>         <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
>         running in different jails should be run by different operating system
>         users.  This improves security because it prevents non-root users
>         from interfering with shared memory or semaphores in a different jail,
>         and it allows the PostgreSQL IPC cleanup code to function properly.
>         (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
>         processes in other jails, preventing the running of postmasters on the
>         same port in different jails.)

You're still saying it'll do something that it won't...  It doesn't
prevent non-root users from messing with each other if they're the same
UID, even if they're under different jails...  That's the whole problem
here. :)

	Thanks,

		Stephen
Stephen Frost wrote:
> * Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> > I updated the wording to say 'non-root users':
> >
> >         If running in FreeBSD jails by enabling <application>sysconf</>'s
> >         <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
> >         running in different jails should be run by different operating system
> >         users.  This improves security because it prevents non-root users
> >         from interfering with shared memory or semaphores in a different jail,
> >         and it allows the PostgreSQL IPC cleanup code to function properly.
> >         (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
> >         processes in other jails, preventing the running of postmasters on the
> >         same port in different jails.)
>
> You're still saying it'll do something that it won't...  It doesn't
> prevent non-root users from messing with each other if they're the same
> UID, even if they're under different jails...  That's the whole problem
> here. :)

Uh, the first part says use different Unix users for different jails,
then it says why to do that (security).  Seems clear to me.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Hi!

On Mon, Apr 03, 2006 at 11:56:13PM +0100, Robert Watson wrote:

> >>This is why it's disabled by default, and the jail documentation
> >>specifically advises of this possibility.  Excerpt below.
> >
> >Ah, I see, glad to see it's accurately documented.
>
> As it has been for the last five years, I believe since introduction of the
> setting to allow System V IPC to be used with documented limitations.
>
> >Given the rather significant use of shared memory by Postgres it seems to
> >me that jail'ing it under FBSD is unlikely to get you the kind of
> >isolation between instances that you want (the assumption being that you
> >want to avoid the possibility of a user under one jail impacting a user in
> >another jail).  As such, I'd suggest finding something else if you truly
> >need that isolation for Postgres or dropping the jails entirely.
> >
> >Running the Postgres instances under different uids (as you'd probably
> >expect to do anyway if not using the jails) is probably the right
> >approach.  Doing that and using jails would probably work, just don't
> >delude yourself into thinking that you're safe from a malicious user in
> >one jail.
>
> Yes, there seems to be an awful lot of noise being made about the fact that
> the system does, in fact, work exactly as documented, and that the
> configuration being complained about is one that is specifically documented
> as being unsupported and undesirable.
>
> As commented elsewhere in this thread, currently, there is no
> virtualization support for System V IPC in the FreeBSD Jail implementation.
> That may change if/when someone implements it.  Until it's implemented, it
> isn't going to be there, and the system won't behave as though it's there
> no matter how much jumping up and down is done.

sysvipc has been implemented once, but it has been decided that it adds
unnecessary bloat. That's sad.

/fjoe
On Tue, 9 May 2006, Max Khon wrote:

>> Yes, there seems to be an awful lot of noise being made about the fact that
>> the system does, in fact, work exactly as documented, and that the
>> configuration being complained about is one that is specifically documented
>> as being unsupported and undesirable.
>>
>> As commented elsewhere in this thread, currently, there is no
>> virtualization support for System V IPC in the FreeBSD Jail implementation.
>> That may change if/when someone implements it.  Until it's implemented, it
>> isn't going to be there, and the system won't behave as though it's there
>> no matter how much jumping up and down is done.
>
> sysvipc has been implemented once, but it has been decided that it adds
> unnecessary bloat. That's sad.

I'm not sure I follow the reasoning behind this statement.  Could you
direct me to the implementation, and to the specific claim that it adds
unnecessary bloat?  As far as I know, no implementation of jail support
for System V IPC has ever been rejected on the basis that it adds bloat
-- all discussion about it has centered on the fact that it is, in fact,
a very difficult technical problem to solve, which requires careful
consideration of the approach and tradeoffs, and that that careful
consideration has not yet been done.

Robert N M Watson