Thread: RE: [HACKERS] backend freezeing on win32 fixed (I hope ;-) )
> In any case, when one backend quits and another one is > started, the new > one will re-use the semaphore no longer used by the defunct backend. I have tested my solution a bit more and I have to say that reusing a semaphore by a new backend works OK. But it is not possible for a newly created backend to use a semaphore allocated by postmaster (it freezes on test if the semaphore with given key already exists - done with semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution uses the ipc library in the right way. There are no longer any error messages from the ipc library when running the server. And I can't say that the ipc library is a 100% correct implementation of SysV IPC, it is probably (sure ;-) )caused by the Windows internals. Dan
Horak Daniel <horak@mmp.plzen-city.cz> writes: > I have tested my solution a bit more and I have to say that reusing a > semaphore by a new backend works OK. But it is not possible for a newly > created backend to use a semaphore allocated by postmaster (it freezes on > test if the semaphore with given key already exists - done with > semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in > storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution > uses the ipc library in the right way. It seems that you have found a bug in the cygipc library. I suggest reporting it to the author of same... regards, tom lane
[Charset iso-8859-1 unsupported, filtering to ASCII...] > > In any case, when one backend quits and another one is > > started, the new > > one will re-use the semaphore no longer used by the defunct backend. > > I have tested my solution a bit more and I have to say that reusing a > semaphore by a new backend works OK. But it is not possible for a newly > created backend to use a semaphore allocated by postmaster (it freezes on > test if the semaphore with given key already exists - done with > semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in > storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution > uses the ipc library in the right way. There are no longer any error > messages from the ipc library when running the server. And I can't say that > the ipc library is a 100% correct implementation of SysV IPC, it is probably > (sure ;-) )caused by the Windows internals. Seems we may have to use the patch, or make some other patch for NT-only that works around this NT bug. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Horak Daniel <horak@mmp.plzen-city.cz> writes: > > I have tested my solution a bit more and I have to say that reusing a > > semaphore by a new backend works OK. But it is not possible for a newly > > created backend to use a semaphore allocated by postmaster (it freezes on > > test if the semaphore with given key already exists - done with > > semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in > > storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution > > uses the ipc library in the right way. > > It seems that you have found a bug in the cygipc library. I suggest > reporting it to the author of same... Yes, but can we expect all NT sites to get the patch before using PostgreSQL? Is there a workaround we can implement? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: >> storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution >> uses the ipc library in the right way. There are no longer any error >> messages from the ipc library when running the server. And I can't say that >> the ipc library is a 100% correct implementation of SysV IPC, it is probably >> (sure ;-) )caused by the Windows internals. > Seems we may have to use the patch, or make some other patch for NT-only > that works around this NT bug. I don't have a problem with installing an NT patch (lord knows there are plenty of #ifdef __CYGWIN32__'s in the code already). But I have a problem with *this* patch because I don't believe we understand what it is doing, and therefore I have no confidence in it. The extent of our understanding so far is that one backend can create a semaphore that can be used by a later backend, but the postmaster cannot create a semaphore that can be used by a later backend. I don't really believe that; I think there is something else going on. Until we understand what the something else is, I don't think we have a trustworthy solution. The real reason I feel itchy about this is that I know that interprocess synchronization is a very tricky area, so I'm not confident that the limited amount of testing Dan can do by himself proves that things are solid. As the old saw goes, "testing cannot prove the absence of bugs". I want to have both clean test results *and* an understanding of what we are doing before I will feel comfortable. Looking again at the code, it occurs to me that a backend exiting normally will probably leave its semaphore set nonzero, which could (given a buggy IPC library) have something to do with whether another process can attach to the sema or not. The postmaster code is *trying* to create the semas with nonzero starting values, but I see that the backend code takes the additional step of doing semun.val = IpcSemaphoreDefaultStartValue; semctl(semId, semNum, SETVAL,semun); whereas the postmaster code doesn't. Maybe the create call isn't initializing the semaphores the way it's told to? It'd be worth trying adding a step like this to the postmaster preallocation. In any case, I'd really like us to get some feedback from the author of cygipc about this issue. I don't mind working around a bug once we understand exactly what the bug is --- but in this particular area, I think guessing our way to a workaround isn't good enough. regards, tom lane
Got it. > Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution > >> uses the ipc library in the right way. There are no longer any error > >> messages from the ipc library when running the server. And I can't say that > >> the ipc library is a 100% correct implementation of SysV IPC, it is probably > >> (sure ;-) )caused by the Windows internals. > > > Seems we may have to use the patch, or make some other patch for NT-only > > that works around this NT bug. > > I don't have a problem with installing an NT patch (lord knows there > are plenty of #ifdef __CYGWIN32__'s in the code already). But I have > a problem with *this* patch because I don't believe we understand what > it is doing, and therefore I have no confidence in it. The extent of > our understanding so far is that one backend can create a semaphore that > can be used by a later backend, but the postmaster cannot create a > semaphore that can be used by a later backend. I don't really believe > that; I think there is something else going on. Until we understand > what the something else is, I don't think we have a trustworthy > solution. > > The real reason I feel itchy about this is that I know that interprocess > synchronization is a very tricky area, so I'm not confident that the > limited amount of testing Dan can do by himself proves that things are > solid. As the old saw goes, "testing cannot prove the absence of bugs". > I want to have both clean test results *and* an understanding of what > we are doing before I will feel comfortable. > > Looking again at the code, it occurs to me that a backend exiting > normally will probably leave its semaphore set nonzero, which could > (given a buggy IPC library) have something to do with whether another > process can attach to the sema or not. The postmaster code is *trying* > to create the semas with nonzero starting values, but I see that the > backend code takes the additional step of doing > semun.val = IpcSemaphoreDefaultStartValue; > semctl(semId, semNum, SETVAL, semun); > whereas the postmaster code doesn't. Maybe the create call isn't > initializing the semaphores the way it's told to? It'd be worth > trying adding a step like this to the postmaster preallocation. > > In any case, I'd really like us to get some feedback from the author of > cygipc about this issue. I don't mind working around a bug once we > understand exactly what the bug is --- but in this particular area, > I think guessing our way to a workaround isn't good enough. > > regards, tom lane > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Horak Daniel > Sent: Tuesday, August 17, 1999 9:06 PM > To: 'Tom Lane' > Cc: 'pgsql-hackers@postgreSQL.org' > Subject: RE: [HACKERS] backend freezeing on win32 fixed (I hope ;-) ) > > > > In any case, when one backend quits and another one is > > started, the new > > one will re-use the semaphore no longer used by the defunct backend. > > I have tested my solution a bit more and I have to say that reusing a > semaphore by a new backend works OK. But it is not possible for a newly > created backend to use a semaphore allocated by postmaster (it freezes on > test if the semaphore with given key already exists - done with > semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in > storage/ipc/ipc.c ). Why it is, I don't know, but it seems that > my solution > uses the ipc library in the right way. There are no longer any error > messages from the ipc library when running the server. And I > can't say that > the ipc library is a 100% correct implementation of SysV IPC, it > is probably > (sure ;-) )caused by the Windows internals. > Yutaka Tanida [yutaka@marin.or.jp] and I have examined IPC library. We found that postmaster doesn't call exec() after fork() since v6.4. The value of static/extern variables which cygipc library holds may be different from their initial values when postmaster fork()s child backend processes. I made the following patch for cygipc library on trial. This patch was effective for Yutaka's test case. Regards. Hiroshi Inoue Inoue@tpf.co.jp *** sem.c.orig Tue Dec 01 00:16:25 1998 --- sem.c Tue Aug 17 13:22:06 1999 *************** *** 58,63 **** --- 58,78 ---- static int GFirstSem = 0; /*PCPC*/ static int GFdSem ; /*PCPC*/ + static pid_t GProcessId = 0; + + static void init_globals(void) + { + pid_t pid; + + if (pid=getpid(), pid != GProcessId) + { + GFirstSem = 0; + used_sems = used_semids = max_semid = 0; + sem_seq = 0; + GProcessId = pid; + } + } + /************************************************************************/ /* Demande d'acces a la zone partagee de gestiondes semaphores */ /************************************************************************/ *************** *** 77,82 **** --- 92,98 ---- { int LRet ; + init_globals(); if( GFirstSem == 0 ) { if( IsGSemSemExist() ) *** shm.c.orig Tue Dec 01 01:04:57 1998 --- shm.c Tue Aug 17 13:22:27 1999 *************** *** 59,64 **** --- 59,81 ---- static int GFirstShm = 0; /*PCPC*/ static int GFdShm ; /*PCPC*/ + /*****************************************/ + /* Initialization of static variables */ + /*****************************************/ + static pid_t GProcessId = 0; + static void init_globals(void) + { + pid_t pid; + + if (pid=getpid(), pid != GProcessId) + { + GFirstShm = 0; + shm_rss = shm_swp = max_shmid = 0; + shm_seq = 0; + GProcessId = pid; + } + } + /************************************************************************/ /* Demande d'acces a la zone partagee de gestiondes shm */ /************************************************************************/ *************** *** 82,87 **** --- 99,105 ---- { int LRet ; + init_globals(); if( GFirstShm == 0 ) { if( IsGSemShmExist() ) *** msg.c.orig Tue Dec 01 00:16:09 1998 --- msg.c Tue Aug 17 13:20:04 1999 *************** *** 57,62 **** --- 57,77 ---- static int GFirstMsg = 0; /*PCPC*/ static int GFdMsg ; /*PCPC*/ + /*****************************************/ + /* Initialization of static variables */ + /*****************************************/ + static pid_t GProcessId = 0; + static void init_globals(void) + { + pid_t pid; + + if (pid=getpid(), pid != GProcessId) + { + GFirstMsg = 0; + msgbytes = msghdrs = msg_seq = used_queues = max_msqid = 0; + GProcessId = pid; + } + } /************************************************************************/ /* Demande d'acces a la zone partagee de gestiondes semaphores */ /************************************************************************/ *************** *** 79,84 **** --- 94,100 ---- { int LRet ; + init_globals(); if( GFirstMsg == 0 ) { if( IsGSemMsgExist() )
I have added this to the end of the README.NT file. [Charset iso-8859-1 unsupported, filtering to ASCII...] > > -----Original Message----- > > From: owner-pgsql-hackers@postgreSQL.org > > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Horak Daniel > > Sent: Tuesday, August 17, 1999 9:06 PM > > To: 'Tom Lane' > > Cc: 'pgsql-hackers@postgreSQL.org' > > Subject: RE: [HACKERS] backend freezeing on win32 fixed (I hope ;-) ) > > > > > > > In any case, when one backend quits and another one is > > > started, the new > > > one will re-use the semaphore no longer used by the defunct backend. > > > > I have tested my solution a bit more and I have to say that reusing a > > semaphore by a new backend works OK. But it is not possible for a newly > > created backend to use a semaphore allocated by postmaster (it freezes on > > test if the semaphore with given key already exists - done with > > semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in > > storage/ipc/ipc.c ). Why it is, I don't know, but it seems that > > my solution > > uses the ipc library in the right way. There are no longer any error > > messages from the ipc library when running the server. And I > > can't say that > > the ipc library is a 100% correct implementation of SysV IPC, it > > is probably > > (sure ;-) )caused by the Windows internals. > > > > Yutaka Tanida [yutaka@marin.or.jp] and I have examined IPC > library. > > We found that postmaster doesn't call exec() after fork() since v6.4. > > The value of static/extern variables which cygipc library holds may > be different from their initial values when postmaster fork()s child > backend processes. > > I made the following patch for cygipc library on trial. > This patch was effective for Yutaka's test case. > > Regards. > > Hiroshi Inoue > Inoue@tpf.co.jp > > *** sem.c.orig Tue Dec 01 00:16:25 1998 > --- sem.c Tue Aug 17 13:22:06 1999 > *************** > *** 58,63 **** > --- 58,78 ---- > static int GFirstSem = 0; /*PCPC*/ > static int GFdSem ; /*PCPC*/ > > + static pid_t GProcessId = 0; > + > + static void init_globals(void) > + { > + pid_t pid; > + > + if (pid=getpid(), pid != GProcessId) > + { > + GFirstSem = 0; > + used_sems = used_semids = max_semid = 0; > + sem_seq = 0; > + GProcessId = pid; > + } > + } > + > /************************************************************************/ > /* Demande d'acces a la zone partagee de gestion des semaphores */ > /************************************************************************/ > *************** > *** 77,82 **** > --- 92,98 ---- > { > int LRet ; > > + init_globals(); > if( GFirstSem == 0 ) > { > if( IsGSemSemExist() ) > *** shm.c.orig Tue Dec 01 01:04:57 1998 > --- shm.c Tue Aug 17 13:22:27 1999 > *************** > *** 59,64 **** > --- 59,81 ---- > static int GFirstShm = 0; /*PCPC*/ > static int GFdShm ; /*PCPC*/ > > + /*****************************************/ > + /* Initialization of static variables */ > + /*****************************************/ > + static pid_t GProcessId = 0; > + static void init_globals(void) > + { > + pid_t pid; > + > + if (pid=getpid(), pid != GProcessId) > + { > + GFirstShm = 0; > + shm_rss = shm_swp = max_shmid = 0; > + shm_seq = 0; > + GProcessId = pid; > + } > + } > + > /************************************************************************/ > /* Demande d'acces a la zone partagee de gestion des shm */ > /************************************************************************/ > *************** > *** 82,87 **** > --- 99,105 ---- > { > int LRet ; > > + init_globals(); > if( GFirstShm == 0 ) > { > if( IsGSemShmExist() ) > *** msg.c.orig Tue Dec 01 00:16:09 1998 > --- msg.c Tue Aug 17 13:20:04 1999 > *************** > *** 57,62 **** > --- 57,77 ---- > static int GFirstMsg = 0; /*PCPC*/ > static int GFdMsg ; /*PCPC*/ > > + /*****************************************/ > + /* Initialization of static variables */ > + /*****************************************/ > + static pid_t GProcessId = 0; > + static void init_globals(void) > + { > + pid_t pid; > + > + if (pid=getpid(), pid != GProcessId) > + { > + GFirstMsg = 0; > + msgbytes = msghdrs = msg_seq = used_queues = max_msqid = 0; > + GProcessId = pid; > + } > + } > /************************************************************************/ > /* Demande d'acces a la zone partagee de gestion des semaphores */ > /************************************************************************/ > *************** > *** 79,84 **** > --- 94,100 ---- > { > int LRet ; > > + init_globals(); > if( GFirstMsg == 0 ) > { > if( IsGSemMsgExist() ) > > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026