Thread: Quite strange crash
Hi,

Has anyone seen this on PostgreSQL 7.0.3?

FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
Server process (pid 1008) exited with status 6 at Sun Jan 7 04:29:07 2001
Terminating any active server processes...
Server processes were terminated at Sun Jan 7 04:29:07 2001
Reinitializing shared memory and semaphores

-- 
Sincerely Yours, Denis Perchine
----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
Denis Perchine <dyp@perchine.com> writes:
> Has anyone seen this on PostgreSQL 7.0.3?
> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.

Were there any errors before that? I've been suspicious for a while that the
system might neglect to release buffer cntx_lock spinlocks if an elog()
occurs while one is held. This looks like it might be such a case, but
you're only showing us the end symptom, not what led up to it ...

regards, tom lane
On Monday 08 January 2001 00:08, Tom Lane wrote:
> Denis Perchine <dyp@perchine.com> writes:
> > Has anyone seen this on PostgreSQL 7.0.3?
> > FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
>
> Were there any errors before that?

No... Just a clean log (I redirect the log from stderr/stdout to a file, and
everything else to syslog). Here it is, right from the beginning:

----
DEBUG: Data Base System is starting up at Sun Jan 7 04:22:00 2001
DEBUG: Data Base System was interrupted being in production at Thu Jan 4 23:30:22 2001
DEBUG: Data Base System is in production state at Sun Jan 7 04:22:00 2001
FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
Server process (pid 1008) exited with status 6 at Sun Jan 7 04:29:07 2001
Terminating any active server processes...
Server processes were terminated at Sun Jan 7 04:29:07 2001
Reinitializing shared memory and semaphores
----

As you can see, it happens almost immediately after startup. I can give you
the full list of queries made by process 1008, but basically there were only
queries like this:

select message_id from pop3 where server_id = 6214

insert into pop3 (server_id, mailfrom, mailto, subject, message_id, sent_date, sent_date_text, recieved_date, state) values (25641, 'virtualo.com', 'jrdias@mail.telepac.pt', 'Joao roque Dias I have tried them all....this one is for real........!', '20010107041334.CVEA17335.fep02-svc.mail.telepac.pt@anydomain.com', '2001-01-07 04:06:23 -00', 'Sat, 06 Jan 2001 23:06:23 -0500', 'now', 1)

And the last query was:

Jan 7 04:27:53 mx postgres[1008]: query: select message_id from pop3 where server_id = 22615

> I've been suspicious for a while that the system might neglect to release
> buffer cntx_lock spinlocks if an elog() occurs while one is held. This
> looks like it might be such a case, but you're only showing us the end
> symptom, not what led up to it ...

Just tell me what I can do. Unfortunately I cannot reproduce the situation...

-- 
Sincerely Yours, Denis Perchine
----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
Denis Perchine <dyp@perchine.com> writes:
> On Monday 08 January 2001 00:08, Tom Lane wrote:
>>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
>>
>> Were there any errors before that?

> No... Just a clean log (I redirect the log from stderr/stdout to a file,
> and everything else to syslog).

The error messages would be in the syslog then, not in stderr.

> And the last query was:
> Jan 7 04:27:53 mx postgres[1008]: query: select message_id from pop3 where
> server_id = 22615

How about the prior queries of other processes? Keep in mind that the
spinlock could have been left locked by any backend, not only the one that
complained about it.

regards, tom lane
> >>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >>
> >> Were there any errors before that?
> >
> > No... Just a clean log (I redirect the log from stderr/stdout to a file,
> > and everything else to syslog).
>
> The error messages would be in the syslog then, not in stderr.

Hmmm... The only strange errors I see are:

Jan 7 04:22:14 mx postgres[679]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan 7 04:22:14 mx postgres[631]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan 7 04:22:14 mx postgres[700]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan 7 04:22:14 mx postgres[665]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan 7 04:22:14 mx postgres[633]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan 7 04:22:14 mx postgres[629]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan 7 04:22:14 mx postgres[736]: query: commit
Jan 7 04:22:14 mx postgres[736]: ProcessUtility: commit
Jan 7 04:22:14 mx postgres[700]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan 7 04:22:14 mx postgres[700]: query: update users set rcpt_ip='213.75.35.129',rcptdate=now() where id=1428067
Jan 7 04:22:14 mx postgres[700]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan 7 04:22:14 mx postgres[679]: query: commit
Jan 7 04:22:14 mx postgres[679]: ProcessUtility: commit
Jan 7 04:22:14 mx postgres[679]: query: update users set rcpt_ip='213.75.55.185',rcptdate=now() where id=1430836
Jan 7 04:22:14 mx postgres[665]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan 7 04:22:14 mx postgres[665]: query: update users set rcpt_ip='202.156.121.139',rcptdate=now() where id=1271397
Jan 7 04:22:14 mx postgres[665]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan 7 04:22:14 mx postgres[631]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan 7 04:22:14 mx postgres[631]: query: update users set rcpt_ip='24.20.53.63',rcptdate=now() where id=1451254
Jan 7 04:22:14 mx postgres[631]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan 7 04:22:14 mx postgres[633]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan 7 04:22:14 mx postgres[633]: query: update users set rcpt_ip='213.116.168.173',rcptdate=now() where id=1378049
Jan 7 04:22:14 mx postgres[633]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan 7 04:22:14 mx postgres[630]: query: select id,msg,next from alert
Jan 7 04:22:14 mx postgres[630]: query: select email,type from email where variant_id=2
Jan 7 04:22:14 mx postgres[630]: query: select * from users where senderdate > now()-'10days'::interval AND variant_id=2AND crypt='21AN6KRffJdFRFc511'
Jan 7 04:22:14 mx postgres[629]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan 7 04:22:14 mx postgres[629]: query: update users set rcpt_ip='213.42.45.81',rcptdate=now() where id=1441046
Jan 7 04:22:14 mx postgres[629]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan 7 04:22:15 mx postgres[711]: query: select message_id from pop3 where server_id = 17746
Jan 7 04:22:15 mx postgres[711]: ERROR: Relation 'pop3' does not exist

They popped up 4 minutes before. And the most interesting thing is that the
relation pop3 does exist!

> > And the last query was:
> > Jan 7 04:27:53 mx postgres[1008]: query: select message_id from pop3
> > where server_id = 22615
>
> How about the prior queries of other processes?

I do not want to flood the mailing list (it would be too much info). I can
send you the complete log file from Jan 7. It is 128MB uncompressed; with gz
it is 8MB. Maybe it will be smaller with bz2.

> Keep in mind that the
> spinlock could have been left locked by any backend, not only the one
> that complained about it.

Actually, you can have a look at the logs yourself. Remember I gave you the
password for the postgres user. This is the same postgres. Logs are in
/var/log/postgres. You will need postgres.log.1.gz.

-- 
Sincerely Yours, Denis Perchine
----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
Denis Perchine <dyp@perchine.com> writes:
>>>>>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
>>>>>
>>>>> Were there any errors before that?

> Actually, you can have a look at the logs yourself.

Well, I found a smoking gun:

Jan 7 04:27:51 mx postgres[2501]: FATAL 1: The system is shutting down

PID 2501 had been running:

Jan 7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

What seems to have happened is that 2501 curled up and died, leaving one or
more buffer spinlocks locked. Roughly one spinlock timeout later, at
04:29:07, we have 1008 complaining of a stuck spinlock. So that fits.

The real question is what happened to 2501? None of the other backends
reported a SIGTERM signal, so the signal did not come from the postmaster.

Another interesting datapoint: there is a second place in this logfile where
one single backend reports SIGTERM while its brethren keep running:

Jan 7 04:30:47 mx postgres[4269]: query: vacuum verbose;
...
Jan 7 04:38:16 mx postgres[4269]: FATAL 1: The system is shutting down

There is something pretty fishy about this. You aren't by any chance running
the postmaster under a ulimit setting that might cut off individual backends
after a certain amount of CPU time, are you? What signal does a ulimit
violation deliver on your machine, anyway?

regards, tom lane
On Mon, Jan 08, 2001 at 12:21:38PM -0500, Tom Lane wrote:
> Denis Perchine <dyp@perchine.com> writes:
> >>>>>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >>>>>
> >>>>> Were there any errors before that?
>
> > Actually, you can have a look at the logs yourself.
>
> Well, I found a smoking gun: ...
> What seems to have happened is that 2501 curled up and died, leaving
> one or more buffer spinlocks locked. ...
> There is something pretty fishy about this. You aren't by any chance
> running the postmaster under a ulimit setting that might cut off
> individual backends after a certain amount of CPU time, are you?
> What signal does a ulimit violation deliver on your machine, anyway?

It's worth noting here that modern Unixes run around killing user-level
processes more or less at random when free swap space (and sometimes just
RAM) runs low. AIX was the first such, but would send SIGDANGER to processes
first to try to reclaim some RAM; critical daemons were expected to
explicitly ignore SIGDANGER. Other Unixes picked up the idea without picking
up the SIGDANGER behavior.

This common pathological behavior is usually traced to sloppy resource
accounting: the bad policy of having malloc() (and sbrk() or mmap()
underneath) return a valid pointer rather than NULL, on the assumption that
most of the memory asked for won't be used just yet. As a result, the system
doesn't know how much memory is really available at any given moment. The
problem is usually explained with the example of a very large process that
forks, suddenly demanding twice as much memory. (Apache is particularly
egregious this way, allocating lots of memory and then forking several
times.) Instead of failing the fork, the kernel waits for a process to touch
memory it was granted, then checks whether any RAM/swap has turned up to
satisfy it, and kills the process (or some random other process!) if not.
Now that programs have come to depend on this behavior, it has become very
hard to fix. The implication for the rest of us is that we should expect our
processes to be killed at random, just for touching memory they were
granted, or for no reason at all. (Kernel people say, "They're just
user-level programs, restart them," or, "Maybe we can designate some
critical processes that don't get killed.") In Linux they try to invent
heuristics to avoid killing the X server, because so many programs depend on
it. It's a disgraceful mess, really.

The relevance to the issue at hand is that processes dying during heavy
memory load is a documented feature of our supported platforms.

Nathan Myers ncm@zembu.com
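To make the overcommit behavior Nathan describes concrete, here is a
minimal, stand-alone C sketch. The 64 GB request size and the page-touching
loop are arbitrary illustrative choices (and assume a 64-bit machine); the
exact outcome depends on the kernel's overcommit settings, but the point is
that the failure shows up at page-touch time, not at malloc() time:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Ask for far more memory than the machine is likely to have.
     * With optimistic overcommit, malloc() typically still returns a
     * valid pointer instead of NULL. */
    size_t request = (size_t) 64 * 1024 * 1024 * 1024;   /* 64 GB */
    char  *p = malloc(request);

    if (p == NULL)
    {
        printf("malloc failed up front -- no overcommit surprise here\n");
        return 1;
    }
    printf("malloc succeeded; now touching the pages...\n");

    /* Only when pages are actually touched must the kernel find real
     * RAM/swap; if it cannot, the OOM killer may SIGKILL this process
     * (or an unrelated one) rather than fail any library call. */
    for (size_t i = 0; i < request; i += 4096)
        p[i] = 1;

    printf("survived touching %zu bytes\n", request);
    free(p);
    return 0;
}

The process gets no error return it could handle gracefully; it is simply
killed, which is why the postmaster's crash-recovery path matters here.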
> > Well, I found a smoking gun: ...
> > What seems to have happened is that 2501 curled up and died, leaving
> > one or more buffer spinlocks locked. ...
> > There is something pretty fishy about this. You aren't by any chance
> > running the postmaster under a ulimit setting that might cut off
> > individual backends after a certain amount of CPU time, are you?
> > What signal does a ulimit violation deliver on your machine, anyway?
>
> It's worth noting here that modern Unixes run around killing user-level
> processes more or less at random when free swap space (and sometimes
> just RAM) runs low. AIX was the first such, but would send SIGDANGER
> to processes first to try to reclaim some RAM; critical daemons were
> expected to explicitly ignore SIGDANGER. Other Unixes picked up the
> idea without picking up the SIGDANGER behavior.

That's not the case for sure. There are 512MB on the machine, and when I had
this problem it was completely unloaded (>300MB in caches).

-- 
Sincerely Yours, Denis Perchine
----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
Denis Perchine <dyp@perchine.com> writes:
>> It's worth noting here that modern Unixes run around killing user-level
>> processes more or less at random when free swap space (and sometimes
>> just RAM) runs low.

> That's not the case for sure. There are 512MB on the machine, and when I
> had this problem it was completely unloaded (>300MB in caches).

The fact that VACUUM processes seemed to be preferential victims suggests a
resource limit of some sort. I had suggested a CPU-time limit, but perhaps
it could also be disk-pages-written.

regards, tom lane
On Monday 08 January 2001 23:21, Tom Lane wrote:
> Denis Perchine <dyp@perchine.com> writes:
> >>>>>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >>>>>
> >>>>> Were there any errors before that?
> >
> > Actually, you can have a look at the logs yourself.
>
> Well, I found a smoking gun:
>
> Jan 7 04:27:51 mx postgres[2501]: FATAL 1: The system is shutting down
>
> PID 2501 had been running:
>
> Jan 7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

Hmmm... actually this is a real problem with vacuum lazy. Sometimes it just
does something for an enormous amount of time (I have mailed a sample
database to Vadim, but did not get any response yet). It is possible that it
was me who killed the backend.

> What seems to have happened is that 2501 curled up and died, leaving
> one or more buffer spinlocks locked. Roughly one spinlock timeout
> later, at 04:29:07, we have 1008 complaining of a stuck spinlock.
> So that fits.
>
> The real question is what happened to 2501? None of the other backends
> reported a SIGTERM signal, so the signal did not come from the
> postmaster.
>
> Another interesting datapoint: there is a second place in this logfile
> where one single backend reports SIGTERM while its brethren keep running:
>
> Jan 7 04:30:47 mx postgres[4269]: query: vacuum verbose;
> ...
> Jan 7 04:38:16 mx postgres[4269]: FATAL 1: The system is shutting down

Hmmm... Maybe this also was me... But I am not sure here.

> There is something pretty fishy about this. You aren't by any chance
> running the postmaster under a ulimit setting that might cut off
> individual backends after a certain amount of CPU time, are you?

[postgres@mx postgres]$ ulimit -a
core file size (blocks)     1000000
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max memory size (kbytes)    unlimited
stack size (kbytes)         8192
cpu time (seconds)          unlimited
max user processes          2048
pipe size (512 bytes)       8
open files                  1024
virtual memory (kbytes)     2105343

No, there are no ulimits.

> What signal does a ulimit violation deliver on your machine, anyway?

    if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
            /* Send SIGXCPU every second.. */
            if (!(psecs % HZ))
                    send_sig(SIGXCPU, p, 1);
            /* and SIGKILL when we go over max.. */
            if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_max)
                    send_sig(SIGKILL, p, 1);
    }

This part of the kernel shows the logic. It means a process will get SIGXCPU
every second while it is over the soft limit, and SIGKILL once it goes over
the hard limit.

-- 
Sincerely Yours, Denis Perchine
----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
Denis Perchine <dyp@perchine.com> writes:
> Hmmm... actually this is a real problem with vacuum lazy. Sometimes it
> just does something for an enormous amount of time (I have mailed a
> sample database to Vadim, but did not get any response yet). It is
> possible that it was me who killed the backend.

Killing an individual backend with SIGTERM is bad luck. The backend will
assume that it's being killed by the postmaster, and will exit without a
whole lot of concern for cleaning up shared memory --- the expectation is
that as soon as all the backends are dead, the postmaster will reinitialize
shared memory. You can get away with sending SIGINT (QueryCancel) to an
individual backend. Anything else voids the warranty ;=)

But, having said that --- this VACUUM process had only been running for two
minutes of real time. Seems unlikely that you'd have chosen to kill it so
quickly.

regards, tom lane
> Killing an individual backend with SIGTERM is bad luck. The backend
> will assume that it's being killed by the postmaster, and will exit
> without a whole lot of concern for cleaning up shared memory --- the

What code will be returned to the postmaster in this case?

Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >> Killing an individual backend with SIGTERM is bad luck. The backend >> will assume that it's being killed by the postmaster, and will exit >> without a whole lot of concern for cleaning up shared memory --- the > What code will be returned to postmaster in this case? Right at the moment, the backend will exit with status 0. I think you are thinking the same thing I am: maybe a backend that receives SIGTERM ought to exit with nonzero status. That would mean that killing an individual backend would instantly translate into an installation-wide restart. I am not sure whether that's a good idea. Perhaps this cure is worse than the disease. Comments anyone? regards, tom lane
> >> Killing an individual backend with SIGTERM is bad luck.
> >> The backend will assume that it's being killed by the postmaster,
> >> and will exit without a whole lot of concern for cleaning up shared
> >> memory --- the

SIGTERM --> die() --> elog(FATAL)

Is it true that elog(FATAL) doesn't clean up shmem etc.? This would be very
bad...

> > What code will be returned to the postmaster in this case?
>
> Right at the moment, the backend will exit with status 0. I think you
> are thinking the same thing I am: maybe a backend that
> receives SIGTERM ought to exit with nonzero status.
>
> That would mean that killing an individual backend would instantly
> translate into an installation-wide restart. I am not sure whether
> that's a good idea. Perhaps this cure is worse than the disease.

Well, it's not a good idea, because SIGTERM is used for ABORT + EXIT
(pg_ctl -m fast stop) -- but shouldn't ABORT clean up everything?

Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >>>>> Killing an individual backend with SIGTERM is bad luck. > SIGTERM --> die() --> elog(FATAL) > Is it true that elog(FATAL) doesn't clean up shmem etc? > This would be very bad... It tries, but I don't think it's possible to make a complete guarantee without an unreasonable amount of overhead. The case at hand was a stuck spinlock because die() --> elog(FATAL) had neglected to release that particular spinlock before exiting. To guarantee that all spinlocks will be released by die(), we'd need something like START_CRIT_SECTION;S_LOCK(spinlock);record that we own spinlock;END_CRIT_SECTION; around every existing S_LOCK() call, and the reverse around every S_UNLOCK. Are you willing to pay that kind of overhead? I'm not sure this'd be enough anyway. Guaranteeing that you have consistent state at every instant that an ISR could interrupt you is not easy. regards, tom lane
* Mikheev, Vadim <vmikheev@SECTORBASE.COM> [010108 23:08] wrote:
> > >> Killing an individual backend with SIGTERM is bad luck.
> > >> The backend will assume that it's being killed by the postmaster,
> > >> and will exit without a whole lot of concern for cleaning up shared
> > >> memory --- the
>
> SIGTERM --> die() --> elog(FATAL)
>
> Is it true that elog(FATAL) doesn't clean up shmem etc.? This would be
> very bad...
>
> > > What code will be returned to the postmaster in this case?
> >
> > Right at the moment, the backend will exit with status 0. I think you
> > are thinking the same thing I am: maybe a backend that
> > receives SIGTERM ought to exit with nonzero status.
> >
> > That would mean that killing an individual backend would instantly
> > translate into an installation-wide restart. I am not sure whether
> > that's a good idea. Perhaps this cure is worse than the disease.
>
> Well, it's not a good idea, because SIGTERM is used for ABORT + EXIT
> (pg_ctl -m fast stop) -- but shouldn't ABORT clean up everything?

Er, shouldn't ABORT leave the system in the exact state that it's in, so
that one can get a crashdump/traceback on a wedged process without it trying
to clean up after itself?

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
> > Well, it's not a good idea, because SIGTERM is used for ABORT + EXIT
> > (pg_ctl -m fast stop) -- but shouldn't ABORT clean up everything?
>
> Er, shouldn't ABORT leave the system in the exact state that it's
> in so that one can get a crashdump/traceback on a wedged process
> without it trying to clean up after itself?

Sorry, I meant "transaction abort"...

Vadim
ncm@zembu.com (Nathan Myers) writes:
> The relevance to the issue at hand is that processes dying during
> heavy memory load is a documented feature of our supported platforms.

Ugh. Do you know anything about *how* they get killed --- ie, with what
signal?

regards, tom lane
> > Is it true that elog(FATAL) doesn't clean up shmem etc.?
> > This would be very bad...
>
> It tries, but I don't think it's possible to make a complete guarantee
> without an unreasonable amount of overhead. The case at hand was a
> stuck spinlock because die() --> elog(FATAL) had neglected to release
> that particular spinlock before exiting. To guarantee that all
> spinlocks will be released by die(), we'd need something like
>
>     START_CRIT_SECTION;
>     S_LOCK(spinlock);
>     record that we own spinlock;
>     END_CRIT_SECTION;
>
> around every existing S_LOCK() call, and the reverse around every
> S_UNLOCK. Are you willing to pay that kind of overhead? I'm not

START_/END_CRIT_SECTION is mostly CritSectionCount++/--. The recording could
be done as LockedSpinLocks[LockedSpinCounter++] = &spinlock in a
pre-allocated array.

Another way of implementing Transaction Abort + Exit could be some flag in
shmem set by the postmaster, plus QueryCancel..?

> sure this'd be enough anyway. Guaranteeing that you have consistent
> state at every instant that an ISR could interrupt you is not easy.

Agreed, but we have to forget the old happy days when it was so easy to shut
down the DB. If we aren't able to release spins (e.g. an exclusive buffer
lock) on Abort+Exit, then instead of a fast shutdown by pg_ctl -m fast stop,
people can get the checkpointer stuck trying to share-lock that buffer.
(BTW, it's bad that pg_ctl doesn't wait on shutdown by default.)

Vadim
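For illustration, here is a minimal, self-contained sketch of the
bookkeeping being discussed: a counter plus a pre-allocated array of held
spinlocks that an elog(FATAL) path could walk to release anything still
locked. All of the names (slock_t, CritSectionCount, LockedSpinLocks,
release_held_spinlocks) and the trivial S_LOCK/S_UNLOCK stand-ins are
assumptions made for the sketch, not the actual PostgreSQL source:

#include <stdio.h>

/* Simplified stand-ins; the real definitions live elsewhere in the backend. */
typedef volatile int slock_t;

#define S_LOCK(lock)    (*(lock) = 1)   /* pretend we spin and win immediately */
#define S_UNLOCK(lock)  (*(lock) = 0)

#define MAX_HELD_SPINS 8

static int      CritSectionCount = 0;
static slock_t *LockedSpinLocks[MAX_HELD_SPINS];
static int      LockedSpinCounter = 0;

#define START_CRIT_SECTION  (CritSectionCount++)
#define END_CRIT_SECTION    (CritSectionCount--)

static void
spin_acquire(slock_t *lock)
{
    START_CRIT_SECTION;                          /* hold off die() while we ... */
    S_LOCK(lock);                                /* ... take the lock and ... */
    LockedSpinLocks[LockedSpinCounter++] = lock; /* ... remember that we own it */
    END_CRIT_SECTION;
}

static void
spin_release(slock_t *lock)
{
    START_CRIT_SECTION;
    LockedSpinCounter--;                         /* assumes LIFO acquire/release */
    S_UNLOCK(lock);
    END_CRIT_SECTION;
}

/* What an elog(FATAL)/die() path could do with the bookkeeping: release
 * whatever is still held, so other backends don't hit a "stuck spinlock"
 * one timeout later. */
static void
release_held_spinlocks(void)
{
    while (LockedSpinCounter > 0)
        S_UNLOCK(LockedSpinLocks[--LockedSpinCounter]);
}

int
main(void)
{
    slock_t buf_cntx_lock = 0;

    /* normal path: matched acquire/release */
    spin_acquire(&buf_cntx_lock);
    spin_release(&buf_cntx_lock);

    /* failure path: acquire, then pretend elog(FATAL) fires before release */
    spin_acquire(&buf_cntx_lock);
    release_held_spinlocks();
    printf("spinlock state after cleanup: %d\n", buf_cntx_lock);
    return 0;
}

The per-lock cost really is just an increment, a decrement, and one array
store, which is the point Vadim is making about the overhead.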
> > The relevance to the issue at hand is that processes dying during
> > heavy memory load is a documented feature of our supported platforms.
>
> Ugh. Do you know anything about *how* they get killed --- ie, with
> what signal?

Didn't you get my mail with a piece of Linux kernel code? I think all is
clear there.

-- 
Sincerely Yours, Denis Perchine
----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: > START_/END_CRIT_SECTION is mostly CritSectionCount++/--. > Recording could be made as LockedSpinLocks[LockedSpinCounter++] = &spinlock > in pre-allocated array. Yeah, I suppose. We already do record locking of all the fixed spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks that are missing from that (and CRIT_SECTION calls). Would it be reasonable to assume that only one buffer spinlock could be held at a time? > (BTW, it's bad that pg_ctl doesn't wait on shutdown by default). I agree. Anyone object to changing pg_ctl to do -w by default? What should we call the switch to tell it to not wait? -n? regards, tom lane
Denis Perchine <dyp@perchine.com> writes:
> Didn't you get my mail with a piece of Linux kernel code? I think all is
> clear there.

That was implementing CPU-time-exceeded kill, which is a different issue.

regards, tom lane
> > Didn't you get my mail with a piece of Linux kernel code? I think all
> > is clear there.
>
> That was implementing CPU-time-exceeded kill, which is a different
> issue.

Oops... You are talking about the OOM killer.

    /* This process has hardware access, be more careful. */
    if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
            force_sig(SIGTERM, p);
    } else {
            force_sig(SIGKILL, p);
    }

You will get SIGKILL in most cases.

-- 
Sincerely Yours, Denis Perchine
----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
> > START_/END_CRIT_SECTION is mostly CritSectionCount++/--.
> > The recording could be done as
> > LockedSpinLocks[LockedSpinCounter++] = &spinlock
> > in a pre-allocated array.
>
> Yeah, I suppose. We already do record locking of all the fixed
> spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks that
> are missing from that (and CRIT_SECTION calls). Would it be
> reasonable to assume that only one buffer spinlock could be held
> at a time?

No. UPDATE holds two spins, and a btree split even more. But wait -- AFAIR
the bufmgr remembers locked buffers; probably we could just add
XXX_CRIT_SECTION to LockBuffer..?

Vadim
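A compact sketch of what Vadim is suggesting for LockBuffer: the per-buffer
cntx_lock spinlock is only taken and released inside LockBuffer itself, so
bracketing that short stretch with a critical section means die() cannot
fire while it is held, and there is nothing to record afterwards. The
BufferDesc layout, field names, and the trivial S_LOCK/S_UNLOCK macros here
are simplified assumptions, not the real bufmgr code:

#include <stdio.h>

/* Simplified stand-ins for the real bufmgr structures and spinlock macros. */
typedef volatile int slock_t;

typedef struct BufferDesc
{
    slock_t cntx_lock;      /* protects the fields below */
    int     r_locks;        /* number of shared holders */
    int     w_lock;         /* nonzero if exclusively locked */
} BufferDesc;

static int CritSectionCount = 0;

#define START_CRIT_SECTION  (CritSectionCount++)
#define END_CRIT_SECTION    (CritSectionCount--)
#define S_LOCK(l)   (*(l) = 1)      /* stand-in for the real spin loop */
#define S_UNLOCK(l) (*(l) = 0)

static void
LockBufferShared(BufferDesc *buf)
{
    START_CRIT_SECTION;             /* signals held off from here ... */
    S_LOCK(&buf->cntx_lock);
    buf->r_locks++;
    S_UNLOCK(&buf->cntx_lock);
    END_CRIT_SECTION;               /* ... to here, so the spinlock cannot leak */
}

int
main(void)
{
    BufferDesc buf = {0, 0, 0};

    LockBufferShared(&buf);
    printf("r_locks = %d, cntx_lock = %d\n", buf.r_locks, buf.cntx_lock);
    return 0;
}

The longer-lived buffer lock itself (r_locks/w_lock) is ordinary shared
state that the bufmgr already tracks per backend, which is why only the
brief spinlock hold needs this treatment.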
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >> Yeah, I suppose. We already do record locking of all the fixed >> spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks that >> are missing from that (and CRIT_SECTION calls). Would it be >> reasonable to assume that only one buffer spinlock could be held >> at a time? > No. UPDATE holds two spins, btree split even more. > But stop - afair bufmgr remembers locked buffers, probably > we could just add XXX_CRIT_SECTION to LockBuffer..? Right. A buffer lock isn't a spinlock, ie, we don't hold the spinlock except within LockBuffer. So a quick CRIT_SECTION should deal with that. Actually, with careful placement of CRIT_SECTION calls in LockBuffer, there's no need to record holding the buffer's cntxt spinlock at all, I think. Will work on it. regards, tom lane
Denis Perchine <dyp@perchine.com> writes:
> You will get SIGKILL in most cases.

Well, a SIGKILL will cause the postmaster to shut down and restart the other
backends, so we should be safe if that happens. (Annoyed as heck, maybe, but
safe.)

Anyway, this is looking more and more like the SIGTERM that caused your
vacuum to die must have been done manually. The CRIT_SECTION code that I'm
about to go off and add to spinlocking should prevent similar problems from
happening in 7.1, but I don't think it's reasonable to try to retrofit that
into 7.0.*.

regards, tom lane
On Wed, Jan 10, 2001 at 12:46:50AM +0600, Denis Perchine wrote:
> > > Didn't you get my mail with a piece of Linux kernel code? I think all
> > > is clear there.
> >
> > That was implementing CPU-time-exceeded kill, which is a different
> > issue.
>
> Oops... You are talking about the OOM killer.
>
>     /* This process has hardware access, be more careful. */
>     if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
>             force_sig(SIGTERM, p);
>     } else {
>             force_sig(SIGKILL, p);
>     }
>
> You will get SIGKILL in most cases.

... on Linux, anyhow. There's no standard for this behavior. Probably others
try a SIGTERM first (on several processes) and then a SIGKILL if none die.

If a backend dies while holding a lock, doesn't that imply that the shared
memory may be in an inconsistent state? Surely a death while holding a lock
should shut down the whole database, without writing anything to disk.

Nathan Myers ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes:
> If a backend dies while holding a lock, doesn't that imply that
> the shared memory may be in an inconsistent state?

Yup. I had just come to the realization that we'd be best off to treat the
*entire* period from SpinAcquire to SpinRelease as a critical section for
the purposes of die(). That is, response to SIGTERM will be held off until
we have released the spinlock.

Most of the places where we grab spinlocks would have to make such a
critical section anyway, at least for large parts of the time that they are
holding the spinlock, because they are manipulating shared data structures
and the instantaneous intermediate states aren't always self-consistent. So
we might as well follow the KISS principle and just do START_CRIT_SECTION in
SpinAcquire and END_CRIT_SECTION in SpinRelease.

Vadim, any objection?

regards, tom lane
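A small, self-contained sketch of how that could behave, under stated
assumptions: SpinAcquire/SpinRelease bracket the whole hold period with a
critical section, and a die() handler that fires in between only records the
request, with the exit happening once the count drops back to zero. The
names and the gcc __sync builtins used for the spin itself are illustrative
choices, not the actual 7.1 implementation:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

typedef volatile int slock_t;

static volatile sig_atomic_t CritSectionCount = 0;
static volatile sig_atomic_t ExitPending = 0;

#define START_CRIT_SECTION  (CritSectionCount++)
#define END_CRIT_SECTION \
    do { \
        if (--CritSectionCount == 0 && ExitPending) \
            exit(1);                /* now it is safe to die */ \
    } while (0)

static void
SpinAcquire(slock_t *lock)
{
    START_CRIT_SECTION;             /* hold off die() for the whole hold period */
    while (__sync_lock_test_and_set(lock, 1))
        ;                           /* spin until the lock is ours */
}

static void
SpinRelease(slock_t *lock)
{
    __sync_lock_release(lock);
    END_CRIT_SECTION;               /* a deferred SIGTERM takes effect here */
}

static void
die(int sig)
{
    (void) sig;
    if (CritSectionCount > 0)
    {
        ExitPending = 1;            /* remember it; don't exit with a lock held */
        return;
    }
    exit(1);
}

int
main(void)
{
    static slock_t lock = 0;

    signal(SIGTERM, die);

    SpinAcquire(&lock);
    raise(SIGTERM);                 /* simulated kill while the spinlock is held */
    printf("still alive inside the critical section\n");
    SpinRelease(&lock);             /* exit happens here, lock already released */

    printf("never reached\n");
    return 0;
}

Run as-is, the simulated SIGTERM inside the critical section is deferred,
and the process only exits at SpinRelease, after the spinlock is already
free -- which is exactly the property that prevents the "stuck spinlock"
abort seen at the start of this thread.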
> Yup. I had just come to the realization that we'd be best
> off to treat the *entire* period from SpinAcquire to SpinRelease
> as a critical section for the purposes of die(). That is, response
> to SIGTERM will be held off until we have released the spinlock.
> Most of the places where we grab spinlocks would have to make such
> a critical section anyway, at least for large parts of the time that
> they are holding the spinlock, because they are manipulating shared
> data structures and the instantaneous intermediate states aren't always
> self-consistent. So we might as well follow the KISS principle and
> just do START_CRIT_SECTION in SpinAcquire and END_CRIT_SECTION in
> SpinRelease.
>
> Vadim, any objection?

None for the moment. If we just add XXX_CRIT_SECTION to the SpinXXX funcs
without changing anything else, then it will be easy to remove them later
(in the event we find any problems with this), so - do it.

Vadim