Re: Serious Crash last Friday - Mailing list pgsql-general
From:           Scott Marlowe
Subject:        Re: Serious Crash last Friday
Date:
Msg-id:         Pine.LNX.4.33.0206201326120.8468-100000@css120.ihs.com
In response to: Re: Serious Crash last Friday ("Henrik Steffen" <steffen@city-map.de>)
Responses:      Re: Serious Crash last Friday
List:           pgsql-general
On Thu, 20 Jun 2002, Henrik Steffen wrote:

> Additionally yesterday night there was again a problem with some SELECTs:
>
> NOTICE:  Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend
>         died abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am
>         going to terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.
>
> DB-Error in /web/intern.city-map.de/www/vertrieb/wiedervorlage.pl Code 7:
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.

Look at that error message again.  It says SOME OTHER backend died
abnormally, and your query was terminated because of it.  I.e. the
following query was NOT the problem; it simply couldn't run because some
other query caused a backend to abort.

> Command was:
> SELECT name
> FROM regionen
> WHERE region='0119';
> at /web/pm/CityMap/Abfragen.pm line 135
>
> This is really annoying.

Yes it is.

> When I noticed it this morning, I dropped all indexes and recreated
> them.  Then I ran a VACUUM FULL VERBOSE ANALYZE - afterwards the same
> query worked properly again.

It would likely have worked fine without all that, since it wasn't the
cause of the backend crash.

> I have now created a cronjob that will drop and recreate all indexes
> on a daily basis.
>
> But shouldn't this be unnecessary?

Correct.  Someday, someone will step up to the plate and fix the problem
with btrees growing and growing and not reusing dead space.  Until then,
the solution is to reindex heavily updated indexes during nightly
maintenance (a rough sketch of such a job is at the end of this message).

A few questions.  Have you done any really heavy testing on your server
to make sure its hardware is sound?  I've seen machines with memory
errors or bad blocks on the hard drive slip into production and wreak
havoc through slow corruption of a database.

Try compiling the Linux kernel with a -j 10 switch (i.e. 10 separate
compile jobs, which eats up tons of memory) and see if you get sig 11
errors.  Also check your hard drives for bad blocks (badblocks is the
command; it can run in a mode that saves each block, write-tests it,
then puts the data back, which lets you find all the bad blocks on your
drives).  Example commands for both checks are at the end of this
message.  Bad blocks are the primary reason I always try to run my
databases on at least software RAID1 or RAID5 on Linux: a bad block just
causes the affected drive to be marked offline and doesn't touch your
data integrity.

--
"Force has no place where there is need of skill.", "Haste in every
business brings failures.", "This is the bitterest pain among men, to
have much knowledge but no power." -- Herodotus
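P.S. Here's roughly what I mean by reindexing during nightly
maintenance.  This is only a sketch: "mydb" and the index names are
placeholders, so substitute your own database and whichever indexes get
hammered by updates.

    #!/bin/sh
    # nightly_reindex.sh -- run from cron as the postgres user.
    # The database name and index names below are placeholders.
    DB=mydb
    for idx in regionen_region_idx kunden_name_idx
    do
        psql -d "$DB" -c "REINDEX INDEX $idx;"
    done

    # Example crontab entry to run it at 03:30 every night:
    # 30 3 * * * /usr/local/bin/nightly_reindex.sh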
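P.P.S. The hardware checks, roughly.  Again just a sketch: the kernel
source path and /dev/hda2 are examples only, and check the badblocks man
page on your system before trusting my options (the non-destructive
read-write mode is -n on my box, and it wants the filesystem unmounted).

    # Memory burn-in: rebuild a configured 2.4 kernel tree over and over
    # with 10 parallel jobs; a sig 11 during the build usually means bad RAM.
    cd /usr/src/linux
    while true
    do
        make clean
        make dep && make -j 10 bzImage || break
    done

    # Surface scan: non-destructive read-write badblocks test.
    # /dev/hda2 is just an example; unmount the partition first.
    umount /dev/hda2
    badblocks -n -s -v /dev/hda2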