Re: Database corruption in 7.0.3 - Mailing list pgsql-hackers
From | Denis Perchine |
---|---|
Subject | Re: Database corruption in 7.0.3 |
Date | |
Msg-id | 200103150952.PAA27478@technoart.net Whole thread Raw |
In response to | Database corruption in 7.0.3 (Tim Allen <tim@proximity.com.au>) |
List | pgsql-hackers |
Can confirm this. Get this just yesterday time ago... Messages: NOTICE: Rel acm: TID 1697/217: OID IS INVALID. TUPGONE 1. And lots of such lines... And pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. In the end :-((( I lost a library of our institute... :-((( But I have a backup!!! :-)))) This table even have NO indices!!! Program received signal SIGSEGV, Segmentation fault. 0x813837f in PageRepairFragmentation (page=0x82840b0 "") at bufpage.c:311 311 alignedSize = MAXALIGN((*lp).lp_len); (gdb) bt #0 0x813837f in PageRepairFragmentation (page=0x82840b0 "") at bufpage.c:311 #1 0x80a9b07 in vc_scanheap (vacrelstats=0x82675b0, onerel=0x8273428, vacuum_pages=0xbfffe928, fraged_pages=0xbfffe918) at vacuum.c:1022 #2 0x80a8e8b in vc_vacone (relid=27296, analyze=0 '\000', va_cols=0x0) at vacuum.c:599 #3 0x80a8217 in vc_vacuum (VacRelP=0xbfffe9b4, analyze=0 '\000', va_cols=0x0) at vacuum.c:299 #4 0x80a818b in vacuum (vacrel=0x8267400 "", verbose=1 '\001', analyze=0 '\000', va_spec=0x0) at vacuum.c:223 #5 0x813fba5 in ProcessUtility (parsetree=0x8267418, dest=Remote) at utility.c:694 #6 0x813c16e in pg_exec_query_dest (query_string=0x820aaa0 "vacuum verbose acm;", dest=Remote, aclOverride=0 '\000') at postgres.c:617 #7 0x813c08e in pg_exec_query (query_string=0x820aaa0 "vacuum verbose acm;") at postgres.c:562 #8 0x813d4c3 in PostgresMain (argc=9, argv=0xbffff068, real_argc=9, real_argv=0xbffffa3c) at postgres.c:1588 #9 0x811ace5 in DoBackend (port=0x8223068) at postmaster.c:2009 #10 0x811a639 in BackendStartup (port=0x8223068) at postmaster.c:1776 #11 0x811932f in ServerLoop () at postmaster.c:1037 #12 0x8118b0e in PostmasterMain (argc=9, argv=0xbffffa3c) at postmaster.c:725 #13 0x80d5e5e in main (argc=9, argv=0xbffffa3c) at main.c:93 #14 0x40111fee in __libc_start_main () from /lib/libc.so.6 This is plain 7.0.3. On Thursday 15 March 2001 14:52, Tim Allen wrote: > We have an application that we were running quite happily using pg6.5.3 > in various customer sites. Now we are about to roll out a new version of > our application, and we are going to use pg7.0.3. However, in testing > we've come across a couple of isolated incidents of database > corruption. They are sufficiently rare that I can't reproduce the problem, > nor can I put my finger on just what application behaviour causes the > problems. > > The symptoms most often involve some sort of index corruption, which is > reported by vacuum and it seems that vacuum can fix it. On occasion vacuum > reports "invalid OID" or similar (sorry, don't have exact wording of > message). On one occasion the database has been corrupted to the point of > unusability (ie vacuum admitted that it couldn't fix the problem), and a > dump/restore was required (thankfully that at least worked). The index > corruption also occasionally manifests itself in the form of spurious > uniqueness constraint violation errors. > > The previous version of our app using 6.5.3 has never shown the slightest > symptom of database misbehaviour, to the best of my knowledge, despite > fairly extensive use. So our expectations are fairly high :-). > > One thing that is different about the new version of our app is that we > now use multiple connections to the database (previously we only had > one). We can in practice have transactions in progress on several > connections at once, and it is possible for some transactions to be rolled > back under application control (ie explicit ROLLBACK; statement). > > I realise I haven't really provided an awful lot of information that would > help identify the problem, so I shall attempt to be understanding if > no-one can offer any useful suggestions. But I hope someone can :-). Has > anyone seen this sort of problem before? Are there any known > database-corrupting bugs in 7.0.3? I don't recall anyone mentioning any in > the mailing lists. Is using multiple connections likely to stimulate any > known areas of risk? > > BTW we are using plain vanilla SQL, no triggers, no new types defined, no > functions, no referential integrity checks, nothing more ambitious than a > multi-column primary key. > > The platform is x86 Red Hat Linux 6.2. Curiously enough, on one of our > testing boxes and on my development box we have never seen this, but we > have seen it several times on our other test box and at least one customer > site, so there is some possibility it's related to dodgy hardware. The > customer box with the problem is a multi-processor box, all the other > boxes we've tested on are single-processor. > > TIA for any help, > > Tim -- Sincerely Yours, Denis Perchine ---------------------------------- E-Mail: dyp@perchine.com HomePage: http://www.perchine.com/dyp/ FidoNet: 2:5000/120.5 ----------------------------------
pgsql-hackers by date: