Still crashing with latest 7.0.2 (Re: (forw) more crashes) - Mailing list pgsql-hackers
From | Alfred Perlstein |
---|---|
Subject | Still crashing with latest 7.0.2 (Re: (forw) more crashes) |
Date | |
Msg-id | 20001008034834.C272@fw.wintelcom.net |
In response to | Re: (forw) more crashes (Alfred Perlstein <bright@wintelcom.net>) |
Responses | Re: Still crashing with latest 7.0.2 (Re: (forw) more crashes) |
List | pgsql-hackers |
* Alfred Perlstein <bright@wintelcom.net> [001006 16:02] wrote:
> * Tom Lane <tgl@sss.pgh.pa.us> [001004 09:56] wrote:
> > Alfred Perlstein <bright@wintelcom.net> writes:
> > > I have a reliable way to make postgresql crash after a
> > > couple of hours over here and a backtrace that looks like a good
> > > catch.
> >
> > I'm interested in pursuing this, but the backtrace doesn't give enough
> > info to debug it. It looks like the backend is crashing because of
> > a previously-corrupted tuple, so what we'll need to do is work backwards
> > to find where the data corruption is occurring.
> >
> > Can you boil down the test sequence to something that could be
> > reproduced by other people? The most convenient way to work on it
> > would be to see it happen here...
>
> I just wanted to note on the list that these crashes seem to have
> stopped with the latest 7.0.2-patches (as of 11:30ish PM EST Oct,
> 4th), it's been over 24 hours since the upgrade (previously I
> couldn't go for more than 20 without a crash).
>
> My only concern is that I didn't notice anything on the cvs list
> that referenced a fix for crashes.
>
> Well anyhow I'll post an update in a couple of days if all is well
> or not.

Unfortunately I'm still getting crashes. This one looks like it's during
a vacuum; previously I got a crash while doing an UPDATE, but in exactly
the same spot. It took quite a bit longer to provoke this time:

-rw-------  1 pgsql  pgsql  277561344 Oct  8 02:56 postgres.core

#0  0x8063c8b in nocachegetattr (tuple=0xbfbfe974, attnum=3,
    tupleDesc=0x84ca368, isnull=0xbfbfe7fb "") at heaptuple.c:537
537             off = att_addlength(off, att[i]->attlen, tp + off);
(gdb) bt
#0  0x8063c8b in nocachegetattr (tuple=0xbfbfe974, attnum=3,
    tupleDesc=0x84ca368, isnull=0xbfbfe7fb "") at heaptuple.c:537
#1  0x8075851 in GetIndexValue (tuple=0xbfbfe974, hTupDesc=0x84ca368,
    attOff=3, attrNums=0x8508240, fInfo=0x0, attNull=0xbfbfe7fb "")
    at indexam.c:445
#2  0x80903be in FormIndexDatum (numberOfAttributes=4,
    attributeNumber=0x8508240, heapTuple=0xbfbfe974,
    heapDescriptor=0x84ca368, datum=0x8508018, nullv=0x84ba170 " ",
    fInfo=0x0) at index.c:1256
#3  0x80a05e6 in vc_repair_frag (vacrelstats=0x84ba290, onerel=0x84c6788,
    vacuum_pages=0xbfbfea1c, fraged_pages=0xbfbfea0c, nindices=1,
    Irel=0x84ba118) at vacuum.c:1634
#4  0x809e3b9 in vc_vacone (relid=1315147913, analyze=0, va_cols=0x0)
    at vacuum.c:640
#5  0x809d9ac in vc_vacuum (VacRelP=0xbfbfeaac, analyze=0 '\000',
    va_cols=0x0) at vacuum.c:299
#6  0x809d934 in vacuum (vacrel=0x84ba0e8 "\030", verbose=1,
    analyze=0 '\000', va_spec=0x0) at vacuum.c:223
#7  0x810ca8c in ProcessUtility (parsetree=0x84ba110, dest=Remote)
    at utility.c:694
#8  0x810a44e in pg_exec_query_dest (
    query_string=0x81cd370 "VACUUM verbose webhit_details_formatted;",
    dest=Remote, aclOverride=0) at postgres.c:617
#9  0x810a3a9 in pg_exec_query (
    query_string=0x81cd370 "VACUUM verbose webhit_details_formatted;")
    at postgres.c:562
#10 0x810b336 in PostgresMain (argc=7, argv=0xbfbff12c, real_argc=10,
    real_argv=0xbfbffb8c) at postgres.c:1588
#11 0x80f0742 in DoBackend (port=0x8464000) at postmaster.c:2009
#12 0x80f02d5 in BackendStartup (port=0x8464000) at postmaster.c:1776
#13 0x80ef4f9 in ServerLoop () at postmaster.c:1037
#14 0x80eeede in PostmasterMain (argc=10, argv=0xbfbffb8c)
    at postmaster.c:725
#15 0x80bf3eb in main (argc=10, argv=0xbfbffb8c) at main.c:93
#16 0x8063495 in _start ()
st
532
533                 if (usecache)
534                     att[i]->attcacheoff = off;
535             }
536
537             off = att_addlength(off, att[i]->attlen, tp + off);
538
539             if (usecache &&
540                 att[i]->attlen == -1 && !VARLENA_FIXED_SIZE(att[i]))
541                 usecache = false;

It looks like it's dying in the same place as the previous coredumps;
however, this time it's during a vacuum rather than an update:

(gdb) print off
$1 = -838833616
(gdb) print att[i]
$2 = 0x84ca640
(gdb) print *(att[i])
$3 = {attrelid = 1315147913, attname = {
    data = "attr_name", '\000' <repeats 22 times>,
    alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0,
  attlen = -1, attnum = 3, attnelems = 0, attcacheoff = -1, atttypmod = 36,
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print i
$4 = 2
(gdb) print tp
$5 = 0x5808eba5 "Yj"
(gdb) print tp+off
$6 = 0x260955d5 <Address 0x260955d5 out of bounds>

ack!

(gdb) print usecache
$7 = 0 '\000'
(gdb) print attnum
$8 = 3
(gdb) print slow
$9 = 139159376
(gdb) print *slow
$10 = 139241024
(gdb) print (char *) tup + tup->t_hoff
$11 = 0x5808eba5 "Yj"
(gdb) print tup
$12 = 0x5808eba0
(gdb) print *tup
$13 = {t_oid = 0, t_cmin = 6969654, t_cmax = 6958161, t_xmin = 1742,
  t_xmax = 6955895, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 639},
    ip_posid = 84}, t_natts = 737, t_infomask = 32846, t_hoff = 5 '\005',
  t_bits = "\000\002¥ "}
(gdb) print *tupleDesc
$14 = {natts = 1358981721, attrs = 0xce006a2c, constr = 0x77000006}
(gdb) print *(att[0])
$15 = {attrelid = 1315147913, attname = {
    data = "counter_id", '\000' <repeats 21 times>,
    alignmentDummy = 1853189987}, atttypid = 23, attdisbursion = 0,
  attlen = 4, attnum = 1, attnelems = 0, attcacheoff = 0, atttypmod = -1,
  attbyval = 1 '\001', attstorage = 112 'p', attisset = 0 '\000',
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[1])
$16 = {attrelid = 1315147913, attname = {
    data = "attr_type", '\000' <repeats 22 times>,
    alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0,
  attlen = -1, attnum = 2, attnelems = 0, attcacheoff = 4, atttypmod = 36,
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[2])
$17 = {attrelid = 1315147913, attname = {
    data = "attr_name", '\000' <repeats 22 times>,
    alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0,
  attlen = -1, attnum = 3, attnelems = 0, attcacheoff = -1, atttypmod = 36,
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[3])
$18 = {attrelid = 1315147913, attname = {
    data = "attr_vers", '\000' <repeats 22 times>,
    alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0,
  attlen = -1, attnum = 4, attnelems = 0, attcacheoff = -1, atttypmod = 36,
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[4])
$19 = {attrelid = 1315147913, attname = {
    data = "attr_hits", '\000' <repeats 22 times>,
    alignmentDummy = 1920234593}, atttypid = 20, attdisbursion = 0,
  attlen = 8, attnum = 5, attnelems = 0, attcacheoff = -1, atttypmod = -1,
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
  attalign = 100 'd', attnotnull = 0 '\000', atthasdef = 1 '\001'}
(gdb) print *tuple
$20 = {t_len = 80, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 640},
    ip_posid = 5}, t_datamcxt = 0x0, t_data = 0x5808eba0}

thanks,
-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
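
For readers following the gdb session: `off` is already -838833616 when `i = 2`, which would be consistent with a garbage length word having been added for an earlier variable-length column, although the dump alone does not prove that. The sketch below is a simplified model of walking attribute offsets with a bounds check. It is not PostgreSQL's `nocachegetattr`, it ignores alignment padding, and the names `SketchAttr` and `sketch_attr_offset` are hypothetical, introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified attribute descriptor (not a PostgreSQL type):
 * attlen > 0 is a fixed width in bytes; attlen == -1 marks a
 * variable-length field whose first four bytes hold its total length,
 * including the length word itself. */
typedef struct {
    int attlen;
} SketchAttr;

/* Walk the tuple's data area and return the byte offset of attribute
 * `target` (0-based). Returns -1 if `target` is out of range or if any
 * length would step outside the buffer, instead of following a bogus
 * offset the way an unchecked walk would. Alignment is ignored here. */
long sketch_attr_offset(const unsigned char *data, size_t data_len,
                        const SketchAttr *att, int natts, int target)
{
    size_t off = 0;

    if (target < 0 || target >= natts)
        return -1;

    for (int i = 0; i < target; i++) {
        size_t remaining = data_len - off;   /* off <= data_len always holds */

        if (att[i].attlen > 0) {
            if ((size_t) att[i].attlen > remaining)
                return -1;
            off += (size_t) att[i].attlen;
        } else {
            int32_t varlen;

            if (remaining < sizeof varlen)
                return -1;
            memcpy(&varlen, data + off, sizeof varlen);

            /* A corrupted length word (negative, or larger than what is
             * left in the buffer) is rejected here. */
            if (varlen < (int32_t) sizeof varlen || (size_t) varlen > remaining)
                return -1;
            off += (size_t) varlen;
        }
    }
    return (long) off;
}
```

With a 4-byte integer column followed by two variable-length columns of 8 bytes each, the third column starts at offset 12. Overwriting the first length word with a value such as -838833616 makes the function return -1 rather than an out-of-bounds offset.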