Re: WIP: SP-GiST, Space-Partitioned GiST - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: WIP: SP-GiST, Space-Partitioned GiST |
Date | |
Msg-id | 14742.1323203111@sss.pgh.pa.us |
In response to | Re: WIP: SP-GiST, Space-Partitioned GiST (Oleg Bartunov <oleg@sai.msu.su>) |
Responses | Re: WIP: SP-GiST, Space-Partitioned GiST |
List | pgsql-hackers |
Oleg Bartunov <oleg@sai.msu.su> writes:
> There is one annoying problem under MAC OS (Linux, FreeBSD have no problem), we
> just can't figure out how to find it, since we are not familiar with MAC OS -
> it fails to restart after 'kill -9' backend, but only if sources were
> compiled with -O2 option (no problem occured with -O0). Since the fail happens
> not every time, we use following script to reproduce the problem. We ask
> MAC OS guru to help us debugging this problem.

I don't think it's Mac-specific at all; it looks to me like garden
variety uninitialized data, specifically that there are paths through
doPickSplit that don't set xlrec.newPage. The crash I'm seeing is

TRAP: FailedAssertion("!(offset <= (((PageHeader) (page))->pd_lower <= (__builtin_offsetof (PageHeaderData, pd_linp)) ? 0 : ((((PageHeader) (page))->pd_lower - (__builtin_offsetof (PageHeaderData, pd_linp))) / sizeof(ItemIdData))) + 1)", File: "spgxlog.c", Line: 81)

#0  0x00007fff883f982a in __kill ()
#1  0x00007fff85bdda9c in abort ()
#2  0x0000000103165a71 in ExceptionalCondition (conditionName=<value temporarily unavailable, due to optimizations>, errorType=<value temporarily unavailable, due to optimizations>, fileName=<value temporarily unavailable, due to optimizations>, lineNumber=<value temporarily unavailable, due to optimizations>) at assert.c:57
#3  0x0000000102eeec73 in addOrReplaceTuple (page=0x74cc <Address 0x74cc out of bounds>, tuple=0x7faa1182d64c " ", size=88, offset=70) at spgxlog.c:81
#4  0x0000000102eed4bc in spgRedoPickSplit [inlined] () at /Users/tgl/pgsql/src/backend/access/spgist/spgxlog.c:504
#5  0x0000000102eed4bc in spg_redo (record=0x7fff62a5ccf0) at spgxlog.c:803
#6  0x0000000102ec4f48 in StartupXLOG () at xlog.c:6534
#7  0x0000000103054378 in StartupProcessMain () at startup.c:220
#8  0x0000000102ef4449 in AuxiliaryProcessMain (argc=2, argv=0x7fff62a60030) at bootstrap.c:414

The xlog record it's working on is

(gdb) p *(spgxlogPickSplit*)(0x7fcb20826600 + 32)
$6 = {
  node = {
    spcNode = 1663,
    dbNode = 41578,
    relNode = 204800
  },
  nTuples = 75,
  nNodes = 4,
  blknoSrc = 988,
  nDelete = 74,
  blknoInner = 929,
  offnumInner = 70,
  newPage = 1 '\001',
  blknoParent = 929,
  offnumParent = 13,
  nodeI = 2,
  stateSrc = {
    attType_attlen = 16,
    fakeTupleSize = 32,
    isBuild = 1
  }
}

Since newPage is set, addOrReplaceTuple gets called on a freshly
initialized page, and not surprisingly complains that offset 70 is way
out of range. Maybe there's something wrong with the replay logic, but
what I'm thinking is that newPage should not have been true here, which
means that doPickSplit failed to set it correctly, which doesn't look at
all improbable.

I added a memset at the top of doPickSplit to force the whole struct to
zeroes, and so far haven't seen the crash again.

			regards, tom lane
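The failure mode described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the actual PostgreSQL code: `DemoPickSplitRec`, `demo_fill_record` and `demo_offset_ok` are hypothetical names, and the struct is a cut-down stand-in for `spgxlogPickSplit`. The fill routine zeroes the record first, so a code path that never assigns `newPage` leaves it false rather than whatever happened to be on the stack; the check mirrors the shape of the assertion in the trace (offset must not exceed the page's item count plus one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a WAL record (not spgxlogPickSplit). */
typedef struct DemoPickSplitRec
{
    uint16_t nTuples;
    uint32_t blknoInner;
    uint16_t offnumInner;
    bool     newPage;      /* true only when a fresh inner page was allocated */
} DemoPickSplitRec;

/*
 * Fill the record. Zeroing it up front means any path that skips a field
 * leaves that field 0/false instead of indeterminate stack contents.
 */
static void
demo_fill_record(DemoPickSplitRec *rec, bool allocNewPage,
                 uint32_t blkno, uint16_t offnum)
{
    memset(rec, 0, sizeof(*rec));

    rec->nTuples = 75;
    rec->blknoInner = blkno;
    rec->offnumInner = offnum;

    if (allocNewPage)
        rec->newPage = true;
    /* No else branch: without the memset, newPage would be unset here. */
}

/*
 * Replay-side sanity check: a freshly initialized page holds no items,
 * so the only acceptable target offset on it is 1.
 */
static bool
demo_offset_ok(const DemoPickSplitRec *rec, uint16_t itemsOnPage)
{
    uint16_t maxoff = rec->newPage ? 0 : itemsOnPage;

    return rec->offnumInner <= maxoff + 1;
}
```

With the values from the gdb dump (offset 70 on block 929), the check passes when `newPage` is false and the page already holds 74 items, and fails when `newPage` is true, which matches the assertion failure reported in the trace.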