Re: Unhelpful debug tools on OS X :-( - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Unhelpful debug tools on OS X :-( |
Date | |
Msg-id | 4625342F.40907@enterprisedb.com |
In response to | Re: Unhelpful debug tools on OS X :-( (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Unhelpful debug tools on OS X :-( |
List | pgsql-hackers |
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> Any suggestions how to extract some info out of this?
>
>> Does OS X have the catchsegv tool?
>
> No, but I suddenly remembered about CrashReporter, and sure enough it's
> catching these crashes:
>
> Exception:  EXC_BAD_ACCESS (0x0001)
> Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000010
>
> Thread 0 Crashed:
> 0   postmaster 0x001af4ef smgrextend + 12 (smgr.c:485)
> 1   postmaster 0x00029044 end_heap_rewrite + 208 (rewriteheap.c:278)
> 2   postmaster 0x000bdc22 cluster_rel + 850 (cluster.c:806)
> 3   postmaster 0x000be119 cluster + 160 (cluster.c:220)
> 4   postmaster 0x001b74a8 PortalRunUtility + 233 (palloc.h:84)
> 5   postmaster 0x001b7784 PortalRunMulti + 237 (pquery.c:1271)
> 6   postmaster 0x001b80ae PortalRun + 918 (pquery.c:813)
> 7   postmaster 0x001b2afd exec_simple_query + 656 (postgres.c:965)
> 8   postmaster 0x001b4b0c PostgresMain + 5628 (postgres.c:3507)
> 9   postmaster 0x00183973 ServerLoop + 2828 (postmaster.c:2614)
> 10  postmaster 0x00184b1e PostmasterMain + 2794 (postmaster.c:972)
> 11  postmaster 0x00130f8e main + 1236 (main.c:188)
> 12  postmaster 0x00001e86 _start + 216
> 13  postmaster 0x00001dad start + 41
>
> So it looks like this has got something to do with the MVCC-safe cluster
> changes, which is not too surprising considering it started happening
> around about then.  Off to have a look ...

I've been looking at the code for a few minutes as well, but haven't
found an explanation for that yet. But I did notice that we're not
fsyncing the newly written relation like we should. There's a comment
in raw_heap_insert:

    /*
     * Now write the page. We say isTemp = true even if it's not a
     * temp table, because there's no need for smgr to schedule an
     * fsync for this write; we'll do it ourselves before committing.
     */
    smgrextend(state->rs_new_rel->rd_smgr, state->rs_blockno,
               (char *) page, true);

That's copy-pasted from tablecmds.c.
But unlike in tablecmds.c, end_heap_rewrite only fsyncs the new file
if we're not WAL-logging.

Proposed fix:

Index: src/backend/access/heap/rewriteheap.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/rewriteheap.c,v
retrieving revision 1.1
diff -c -r1.1 rewriteheap.c
*** src/backend/access/heap/rewriteheap.c	8 Apr 2007 01:26:27 -0000	1.1
--- src/backend/access/heap/rewriteheap.c	17 Apr 2007 20:50:05 -0000
***************
*** 272,282 ****
  	}
  
  	/*
! 	 * If not WAL-logging, must fsync before commit.  We use heap_sync
! 	 * to ensure that the toast table gets fsync'd too.
  	 */
! 	if (!state->rs_use_wal)
! 		heap_sync(state->rs_new_rel);
  
  	/* Deleting the context frees everything */
  	MemoryContextDelete(state->rs_cxt);
--- 272,284 ----
  	}
  
  	/*
! 	 * Must fsync before commit, even if we've WAL-logged the changes,
! 	 * because we've written pages outside the buffer manager. See comments
! 	 * in copy_relation_data in commands/tablecmds.c for more information.
! 	 *
! 	 * We use heap_sync to ensure that the toast table gets fsync'd too.
  	 */
! 	heap_sync(state->rs_new_rel);
  
  	/* Deleting the context frees everything */
  	MemoryContextDelete(state->rs_cxt);

BTW: In tablecmds.c the new relation is fsynced with smgrimmedsync, not
heap_sync. How about the toast table, it goes through shared buffers as
usual, right?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
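
For readers outside the thread, the ordering the patch enforces can be sketched with plain POSIX calls. This is not PostgreSQL code: `write_page_direct`, `rewrite_and_sync` and `DEMO_PAGE_SIZE` are hypothetical names made up for the illustration, and `write`/`fsync` only stand in for `smgrextend`/`heap_sync`. The point it tries to show is that pages written directly to the file, bypassing any buffer cache that would normally flush them, get an unconditional fsync before the caller is allowed to treat the rewrite as committed.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 8192   /* illustrative page size, an assumption */

/* Hypothetical helper: append one page straight to the file,
 * with no cache layer in between. Returns 0 on success, -1 on error. */
int write_page_direct(int fd, const char *page)
{
    ssize_t n = write(fd, page, DEMO_PAGE_SIZE);
    return (n == DEMO_PAGE_SIZE) ? 0 : -1;
}

/* Hypothetical helper: write nblocks pages to path, then fsync
 * unconditionally. Returns 0 only once the data has been synced,
 * i.e. only then may the caller go on to commit. */
int rewrite_and_sync(const char *path, unsigned nblocks)
{
    char page[DEMO_PAGE_SIZE];
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    for (unsigned b = 0; b < nblocks; b++)
    {
        /* Fill the page with a recognizable byte per block. */
        memset(page, 'A' + (int) (b % 26), sizeof page);
        if (write_page_direct(fd, page) != 0)
        {
            close(fd);
            return -1;
        }
    }

    /* Sync regardless of whether the changes were also logged
     * elsewhere: nothing else knows these pages need flushing. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would run something like `rewrite_and_sync("/tmp/newrel.bin", 3)` and proceed to its commit step only if the result is 0.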