Re: pgsql: Generational memory allocator - Mailing list pgsql-committers
From:           Tom Lane
Subject:        Re: pgsql: Generational memory allocator
Msg-id:         21466.1511644760@sss.pgh.pa.us
In response to: Re: pgsql: Generational memory allocator (Andres Freund <andres@anarazel.de>)
List:           pgsql-committers
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-25 14:50:41 -0500, Tom Lane wrote:
>> Meanwhile, skink has come back with a success as of 0f2458f, rendering
>> the other mystery even deeper --- although at least this confirms the
>> idea that the crashes we are seeing predate the generation.c patch.

> Hm, wonder whether that could be optimization dependent too.

Evidently :-(.  I examined my compiler's assembly output for the relevant
line, snapbuild.c:780:

    ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
                                 xlrec->target_node, xlrec->target_tid,
                                 xlrec->cmin, xlrec->cmax, xlrec->combocid);

which is

    movl    12(%rbx), %eax          # combocid
    movq    16(%rbx), %rcx          # target_node.spcNode + dbNode
    movq    %rbp, %rdx
    movl    24(%rbx), %r8d          # target_node.relNode
    movq    56(%r12), %rdi
    movq    28(%rbx), %r9           # target_tid ... 8 bytes worth
    movl    (%rbx), %esi            # top_xid
    movl    %eax, 16(%rsp)
    movl    8(%rbx), %eax           # cmax
    movl    %eax, 8(%rsp)
    movl    4(%rbx), %eax           # cmin
    movl    %eax, (%rsp)
    call    ReorderBufferAddNewTupleCids

I've annotated the fetches from *rbx to match the data structure:

    typedef struct xl_heap_new_cid
    {
        /*
         * store toplevel xid so we don't have to merge cids from different
         * transactions
         */
        TransactionId top_xid;
        CommandId   cmin;
        CommandId   cmax;

        /*
         * don't really need the combocid since we have the actual values
         * right in this struct, but the padding makes it free and it's
         * useful for debugging.
         */
        CommandId   combocid;

        /*
         * Store the relfilenode/ctid pair to facilitate lookups.
         */
        RelFileNode     target_node;
        ItemPointerData target_tid;
    } xl_heap_new_cid;

and what's apparent is that it's grabbing 8 bytes, not just 6, from
target_tid.  So the failure case must occur when the xl_heap_new_cid WAL
record is right at the end of the palloc'd record buffer and valgrind
notices that we're fetching padding bytes.

I suspect that gcc feels within its rights to do this because the
size/alignment of xl_heap_new_cid is a multiple of 4 bytes, so that there
ought to be 2 padding bytes at the end.  We could possibly prevent that by
marking the whole xl_heap_new_cid struct as packed/align(2), the same way
ItemPointerData is marked, but that would render accesses to all of its
fields inefficient on alignment-sensitive machines.  Moreover, if we go
that route, we have a boatload of other xlog record formats with similar
hazards.

Instead, I propose that we make sure that the palloc request size for
XLogReaderState->main_data is always maxalign'd.  The existing behavior
in DecodeXLogRecord of palloc'ing it only just barely big enough for the
current record seems pretty brain-dead performance-wise even without this
consideration.  Generally, if we need to enlarge that buffer, we should
enlarge it significantly, IMO.

BTW, that comment in the middle of xl_heap_new_cid about "padding makes
this field free" seems entirely wrong.  By my count the struct is
currently 34 bytes of data, so if by padding it's talking about maxalign
alignment of the whole struct, that field is costing us 8 bytes on
maxalign-8 machines.  There's certainly no internal alignment that would
make it free.

            regards, tom lane
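To make the padding argument above concrete, here is a toy program using
stand-in typedefs rather than the real PostgreSQL headers (ItemPointerData
in particular is flattened to three uint16_t fields for illustration).  On
a typical LP64 platform it prints 28 and 36: the last member ends at byte
34, but sizeof() is rounded up to the struct's 4-byte alignment, leaving
two trailing pad bytes that an over-wide 8-byte load of target_tid may
legally touch.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-ins for the PostgreSQL typedefs. */
    typedef uint32_t TransactionId;
    typedef uint32_t CommandId;

    typedef struct RelFileNode
    {
        uint32_t    spcNode;
        uint32_t    dbNode;
        uint32_t    relNode;
    } RelFileNode;

    /* 6 bytes, 2-byte aligned, like PostgreSQL's packed ItemPointerData. */
    typedef struct ItemPointerData
    {
        uint16_t    bi_hi;
        uint16_t    bi_lo;
        uint16_t    ip_posid;
    } ItemPointerData;

    typedef struct xl_heap_new_cid
    {
        TransactionId   top_xid;        /* offset  0 */
        CommandId       cmin;           /* offset  4 */
        CommandId       cmax;           /* offset  8 */
        CommandId       combocid;       /* offset 12 */
        RelFileNode     target_node;    /* offset 16 */
        ItemPointerData target_tid;     /* offset 28, ends at 34 */
    } xl_heap_new_cid;

    int
    main(void)
    {
        /* Prints 28: target_tid's data occupies bytes 28..33. */
        printf("offsetof(target_tid) = %zu\n",
               offsetof(xl_heap_new_cid, target_tid));
        /* Prints 36: sizeof() includes two trailing padding bytes. */
        printf("sizeof(xl_heap_new_cid) = %zu\n",
               sizeof(xl_heap_new_cid));
        return 0;
    }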
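And a minimal sketch of the buffer policy proposed for DecodeXLogRecord,
under assumed names: the main_data_bufsz field, the BLCKSZ/2 growth floor,
and the malloc/free stand-ins for palloc/pfree are illustrative choices,
not PostgreSQL's actual code.

    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative stand-ins for PostgreSQL's macros. */
    #define BLCKSZ          8192
    #define MAXIMUM_ALIGNOF 8
    #define MAXALIGN(len) \
        (((len) + MAXIMUM_ALIGNOF - 1) & ~((size_t) (MAXIMUM_ALIGNOF - 1)))
    #define Max(a, b)       ((a) > (b) ? (a) : (b))

    typedef struct XLogReaderStateSketch
    {
        char       *main_data;          /* decoded "main data" of a record */
        size_t      main_data_len;      /* length used by current record */
        size_t      main_data_bufsz;    /* allocated size (assumed field) */
    } XLogReaderStateSketch;

    /*
     * Ensure main_data can hold datatotal bytes.  Round the allocation up
     * to a MAXALIGN boundary, so compiler-generated over-wide loads of a
     * record's trailing fields stay inside the buffer, and grow in large
     * steps instead of just barely fitting the current record.
     */
    static void
    ensure_main_data(XLogReaderStateSketch *state, size_t datatotal)
    {
        if (datatotal > state->main_data_bufsz)
        {
            free(state->main_data);     /* pfree() in PostgreSQL proper */
            state->main_data_bufsz = MAXALIGN(Max(datatotal, BLCKSZ / 2));
            state->main_data = malloc(state->main_data_bufsz);
        }
        state->main_data_len = datatotal;
    }

With a policy like this, a record struct with trailing padding can never
be read past the end of the allocation, because the requested size is
always a MAXALIGN multiple at least as large as the record itself.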