Re: PATCH: two slab-like memory allocators - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: PATCH: two slab-like memory allocators |
Date | |
Msg-id | 20161112191257.3a4n6kzegvu637qn@alap3.anarazel.de Whole thread Raw |
In response to | Re: PATCH: two slab-like memory allocators (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: PATCH: two slab-like memory allocators
|
List | pgsql-hackers |
Hi, Subject: [PATCH 1/2] slab allocator diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 6ad7e7d..520f295 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c I'd rather have that in a separate followup commit... + * IDENTIFICATION + * src/backend/utils/mmgr/slab.c + * + * + * The constant allocation size allows significant simplification and various + * optimizations that are not possible in AllocSet. Firstly, we can get rid + * of the doubling and carve the blocks into chunks of exactly the right size + * (plus alignment), not wasting memory. References to AllocSet aren't necessarily a good idea, they'll quite possibly get out of date. The argument can be quite easily be made without referring to a concrete reference to behaviour elsewhere. + * + * At the context level, we use 'freelist' array to track blocks grouped by + * number of free chunks. For example freelist[0] is a list of completely full + * blocks, freelist[1] is a block with a single free chunk, etc. Hm. Those arrays are going to be quite large for small allocations w/ big blocks (an imo sensible combination). Maybe it'd make more sense to model it as a linked list of blocks? Full blocks are at one end, empty ones at the other? + * About MEMORY_CONTEXT_CHECKING: + * + * Since we usually round request sizes up to the next power of 2, there + * is often some unused space immediately after a requested data area. I assume the "round up" stuff is copy-paste? + * Thus, if someone makes the common error of writing past what they've + * requested, the problem is likely to go unnoticed ... until the day when + * there *isn't* any wasted space, perhaps because of different memory + * alignment on a new platform, or some other effect. To catch this sort + * of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond + * the requested space whenever the request is less than the actual chunk + * size, and verifies that the byte is undamaged when the chunk is freed. + * + * + * About USE_VALGRIND and Valgrind client requests: + * + * Valgrind provides "client request" macros that exchange information with + * the host Valgrind (if any). Under !USE_VALGRIND, memdebug.h stubs out + * currently-used macros. + * + * When running under Valgrind, we want a NOACCESS memory region both before + * and after the allocation. The chunk header is tempting as the preceding + * region, but mcxt.c expects to able to examine the standard chunk header + * fields. Therefore, we use, when available, the requested_size field and + * any subsequent padding. requested_size is made NOACCESS before returning + * a chunk pointer to a caller. However, to reduce client request traffic, + * it is kept DEFINED in chunks on the free list. + * + * The rounded-up capacity of the chunk usually acts as a post-allocation + * NOACCESS region. If the request consumes precisely the entire chunk, + * there is no such region; another chunk header may immediately follow. In + * that case, Valgrind will not detect access beyond the end of the chunk. + * + * See also the cooperating Valgrind client requests in mcxt.c. I think we need a preliminary patch moving a lot of this into something like mcxt_internal.h. Copying this comment, and a lot of the utility functions, into every memory context implementation is a bad pattern. +typedef struct SlabBlockData *SlabBlock; /* forward reference */ +typedef struct SlabChunkData *SlabChunk; Can we please not continue hiding pointers behind typedefs? It's a bad pattern, and that it's fairly widely used isn't a good excuse to introduce further usages of it. +/* + * SlabContext is a specialized implementation of MemoryContext. + */ +typedef struct SlabContext +{ + MemoryContextData header; /* Standard memory-context fields */ + /* Allocation parameters for this context: */ + Size chunkSize; /* chunk size */ + Size fullChunkSize; /* chunk size including header and alignment */ + Size blockSize; /* block size */ + int chunksPerBlock; /* number of chunks per block */ + int minFreeChunks; /* min number of free chunks in any block */ + int nblocks; /* number of blocks allocated */ + /* Info about storage allocated in this context: */ + SlabBlock freelist[1]; /* free lists (block-level) */ I assume this is a variable-length array? If so, that a) should be documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so actually will cause compiler warnings and potential misoptimizations. +/* + * SlabBlockData + * Structure of a single block in SLAB allocator. + * + * slab: context owning this block What do we need this for? + * prev, next: used for doubly-linked list of blocks in global freelist I'd prefer using an embedded list here (cf. ilist.h). +/* + * SlabChunk + * The prefix of each piece of memory in an SlabBlock + * + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h. + * However it's possible to fields in front of the StandardChunkHeader fields, + * which is used to add pointer to the block. + */ Wouldn't that be easier to enforce - particularly around alignment requirements - by embedding a StandardChunkHeader here? That'd also avoid redundancies. +/* ---------- + * Debug macros + * ---------- + */ +#ifdef HAVE_ALLOCINFO +#define SlabFreeInfo(_cxt, _chunk) \ + fprintf(stderr, "SlabFree: %s: %p, %d\n", \ + (_cxt)->header.name, (_chunk), (_chunk)->size) +#define SlabAllocInfo(_cxt, _chunk) \ + fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \ + (_cxt)->header.name, (_chunk), (_chunk)->size) +#else +#define SlabFreeInfo(_cxt, _chunk) +#define SlabAllocInfo(_cxt, _chunk) +#endif Do we really have to copy that stuff from aset.c? Obviously no-one uses that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is defined: /home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’, butargument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=] fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \ +#ifdef CLOBBER_FREED_MEMORY + +/* Wipe freed memory for debugging purposes */ +static void +wipe_mem(void *ptr, size_t size) +#ifdef MEMORY_CONTEXT_CHECKING +static void +set_sentinel(void *base, Size offset) + +static bool +sentinel_ok(const void *base, Size offset) +#endif +#ifdef RANDOMIZE_ALLOCATED_MEMORY +static void +randomize_mem(char *ptr, size_t size) Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as static inlines. +MemoryContext +SlabContextCreate(MemoryContext parent, + const char *name, + Size blockSize, + Size chunkSize) +{ + int chunksPerBlock; + Size fullChunkSize; + Slab set; + + /* chunk, including SLAB header (both addresses nicely aligned) */ + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize)); + /* make sure the block can store at least one chunk (plus a bitmap) */ + if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1) + elog(ERROR, "block size %ld for slab is too small for chunks %ld", + blockSize, chunkSize); I assume the 1 is the bitmap size? + /* so how many chunks can we fit into a block, including header and bitmap? */ + chunksPerBlock + = (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1); I'm slightly drunk due to bad airline wine, but right now that seems a bit odd and/or documentation worthy. I understand the (xxx + 7) / 8 pattern elsewhere, but right now I can't follow the - 7. +/* + * SlabAlloc + * Returns pointer to allocated memory of given size or NULL if + * request could not be completed; memory is added to the set. + * + * No request may exceed: + * MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ + * All callers use a much-lower limit. That seems like a meaningless comment in the context of a slab allocator with a fixed size. + /* + * If there are no free chunks in any existing block, create a new block + * and put it to the last freelist bucket. + * + * (set->minFreeChunks == 0) means there are no blocks with free chunks, + * thanks to how minFreeChunks is updated at the end of SlabAlloc(). + */ + if (set->minFreeChunks == 0) + { + block = (SlabBlock)malloc(set->blockSize); Space after cast - maybe run pgindent over the file before submission? Doing that manually helps to avoid ugly damange by the per-release run later. I'm pretty sure there'll be a significant number of changes. + if (block->nfree == 0) + block->firstFreeChunk = set->chunksPerBlock; + else + { + /* look for the next free chunk in the block, after the first one */ + while ((++block->firstFreeChunk) < set->chunksPerBlock) + { + int byte = block->firstFreeChunk / 8; + int bit = block->firstFreeChunk % 8; + + /* stop when you find 0 (unused chunk) */ + if (! (block->bitmapptr[byte] & (0x01 << bit))) + break; + } I haven't profiled (or even compiled) this patchset yet, but FWIW, in the tuple deforming code, I could measure a noticeable speedup by accessing bitmap-bytes in the native word-size, rather than char. I'm *NOT* saying you should do that, but if this ever shows up as a bottlneck, it might be worthwhile to optimize. + /* + * And finally update minFreeChunks, i.e. the index to the block with the + * lowest number of free chunks. We only need to do that when the block + * got full (otherwise we know the current block is the right one). + * We'll simply walk the freelist until we find a non-empty entry. + */ + if (set->minFreeChunks == 0) + for (idx = 1; idx <= set->chunksPerBlock; idx++) + if (set->freelist[idx]) + { + set->minFreeChunks = idx; + break; + } Yuck. This definitely needs braces. Regards, Andres
pgsql-hackers by date: