Home > mailing lists
Re: PATCH: two slab-like memory allocators - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: PATCH: two slab-like memory allocators
Date	November 12, 2016 20:53:16
Msg-id	20161112191257.3a4n6kzegvu637qn@alap3.anarazel.de Whole thread Raw
In response to	Re: PATCH: two slab-like memory allocators (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: PATCH: two slab-like memory allocators
List	pgsql-hackers
Tree view
Hi,

Subject: [PATCH 1/2] slab allocator

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 6ad7e7d..520f295 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c

I'd rather have that in a separate followup commit...


+ * IDENTIFICATION
+ *      src/backend/utils/mmgr/slab.c
+ *
+ *
+ *    The constant allocation size allows significant simplification and various
+ *    optimizations that are not possible in AllocSet. Firstly, we can get rid
+ *    of the doubling and carve the blocks into chunks of exactly the right size
+ *    (plus alignment), not wasting memory.

References to AllocSet aren't necessarily a good idea, they'll quite
possibly get out of date. The argument can be quite easily be made
without referring to a concrete reference to behaviour elsewhere.

+ *
+ *    At the context level, we use 'freelist' array to track blocks grouped by
+ *    number of free chunks. For example freelist[0] is a list of completely full
+ *    blocks, freelist[1] is a block with a single free chunk, etc.

Hm. Those arrays are going to be quite large for small allocations w/
big blocks (an imo sensible combination). Maybe it'd make more sense to
model it as a linked list of blocks? Full blocks are at one end, empty
ones at the other?


+ *    About MEMORY_CONTEXT_CHECKING:
+ *
+ *    Since we usually round request sizes up to the next power of 2, there
+ *    is often some unused space immediately after a requested data
area.

I assume the "round up" stuff is copy-paste?


+ *    Thus, if someone makes the common error of writing past what they've
+ *    requested, the problem is likely to go unnoticed ... until the day when
+ *    there *isn't* any wasted space, perhaps because of different memory
+ *    alignment on a new platform, or some other effect.  To catch this sort
+ *    of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond
+ *    the requested space whenever the request is less than the actual chunk
+ *    size, and verifies that the byte is undamaged when the chunk is freed.
+ *
+ *
+ *    About USE_VALGRIND and Valgrind client requests:
+ *
+ *    Valgrind provides "client request" macros that exchange information with
+ *    the host Valgrind (if any).  Under !USE_VALGRIND, memdebug.h stubs out
+ *    currently-used macros.
+ *
+ *    When running under Valgrind, we want a NOACCESS memory region both before
+ *    and after the allocation.  The chunk header is tempting as the preceding
+ *    region, but mcxt.c expects to able to examine the standard chunk header
+ *    fields.  Therefore, we use, when available, the requested_size field and
+ *    any subsequent padding.  requested_size is made NOACCESS before returning
+ *    a chunk pointer to a caller.  However, to reduce client request traffic,
+ *    it is kept DEFINED in chunks on the free list.
+ *
+ *    The rounded-up capacity of the chunk usually acts as a post-allocation
+ *    NOACCESS region.  If the request consumes precisely the entire chunk,
+ *    there is no such region; another chunk header may immediately follow.  In
+ *    that case, Valgrind will not detect access beyond the end of the chunk.
+ *
+ *    See also the cooperating Valgrind client requests in mcxt.c.

I think we need a preliminary patch moving a lot of this into something
like mcxt_internal.h. Copying this comment, and a lot of the utility
functions, into every memory context implementation is a bad pattern.


+typedef struct SlabBlockData *SlabBlock;        /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.

+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+    MemoryContextData header;    /* Standard memory-context fields */
+    /* Allocation parameters for this context: */
+    Size        chunkSize;        /* chunk size */
+    Size        fullChunkSize;    /* chunk size including header and alignment */
+    Size        blockSize;        /* block size */
+    int            chunksPerBlock;    /* number of chunks per block */
+    int            minFreeChunks;    /* min number of free chunks in any block */
+    int            nblocks;        /* number of blocks allocated */
+    /* Info about storage allocated in this context: */
+    SlabBlock    freelist[1];    /* free lists (block-level) */

I assume this is a variable-length array? If so, that a) should be
documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
actually will cause compiler warnings and potential misoptimizations.

+/*
+ * SlabBlockData
+ *        Structure of a single block in SLAB allocator.
+ *
+ * slab: context owning this block

What do we need this for?

+ * prev, next: used for doubly-linked list of blocks in global freelist

I'd prefer using an embedded list here (cf. ilist.h).

+/*
+ * SlabChunk
+ *        The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ * However it's possible to fields in front of the StandardChunkHeader fields,
+ * which is used to add pointer to the block.
+ */

Wouldn't that be easier to enforce - particularly around alignment
requirements - by embedding a StandardChunkHeader here? That'd also
avoid redundancies.

+/* ----------
+ * Debug macros
+ * ----------
+ */
+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+            fprintf(stderr, "SlabFree: %s: %p, %d\n", \
+                (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+            fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
+                (_cxt)->header.name, (_chunk), (_chunk)->size)
+#else
+#define SlabFreeInfo(_cxt, _chunk)
+#define SlabAllocInfo(_cxt, _chunk)
+#endif

Do we really have to copy that stuff from aset.c? Obviously no-one uses
that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is
defined:
/home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’,
butargument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=]   fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
 


+#ifdef CLOBBER_FREED_MEMORY
+
+/* Wipe freed memory for debugging purposes */
+static void
+wipe_mem(void *ptr, size_t size)

+#ifdef MEMORY_CONTEXT_CHECKING
+static void
+set_sentinel(void *base, Size offset)
+
+static bool
+sentinel_ok(const void *base, Size offset)
+#endif

+#ifdef RANDOMIZE_ALLOCATED_MEMORY
+static void
+randomize_mem(char *ptr, size_t size)

Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
static inlines.

+MemoryContext
+SlabContextCreate(MemoryContext parent,
+                      const char *name,
+                      Size blockSize,
+                      Size chunkSize)
+{
+    int        chunksPerBlock;
+    Size    fullChunkSize;
+    Slab    set;
+
+    /* chunk, including SLAB header (both addresses nicely aligned) */
+    fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

+    /* make sure the block can store at least one chunk (plus a bitmap) */
+    if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
+        elog(ERROR, "block size %ld for slab is too small for chunks %ld",
+                    blockSize, chunkSize);

I assume the 1 is the bitmap size?


+    /* so how many chunks can we fit into a block, including header and bitmap? */
+    chunksPerBlock
+        =  (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);

I'm slightly drunk due to bad airline wine, but right now that seems a
bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
pattern elsewhere, but right now I can't follow the - 7.

+/*
+ * SlabAlloc
+ *        Returns pointer to allocated memory of given size or NULL if
+ *        request could not be completed; memory is added to the set.
+ *
+ * No request may exceed:
+ *        MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
+ * All callers use a much-lower limit.

That seems like a meaningless comment in the context of a slab allocator
with a fixed size.


+    /*
+     * If there are no free chunks in any existing block, create a new block
+     * and put it to the last freelist bucket.
+     *
+     * (set->minFreeChunks == 0) means there are no blocks with free chunks,
+     * thanks to how minFreeChunks is updated at the end of SlabAlloc().
+     */
+    if (set->minFreeChunks == 0)
+    {
+        block = (SlabBlock)malloc(set->blockSize);

Space after cast - maybe run pgindent over the file before submission?
Doing that manually helps to avoid ugly damange by the per-release run
later. I'm pretty sure there'll be a significant number of changes.



+    if (block->nfree == 0)
+        block->firstFreeChunk = set->chunksPerBlock;
+    else
+    {
+        /* look for the next free chunk in the block, after the first one */
+        while ((++block->firstFreeChunk) < set->chunksPerBlock)
+        {
+            int byte = block->firstFreeChunk / 8;
+            int bit  = block->firstFreeChunk % 8;
+
+            /* stop when you find 0 (unused chunk) */
+            if (! (block->bitmapptr[byte] & (0x01 << bit)))
+                break;
+        }

I haven't profiled (or even compiled) this patchset yet, but FWIW, in
the tuple deforming code, I could measure a noticeable speedup by
accessing bitmap-bytes in the native word-size, rather than char. I'm
*NOT* saying you should do that, but if this ever shows up as a
bottlneck, it might be worthwhile to optimize.

+    /*
+     * And finally update minFreeChunks, i.e. the index to the block with the
+     * lowest number of free chunks. We only need to do that when the block
+     * got full (otherwise we know the current block is the right one).
+     * We'll simply walk the freelist until we find a non-empty entry.
+     */
+    if (set->minFreeChunks == 0)
+        for (idx = 1; idx <= set->chunksPerBlock; idx++)
+            if (set->freelist[idx])
+            {
+                set->minFreeChunks = idx;
+                break;
+            }

Yuck. This definitely needs braces.


Regards,

Andres
pgsql-hackers by date:
From: Andres Freund
Date: 12 November 2016, 20:53:15
Subject: Re: Logical Replication WIP
From: Tom Lane
Date: 12 November 2016, 21:20:28
Subject: Re: More snapshot-management fun
Re: PATCH: two slab-like memory allocators - Mailing list pgsql-hackers

Previous

Next