Dynamic shared memory areas - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Dynamic shared memory areas |
Date | |
Msg-id | CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com |
Responses | Re: Dynamic shared memory areas |
List | pgsql-hackers |
Hi hackers,

I would like to propose a new subsystem called Dynamic Shared [Memory]
Areas, or "DSA".  It provides an object called a "dsa_area" which can
be used by multiple backends to share data.  Under the covers, a
dsa_area is made up of some number of DSM segments, but it appears to
client code as a single shared memory heap with a simple allocate/free
interface.  Because the memory is mapped at different addresses in
different backends, it introduces a kind of sharable relative pointer
and an operation to convert it to a backend-local pointer.

After you have created or attached to a dsa_area, you can use it much
like MemoryContextAlloc/pfree, except for the extra hoop to jump
through to get the local address:

    dsa_pointer p;
    char *mem;

    p = dsa_allocate(area, 42);
    mem = (char *) dsa_get_address(area, p);
    if (mem != NULL)
    {
        snprintf(mem, 42, "Hello world");
        dsa_free(area, p);
    }

Exposing the dsa_pointer in this way allows client code to build data
structures with internal dsa_pointers that will be usable in all
backends that attach to the dsa_area.

DSA areas have many potential uses, including shared workspaces for
various kinds of parallel query execution, longer term storage for
in-memory database objects, caches and so forth.  In some cases it may
be useful to use a dsa_area directly, but there could be a library of
useful data structures that know how to use DSA memory.  More on all
of those topics, with patches, soon.

SOME CONTEXT

Currently, Postgres provides three classes of memory:

1.  Backend-local memory, managed with palloc/pfree, and MemoryContext
providing a hierarchy of memory heaps tied to various scopes.
Underneath that, there is of course the C runtime's heap and
allocator.

2.  Traditional non-extensible shared memory mapped into every backend
at the same address.  This works on Unix because child processes
inherit the memory map of the postmaster.
In EXEC_BACKEND builds (including Windows) it works because you can
ask for memory to be mapped at a specific address and it'll succeed if
ASLR is turned off and the backend hasn't been running very long and
the address range happens to be still free.  This memory is currently
managed with an allocate-only allocator.  There is a small library of
data structures that know how to use (but never free) this memory.

3.  DSM memory, our abstraction for shared memory segments created on
demand in non-postmaster backends.  This memory is mapped at different
addresses in different backends.  Currently its main use is to provide
a chunk of memory for parallel query.  To manage the space inside a
DSM segment, shm_toc ('table-of-contents') can be used as a kind of
allocate-only space manager which allows backends to find the
backend-local address of objects within the segment using integer
keys.

This proposal adds a fourth class, building on the third.  Compared
with the existing memory classes:

* It provides a fully general allocate/free facility, as currently
available only in (1), though it does not have (1)'s directly
dereferenceable pointers.

* It grows automatically and can in theory grow as big as virtual
memory allows, like (1), though it also provides a way to cap total
size so that allocations fail beyond some size.

* It provides something like the throw-it-all-away-at-once clean-up
facility of (1), since DSA areas can be destroyed, are reference
counted, and can optionally be tracked by the resource manager
mechanism (riding on DSM's coat tails).

* It provides the data sharing between backends of (2) and (3), though
it doesn't have (2)'s directly dereferenceable pointers.

* Through proposals that will follow this one, it will provide for
basic data structures that build on top of it such as hash tables,
like (2), except that these ones will be able to grow as required and
give memory back.

* Unlike (1) and (2), client code has to deal with incompatible memory
maps.
This involves calling dsa_get_address(area, relative_pointer) which
amounts to a few instructions to perform a base address lookup and
pointer arithmetic.

Using processes instead of threads gives Postgres certain advantages,
but requires us to deal with shared memory instead of just using
something like (1) for all our memory needs, as a hypothetical
multi-threaded Postgres fork would presumably do.  This proposal is a
step towards making our shared memory facilities more powerful and
general.

IMPLEMENTATION AND HISTORY

Back in 2014, Robert Haas proposed sb_alloc[1].  It had two layers:

* a 'free page manager' which cuts a piece of memory into 4KB pages
and embeds a btree into the empty pages to track contiguous runs of
pages, so that you can get and put free page ranges

* an allocator which manages a set of backend-private memory regions,
each of which has a free page manager; large allocations are handled
directly with pages from the free page manager in an existing region,
or new regions created as required with malloc; allocations <= 8KB are
handled with pools (called "heaps" in that patch) of various object
sizes ("size classes") that live in 64KB superblocks, which in turn
come from the free page manager

DSA uses Robert's free page manager unchanged, except for some
debugging by me.  It uses the same general approach and much of the
code for the higher level allocator, but I have reworked it
substantially to replace the MemoryContext interface, put it in DSM
segments, introduce the multi-segment relative pointer scheme, and add
concurrency support.

Compared to some well known malloc implementations which this code
takes general inspiration from, the main differences are obviously the
shared memory nature, the lack of per-core pools (an avenue for future
research that would increase concurrent performance at the cost of
increased fragmentation), and the presence of that lower level page
manager.
Some other systems go directly to the OS (mmap, sbrk) for superblocks
and large objects.  The equivalent for us would be to throw away the
lower layer and simply create a DSM segment for large allocations and
64KB superblocks, but there are implementation and portability reasons
not to want to create very large numbers of DSM segments.

Compared to palloc/pfree, DSA aims to waste less space.  It has more
finely grained size classes (8, 16, 24, 32, 40, 48, ... see
dsa_size_classes), and it uses a page map, costing 8 bytes per 4KB
page, to keep track of how to free memory instead of putting
bookkeeping information in front of every object.

Some other notes in no particular order:

It's admittedly slightly confusing that the patch currently contains
two separate relative pointer concepts: relptr is used by Robert's
freespace.c code and provides for sort-of-type-checked offsets
relative to a single base, and dsa_pointer is used by dsa.c to provide
multi-segment relative pointers that encode a segment index in the
higher bits.

The lock tranche arguments to dsa_create_dynamic are clunky, but I
don't have a better idea currently since you can't allocate and free
tranche IDs so I don't see how dsa.c can own that problem.

The "dynamic" part of dsa_create_dynamic's name reflects a desire to
have an alternative "fixed" version where you can provide it with an
already existing piece of memory to manage, such as a pre-existing DSM
segment, but that has not been implemented.

It's desirable to allow atomic ops on dsa_pointer; I believe Andres
Freund plans to make that happen for 64 bit values on 32 bit systems,
but if that turns out to be problematic I would want to make
dsa_pointer 32 bits on 32 bit systems.

PATCH

First, please apply dsm-unpin-segment-v2.patch[2], and then
dsm-handle-invalid.patch (attached, and also proposed), and finally
dsa-v1.patch.  I have also attached test-dsa.patch, a small module
which exercises the allocator and shows some client code.
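
To make the multi-segment relative pointer idea from the notes above
more concrete, here is a minimal sketch in C.  It is not taken from
dsa.c: the sketch_* names, the split into a 16 bit segment index and a
48 bit offset, and the fixed array of per-backend segment base
addresses are all assumptions made for illustration.  The only details
drawn from the text are that the segment index lives in the higher
bits, and that converting to a backend-local address amounts to a base
address lookup plus pointer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout (an assumption, not dsa.c's): index above bit 48. */
typedef uint64_t sketch_pointer;

#define SKETCH_OFFSET_BITS   48
#define SKETCH_OFFSET_MASK   ((UINT64_C(1) << SKETCH_OFFSET_BITS) - 1)
#define SKETCH_MAX_SEGMENTS  4

/* Backend-local view: where each segment is mapped in this process. */
typedef struct sketch_area
{
    char *segment_base[SKETCH_MAX_SEGMENTS];    /* NULL if not mapped */
} sketch_area;

/* Pack a segment index and an offset into one sharable value. */
sketch_pointer
sketch_make_pointer(uint64_t segment_index, uint64_t offset)
{
    return (segment_index << SKETCH_OFFSET_BITS) |
           (offset & SKETCH_OFFSET_MASK);
}

/*
 * Convert a relative pointer to a backend-local address: look up the
 * segment's base, then add the offset.  No bounds checking is done on
 * the offset in this sketch.
 */
void *
sketch_get_address(const sketch_area *area, sketch_pointer p)
{
    uint64_t index = p >> SKETCH_OFFSET_BITS;
    uint64_t offset = p & SKETCH_OFFSET_MASK;

    if (index >= SKETCH_MAX_SEGMENTS || area->segment_base[index] == NULL)
        return NULL;
    return area->segment_base[index] + offset;
}

/* Write through a relative pointer; returns 0 if the bytes landed. */
int
sketch_demo(void)
{
    static char seg0[64];
    static char seg1[64];
    sketch_area area = { { seg0, seg1, NULL, NULL } };
    sketch_pointer p = sketch_make_pointer(1, 16);
    char *mem = sketch_get_address(&area, p);

    if (mem == NULL)
        return 1;
    snprintf(mem, 32, "Hello world");
    return strcmp(seg1 + 16, "Hello world") == 0 ? 0 : 1;
}
```

The value p contains no machine address, so another process holding a
different segment_base array for the same segments would resolve it to
the same object.  That property is what lets data structures store
such pointers internally and still be usable from every backend.
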

Thanks to my colleagues Robert Haas for the sb_alloc code that morphed
into this patch, and John Gorman and Amit Khandekar for feedback and
testing.  I'd be most grateful for any feedback.  Thanks for reading!

[1] https://www.postgresql.org/message-id/flat/CA%2BTgmobkeWptGwiNa%2BSGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAEepm%3D29DZeWf44-4fzciAQ14iY5vCVZ6RUJ-KR2yzs3hPzrkw%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com