sb_alloc: a new memory allocator for PostgreSQL - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | sb_alloc: a new memory allocator for PostgreSQL |
Date | |
Msg-id | CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Whole thread Raw |
Responses |
Re: sb_alloc: a new memory allocator for PostgreSQL
Re: sb_alloc: a new memory allocator for PostgreSQL |
List | pgsql-hackers |
Over the last several months, I've been working on a new memory allocator for PostgreSQL. While it's not done, and there are problems, I think I've reached the point where it makes sense to get this out in front of a wider audience and get some feedback, preferably of the sort that doesn't do any permanent damage. I started out down this path because parallel sort will really be much happier if it can allocate from either dynamic shared memory or backend-local memory using the same interface. Arguably, we could implement parallel internal sort using some kind of bespoke memory management strategy, like packing all of the tuples into the memory segment tightly starting from the end, and growing the tuple array from the beginning, but then what happens if we decide to give up on parallelism and do an external sort after all? Even if we could engineer our way around that problem, I think there are bound to be other things that people want to do with dynamic shared memory segments where being able to easily allocate and free memory is desirable. As I got into the problem a little bit further, I realized that AllocSetAlloc is actually pretty poor allocator for sorting. As far as I can see, the basic goal of aset.c was to make context resets cheap - which makes sense, because we have a lot of very short-lived memory contexts. However, when you're sorting a lot of data, being able to reset the context quickly is not as important as using memory efficiently, and AllocSetAlloc is pretty terrible at that, especially for small allocations. Each allocation is rounded up to a power of two, and then we add 16 bytes of overhead on top of that (on 64-bit systems), which means that palloc has 100% overhead for a large number of 16-byte allocations, and 200% overhead for a large number of 8-byte allocations. The 16-byte overhead doesn't hurt quite as much on big allocations, but the rounding to a power of two can sometimes be very bad. For example, for repeated 96-byte allocations, palloc has 50% overhead. I decided I wanted to come up with something better. I read through the literature and found that most modern allocators seemed to use superblocks. A superblock is a chunk of memory, perhaps 64kB, which is carved up into a bunch of chunks of equal size. Those chunks don't have individual headers; instead, the pointer address is used to locate the metadata for the chunk. Requests are binned into size classes, which are generally much more fine-grained than the powers-of-two strategy used by AllocSetAlloc. Of course, too many size classes can waste memory if you end up with many partially-full superblocks, each for a different size class, but the consensus seems to be that you make it up and more by wasting less memory within each allocation (e.g. a 96 byte allocation can be exactly 96 bytes, rather than 128 bytes). I investigated whether there might be an existing allocator we could use; I did not find anything that looked suitable. No allocator that I ran across could handle allocation of shared memory; in fact, pretty much all of them assumed the existence of an underlying "system allocator" that they could use to allocate memory for their own metadata. Those that included a system allocator (like tcmalloc) still didn't support anything that looked like dynamic shared memory. Most of them were also unsuitable for other reasons (e.g. tcmalloc is written in C++). I also didn't find anything that looked like our memory context paradigm, and in particular the ability to cheaply reset a context, in any other allocator. So I decided to write my own; a first cut is attached. It's not integrated with the MemoryContext stuff yet, and very likely needs to be. There are some functions for which I've added prototypes but not implementations; the support for allocation from dynamic shared memory segments is just a stub right now. And there are various other warts as well which will need to be cleaned up in due time if this is to go anywhere. Performance-wise, the raw allocation performance (sb_alloc) is reasonably good. On PPC64, it lost out to palloc on 8-byte allocations but was comparable for larger allocations; on my MacBook, it was faster across the board. The raw deallocation performance (sb_free) is appreciably slower than pfree; that can probably be optimized somewhat. It's not really a fair comparison at this point, though, among other reasons because palloc is jumping through an extra layer of indirection. Leaving that problem to one side, I hacked up index builds to use the new allocator and found that on REINDEX INDEX pgbench_accounts_pkey at scale factor 100, this uses 22% less memory than palloc, but runs about 4-7% slower on PPC64. It may be possible to eliminate some of that with a tighter integration of the new allocator and some more fine-tuning. As you can probably tell, this is all somewhat preliminary and experimental at this point, but thoughts welcome. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: