Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers
| From | Andres Freund |
| --- | --- |
| Subject | Re: Use generation context to speed up tuplesorts |
| Date | |
| Msg-id | 20210730203853.utjj43f6zzn5e2hy@alap3.anarazel.de |
| In response to | Use generation context to speed up tuplesorts (David Rowley <dgrowleyml@gmail.com>) |
| Responses | Re: Use generation context to speed up tuplesorts; Re: Use generation context to speed up tuplesorts |
| List | pgsql-hackers |
Hi,

On 2021-07-30 18:42:18 +1200, David Rowley wrote:
> While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum
> sorts for single column sorts), I noticed that we use a separate
> memory context to store tuple data and we just reset when we're done
> if the sort fits in memory, or we dump the tuples to disk in the same
> order they're added and reset the context when it does not. There is
> a little pfree() work going on via writetup_heap() which I think
> could possibly just be removed to get some additional gains.
>
> Anyway, this context that stores tuples uses the standard aset.c
> allocator, which has the usual power-of-2 wastage and the additional
> overheads of freelists etc. I wondered how much faster it would go if
> I set it to use a generation context instead of an aset.c one.
>
> After running make installcheck to make the tenk1 table, then running
> the attached tuplesortbench script, I get this:
>
> So it seems to save quite a bit of memory getting away from rounding
> up allocations to the next power of 2.
>
> Performance-wise, there's some pretty good gains. (Results in TPS)

Very nice!

I wonder if there are cases where generation.c would regress
performance over aset.c due to not having an initial / "keeper" block?

> The patch is just a simple 1-liner at the moment. I likely do need to
> adjust what I'm passing as the blockSize to GenerationContextCreate().
> Maybe a better number would be something that's calculated from
> work_mem, e.g. Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64)
> so that we just allocate at most a 16th of work_mem per chunk, but not
> bigger than 8MB. I don't think changing this will affect the
> performance of the above very much.

I think it's bad that both generation and slab don't have internal
handling of block sizes. Needing to err on the side of too-big blocks
to handle large amounts of memory well, just so the contexts don't
need to deal with variably sized blocks, isn't a sensible tradeoff.

I don't think it's acceptable to use ALLOCSET_DEFAULT_MAXSIZE or
Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64) for tuplesort.c.
There are plenty of cases where we'll just sort a handful of tuples,
and increasing the memory usage of those by a factor of 1024 isn't
good. The Min() won't do any good if a larger work_mem is used. Nor
will it be good to use thousands of small allocations for a large
in-memory tuplesort just because we're concerned about the initial
allocation size: both because of the allocation overhead, and,
importantly, because that will make context resets more expensive.

To me this says that we should transplant aset.c's block size growing
into generation.c.

There is at least one path using tuplecontext that reaches code that
could end up freeing memory to a significant enough degree to care
about generation.c effectively not reusing that memory:
tuplesort_putdatum()->datumCopy()->EOH_flatten_into()

On a quick look I didn't find any expanded record user that frees
nontrivial amounts of memory, but I didn't look all that carefully.

Greetings,

Andres Freund
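To make the keeper-block question above concrete: a minimal sketch of the reset-heavy pattern where it matters, assuming the PG14-era GenerationContextCreate(parent, name, blockSize) signature. The function and loop here are illustrative, not from the thread.

```c
#include "postgres.h"
#include "utils/memutils.h"

/*
 * aset.c (since PG11) allocates its context header and first block in
 * a single malloc() and keeps that block across MemoryContextReset(),
 * so a reset/refill loop causes no malloc/free traffic once warm.
 * generation.c, as of this mail, frees every block in
 * GenerationReset(), so each iteration pays a free() plus a malloc().
 */
void
reset_heavy_demo(void)
{
	MemoryContext cxt = GenerationContextCreate(CurrentMemoryContext,
												"demo context",
												8 * 1024);

	for (int i = 0; i < 1000000; i++)
	{
		char	   *p = MemoryContextAlloc(cxt, 64);

		memset(p, 0, 64);		/* stand-in for real per-cycle work */
		MemoryContextReset(cxt);
	}

	MemoryContextDelete(cxt);
}
```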
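The "simple 1-liner" itself isn't shown in the thread; a sketch of what swapping tuplesort.c's tuple context over might look like, assuming the tuplesort_begin_common() code of that era. Using ALLOCSET_DEFAULT_MAXSIZE as the block size is an assumption of this sketch, not necessarily what the patch passes.

```c
/* tuplesort_begin_common() creates the caller-tuple context with the
 * standard allocator: */
state->tuplecontext = AllocSetContextCreate(state->sortcontext,
											"Caller tuples",
											ALLOCSET_DEFAULT_SIZES);

/* The change under discussion swaps in a generation context.  The
 * fixed block size shown here (8MB) is illustrative only: */
state->tuplecontext = GenerationContextCreate(state->sortcontext,
											  "Caller tuples",
											  ALLOCSET_DEFAULT_MAXSIZE);
```

For the sizing formula quoted above: work_mem is measured in kilobytes, so ((Size) work_mem) * 64 bytes is one sixteenth of work_mem. The default work_mem of 4MB would give 256kB blocks, and the Min() caps the block size at 8MB once work_mem reaches 128MB. The "factor of 1024" corresponds to ALLOCSET_DEFAULT_MAXSIZE (8MB) versus the 8kB initial block of a default aset.c context.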
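A rough sketch of what transplanting aset.c's block-size growth into generation.c could look like. The fields initBlockSize, maxBlockSize and nextBlockSize do not exist in generation.c as of this mail; they mirror AllocSetContext and are assumptions of the sketch.

```c
/*
 * Hypothetical helper inside generation.c: pick the size of the next
 * GenerationBlock the way aset.c grows its blocks, doubling from
 * initBlockSize up to maxBlockSize.
 */
static Size
GenerationNextBlockSize(GenerationContext *set, Size chunk_size)
{
	Size		blksize = set->nextBlockSize;
	Size		required_size;

	/* Double the size used for the following block, capped at max. */
	set->nextBlockSize <<= 1;
	if (set->nextBlockSize > set->maxBlockSize)
		set->nextBlockSize = set->maxBlockSize;

	/* The block must still be able to hold the requested chunk. */
	required_size = chunk_size + Generation_BLOCKHDRSZ + Generation_CHUNKHDRSZ;
	while (blksize < required_size)
		blksize <<= 1;

	return blksize;
}
```

Starting small and doubling up to the cap keeps a handful-of-tuples sort at one small block, while a large in-memory sort still converges on a few large blocks, addressing both halves of the objection above.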
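Background on why that freeing path matters: generation.c does not put freed chunks on a freelist; pfree() only decrements the owning block's live-chunk count, and the memory is reclaimed only when the whole block empties. Simplified from PG14-era GenerationFree() (bookkeeping such as mem_allocated tracking omitted):

```c
/* The freed chunk is not reused for new allocations; the whole block
 * is released only once every chunk on it has been freed. */
block->nfree += 1;
Assert(block->nfree <= block->nchunks);

if (block->nfree == block->nchunks)
{
	dlist_delete(&block->node);
	free(block);
}
```

So memory freed via the tuplesort_putdatum()->datumCopy()->EOH_flatten_into() path would, under a generation context, typically stay allocated until the context is reset.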