Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion - Mailing list pgsql-bugs
From | Tomas Vondra |
---|---|
Subject | Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion |
Date | |
Msg-id | 52634461.2020002@fuzzy.cz |
In response to | array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion (Frank van Vugt <ftm.van.vugt@foxi.nl>) |
Responses | Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion |
List | pgsql-bugs |
Hi,

On 19.10.2013 15:38, Frank van Vugt wrote:
> L.S.
>
> Something seems wrong here.... when applying array_agg() on a large
> recordset, above a certain size 'things fall over' and memory usage
> races off until the system runs out of it:
...
> with g as (select * from f)
> select comcat(id::text), min(value)
> from g
> group by g.value
> having count(1) > 1;
> Time: 18660,326 ms
...
> with g as (select * from f)
> select array_agg(id), min(value)
> from g
> group by g.value
> having count(1) > 1;
> Time: <none, fails>

Hmmmm. I initially thought that this was because of the 2x resizing of the
array in accumArrayResult, but that clearly is not the case (e.g. because
the example is constructed so that groups with multiple elements are very
unlikely).

Then I thought it was because the array is pre-allocated with 64 elements,
but that's not an issue either (at least not the main one), because even
with 1e7 elements this amounts to about 10M * 8 * 64 = ~5GB, and the report
says it consumed ~32GB of RAM plus swap (not sure how much, but probably
not a small amount). On my machine it easily ate 8GB of RAM and 4GB of swap
(and then got shot by the OOM killer).

Anyway, I disabled the preallocation (i.e. 1 element initially, +1 for each
iteration), which should be ~80MB of data, but even then I was unable to
execute that query. The effect on smaller queries (say, 1e6 rows) was
negligible too - they consumed more or less the same amount of memory,
irrespective of the preallocation.

Then I started to wonder about the local memory context, which is defined
like this:

    arr_context = AllocSetContextCreate(rcontext,
                                        "accumArrayResult",
                                        ALLOCSET_DEFAULT_MINSIZE,
                                        ALLOCSET_DEFAULT_INITSIZE,
                                        ALLOCSET_DEFAULT_MAXSIZE);

which means

    arr_context = AllocSetContextCreate(rcontext,
                                        "accumArrayResult",
                                        0,
                                        (8 * 1024),
                                        (8 * 1024 * 1024));

I may be wrong, but it seems to me that each group has its own allocation
context, and once something gets allocated in it (which is pretty much
guaranteed for each group), it allocates at least 8kB of memory. With 1e6
groups that's ~8GB (not really far from what I see here), and with 1e7
groups it's ~80GB. Not the most efficient approach for 80MB of values.

I tried this:

    #define ALLOCSET_TINY_MINSIZE   0
    #define ALLOCSET_TINY_INITSIZE  (1 * 256)
    #define ALLOCSET_TINY_MAXSIZE   (8 * 1024)

    arr_context = AllocSetContextCreate(rcontext,
                                        "accumArrayResult",
                                        ALLOCSET_TINY_MINSIZE,
                                        ALLOCSET_TINY_INITSIZE,
                                        ALLOCSET_TINY_MAXSIZE);

and, after a minor tweak in aset.c, which by default enforces that the init
size is >= 1024:

    if (initBlockSize < 1024)
        initBlockSize = 1024;

I'm able to process even the 1e7 values, although it still consumes a
significant amount of memory.

The failing query is slightly artificial, but pretty much any array_agg()
query with a large number of groups is going to fail in exactly the same
way. So if we could improve that somehow, that'd be nice.

However, the more I think about the local memory context, the more I think
the idea of using a dedicated memory context for each group is flawed. I
think a single context for the whole aggregate would work much better, and
indeed - after replacing the local memory context with rcontext (which is
just the aggcontext passed from array_agg_transfn), the memory consumption
was cut in half.
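To illustrate what I mean, here is a rough, untested sketch of how the
first-call branch of accumArrayResult() might look when the build state is
allocated directly in the caller-supplied context instead of a dedicated
child context (simplified; the real code also fills in the type info via
get_typlenbyvalalign, which I've left out):

    /* Rough sketch only - on the first call, allocate the build state
     * directly in the context supplied by the caller (the per-aggregate
     * context) instead of creating a child context per group.
     */
    if (astate == NULL)
    {
        oldcontext = MemoryContextSwitchTo(rcontext);

        astate = (ArrayBuildState *) palloc(sizeof(ArrayBuildState));
        astate->mcontext = rcontext;    /* remember where our arrays live */
        astate->alen = 64;              /* initial size of dvalues/dnulls */
        astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
        astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
        astate->nelems = 0;
        astate->element_type = element_type;
        /* get_typlenbyvalalign(...) etc. omitted in this sketch */
    }
    else
        oldcontext = MemoryContextSwitchTo(astate->mcontext);

The rest of the function (storing the value, switching back to oldcontext)
stays as it is; the difference is that nothing needs to be torn down per
group, everything goes away with the aggregate context.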
The other thing I think we should reevaluate is the preallocation, i.e. the
initial array size (currently 64 elements) and whether growing it to 2x the
size is a good idea.

Based on experience and measurements I did with multiple custom aggregates
in the past few weeks, I think a significantly lower initial size (say 16,
or maybe even 8) would work equally well. We're talking about a performance
difference of a few percent vs. a higher probability of running out of
memory. But let's say it's a minor issue, and by switching to a shared
memory context we've already saved ~50% of the memory in these cases (a
single element in a group, a 1kB chunk for 64 * 8B = 512B).

Regarding the growth rate - I think once the array reaches a certain size
(e.g. 1024 elements), we should probably decrease the growth rate, for
example +128 elements instead of 2x, or something like that. With the
current approach, we're pretty much bound to have ~50% overhead because of
the doubling.

regards
Tomas
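PS: to make the growth-rate idea concrete, here is a trivial, untested
sketch of the policy I have in mind (the helper name and the 1024/128
thresholds are just examples, not existing array_agg code):

    /* Untested sketch: double the array while it is small, then switch
     * to fixed-size increments so the overhead stays bounded.
     */
    static int
    next_array_size(int alen)
    {
        if (alen < 1024)
            return alen * 2;    /* exponential growth for small arrays */
        return alen + 128;      /* linear growth once the array is large */
    }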