Interesting misbehavior of repalloc() - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Interesting misbehavior of repalloc() |
Date | |
Msg-id | 4325.1186857372@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Interesting misbehavior of repalloc()
Re: Interesting misbehavior of repalloc() |
List | pgsql-hackers |
I was just tracing through a memory leak occurring when regexp_matches() is executed a lot of times within one query, for instance this rather stupid test case: select count(*) from (select regexp_matches(repeat('xyxxy',100), '[xy]', 'g') from generate_series(1,100000)) ss; I thought it was because regexp_matches() wasn't bothering to clean up the stuff it allocated in the per-query context, but after I fixed that there was still a leak :-(. It took me a while to realize what was going on. The problem stems from an improvement(?) I put into AllocSetRealloc() awhile ago, which is to try to save memcpy cycles by enlarging a chunk in-place if possible, that is, if it's the last chunk in the containing memory block and there's enough room in the block. This results in the following cycle: 1. regexp_matches asks for a 2K chunk. There's nothing in the 2K chunk freelist, so AllocSetAlloc allocates a new chunk from the end of the context's current memory block. 2. regexp_matches needs more space and repalloc's the chunk to 4K. AllocSetRealloc notices it can realloc in place, and does so. 3. When regexp_matches is done with the current call, it politely releases the chunk, and AllocSetFree sticks it into the freelist for 4K chunks. 4. The next call of regexp_matches asks for a 2K chunk. There's nothing in the 2K chunk freelist, so AllocSetAlloc allocates a new chunk from the end of the context's current memory block. Lather, rinse, repeat --- each cycle adds another entry to the 4K-chunk freelist, which we'll never use. Obviously this is not regexp_matches' fault; it's doing everything by the book. There are similar usage patterns elsewhere (particularly StringInfo) so I'm surprised we've not recognized the problem before. Perhaps we should just remove lines 934-982 of aset.c, and always handle small-chunk reallocs with the "brute force" case. Can anyone see a way to salvage something from the "realloc-in-place" idea? One thought that comes to mind is to try to make AllocSetFree recognize when it's pfree'ing the last chunk in a memory block, and to handle that by decrementing the freeptr instead of putting the chunk into any freelist. This isn't a 100% solution because it only works if no new chunk has been made since the repalloc enlargement. It would handle the sort of repetitive cycle the test case exposes, since after the first cycle or two everything fixed-size is going into chunks that are just repeatedly obtained from the freelists and put back again. But I'm afraid that this would just slow down most cases, by adding cycles to AllocSetFree (plus the subsequent AllocSetAlloc takes longer, since the freeptr-increment path is a bit slower than just taking an extant chunk off the freelist). The savings in memcpy work when AllocSetRealloc gets to win probably don't justify adding any overhead to paths that don't involve realloc. Comments, better ideas? regards, tom lane
pgsql-hackers by date: