Re: Question about lazy_space_alloc() / linux over-commit - Mailing list pgsql-hackers
From: Jim Nasby
Subject: Re: Question about lazy_space_alloc() / linux over-commit
Msg-id: 54FE1AC6.8080502@BlueTreble.com
In response to: Re: Question about lazy_space_alloc() / linux over-commit (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses: Re: Question about lazy_space_alloc() / linux over-commit
           Re: Question about lazy_space_alloc() / linux over-commit
List: pgsql-hackers
On 3/9/15 12:28 PM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
>>>> I was thinking the simpler route of just repalloc'ing... the memcpy would
>>>> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
>>>> which probably isn't very common.
>>>
>>> That has the chance of considerably increasing the peak memory usage
>>> though, as you obviously need both the old and new allocation during the
>>> repalloc().
>>>
>>> And in contrast to the unused memory at the tail of the array, which
>>> will usually not be actually allocated by the OS at all, this is memory
>>> that's actually read/written respectively.
>>
>> Yeah, I'm not sure why everybody wants to repalloc() that instead of
>> making several separate allocations as needed. That would avoid
>> increasing peak memory usage, and would avoid any risk of needing to
>> copy the whole array. Also, you could grow in smaller chunks, like
>> 64MB at a time instead of 1GB or more at a time. Doubling an
>> allocation that's already 1GB or more gets big in a hurry.
>
> Yeah, a chunk list rather than a single chunk seemed a good idea to me
> too.

That will be significantly more code than a simple repalloc, but as long
as people are OK with that I can go that route. (A rough sketch of the
chunk-list idea is appended below.)

> Also, I think the idea of starting with an allocation assuming some
> small number of dead tuples per page made sense -- and by the time that
> space has run out, you have a better estimate of actual density of dead
> tuples, so you can do the second allocation based on that new estimate
> (but perhaps clamp it at say 1 GB, just in case you just scanned a
> portion of the table with an unusually high percentage of dead tuples.)

I like the idea of using a fresh estimate of dead-tuple density when we
need more space. We would also clamp this at maintenance_work_mem, not a
fixed 1GB.

Speaking of which... people have mentioned allowing > 1GB of dead tuples,
which means allowing maintenance_work_mem > MAX_KILOBYTES. The comment for
that says:

/* upper limit for GUC variables measured in kilobytes of memory */
/* note that various places assume the byte size fits in a "long" variable */

So I'm not sure how well that will work. I think that needs to be a
separate patch.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
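
For illustration, here is a minimal standalone sketch of the chunk-list
approach discussed above: fixed-size chunks of dead-tuple TIDs strung on a
singly linked list, so growing the collection never repallocs or copies the
existing entries. All names here (DeadTidList, DeadTidChunk, dead_tid_record,
CHUNK_TIDS) are hypothetical and not taken from any actual patch; plain
malloc/free stand in for palloc/pfree, and a small DeadTid struct stands in
for PostgreSQL's ItemPointerData.

/*
 * Hypothetical, standalone sketch of the "chunk list" idea: instead of one
 * big repalloc'd array of dead-tuple TIDs, grow a linked list of fixed-size
 * chunks, so no copy (and no doubled peak memory) is ever needed.
 *
 * None of these names come from an actual patch; malloc/free stand in for
 * palloc/pfree, and DeadTid stands in for PostgreSQL's ItemPointerData.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct DeadTid              /* stand-in for ItemPointerData */
{
    uint32_t    blkno;
    uint16_t    offnum;
} DeadTid;

/*
 * TIDs per chunk.  Real code would likely size each new chunk from the
 * running dead-tuple density estimate, capped by maintenance_work_mem,
 * rather than using a compile-time constant.
 */
#define CHUNK_TIDS  (64 * 1024)

typedef struct DeadTidChunk
{
    struct DeadTidChunk *next;
    int         ntids;              /* used entries in tids[] */
    DeadTid     tids[CHUNK_TIDS];
} DeadTidChunk;

typedef struct DeadTidList
{
    DeadTidChunk *head;
    DeadTidChunk *tail;
    long        total;              /* total TIDs across all chunks */
} DeadTidList;

static void
dead_tid_record(DeadTidList *list, uint32_t blkno, uint16_t offnum)
{
    DeadTidChunk *chunk = list->tail;

    if (chunk == NULL || chunk->ntids >= CHUNK_TIDS)
    {
        /* Current chunk is full: allocate another one; never copy. */
        chunk = malloc(sizeof(DeadTidChunk));
        chunk->next = NULL;
        chunk->ntids = 0;
        if (list->tail != NULL)
            list->tail->next = chunk;
        else
            list->head = chunk;
        list->tail = chunk;
    }
    chunk->tids[chunk->ntids].blkno = blkno;
    chunk->tids[chunk->ntids].offnum = offnum;
    chunk->ntids++;
    list->total++;
}

int
main(void)
{
    DeadTidList list = {NULL, NULL, 0};

    /* Pretend the heap scan found 20 dead tuples on each of 10,000 pages. */
    for (uint32_t blk = 0; blk < 10000; blk++)
        for (uint16_t off = 1; off <= 20; off++)
            dead_tid_record(&list, blk, off);

    int         nchunks = 0;
    for (DeadTidChunk *c = list.head; c != NULL; c = c->next)
        nchunks++;
    printf("%ld dead TIDs in %d chunks\n", list.total, nchunks);

    while (list.head != NULL)
    {
        DeadTidChunk *next = list.head->next;
        free(list.head);
        list.head = next;
    }
    return 0;
}

The trade-off, and presumably part of why this is "significantly more code"
than a repalloc, is that the index-cleanup pass can no longer do one binary
search over a single flat array; a lookup (e.g. in something like
lazy_tid_reaped()) would first have to locate the right chunk before
searching within it.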