Re: Vacuum: allow usage of more than 1GB of work mem - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date:
Msg-id: CAD21AoCnT3pZk=nO+L4mFxJZfAgfC4wRWZ=X9qrKoEkvr0FAZw@mail.gmail.com
In response to: Re: Vacuum: allow usage of more than 1GB of work mem (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem
List: pgsql-hackers
On Wed, Sep 7, 2016 at 2:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 10:28 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> The problem with this is that we allocate the entire amount of
>>> maintenance_work_mem even when the number of actual dead tuples turns
>>> out to be very small.  That's not so bad if the amount of memory we're
>>> potentially wasting is limited to ~1 GB, but it seems pretty dangerous
>>> to remove the 1 GB limit, because somebody might have
>>> maintenance_work_mem set to tens or hundreds of gigabytes to speed
>>> index creation, and allocating that much space for a VACUUM that
>>> encounters 1 dead tuple does not seem like a good plan.
>>>
>>> What I think we need to do is make some provision to initially
>>> allocate only a small amount of memory and then grow the allocation
>>> later if needed.  For example, instead of having
>>> vacrelstats->dead_tuples be declared as ItemPointer, declare it as
>>> ItemPointer * and allocate the array progressively in segments.  I'd
>>> actually argue that the segment size should be substantially smaller
>>> than 1 GB, like say 64MB; there are still some people running systems
>>> which are small enough that allocating 1 GB when we may need only 6
>>> bytes can drive the system into OOM.
>>
>> This would however incur the cost of having to copy the whole GB-sized
>> chunk every time it's expanded. It wouldn't be cheap.
>
> No, I don't want to end up copying the whole array; that's what I
> meant by allocating it progressively in segments. Something like what
> you go on to propose.
>
>> I've monitored the vacuum as it runs and the OS doesn't map the whole
>> block unless it's touched, which it isn't until dead tuples are found.
>> Surely, if overcommit is disabled (as it should be), it could exhaust
>> the virtual address space if set very high, but it wouldn't really use
>> the memory unless it's needed, it would merely reserve it.
>
> Yeah, but I've seen actual breakage from exactly this issue on
> customer systems even with the 1GB limit, and when we start allowing
> 100GB it's going to get a whole lot worse.
>
>> To fix that, rather than repalloc the whole thing, dead_tuples would
>> have to be an ItemPointer** of sorted chunks. That'd be a
>> significantly more complex patch, but at least it wouldn't incur the
>> memcpy.
>
> Right, this is what I had in mind.  I don't think this is actually
> very complicated, because the way we use this array is really simple.
> We basically just keep appending to the array until we run out of
> space, and that's not very hard to implement with an array-of-arrays.
> The chunks are, in some sense, sorted, as you say, but you don't need
> to do qsort() or anything like that.  You're just replacing a single
> flat array with a data structure that can be grown incrementally in
> fixed-size chunks.

If we replaced dead_tuples with an array-of-arrays, wouldn't there be
a negative performance impact on lazy_tid_reap()? As more chunks are
added, that lookup performance would decrease.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
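To make the question concrete, here is a minimal, self-contained sketch
of the array-of-arrays idea. The Tid struct, SEG_TIDS, and the function
names are simplified stand-ins invented for this example (the real code
would use ItemPointerData and palloc, and this is not the actual patch).
Since VACUUM collects dead TIDs in heap order, each segment is sorted
and the segments are mutually ordered, so the lookup can binary-search
over segments and then within one segment:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t block; uint16_t offset; } Tid;  /* simplified ItemPointerData */

#define SEG_BYTES (64 * 1024 * 1024)        /* 64MB segments, per the suggestion above */
#define SEG_TIDS  (SEG_BYTES / sizeof(Tid))

typedef struct
{
    Tid   **segs;   /* growable array of fixed-size segments */
    int     nsegs;  /* number of segments allocated so far */
    size_t  ntids;  /* total TIDs stored across all segments */
} DeadTidStore;

static int
tid_cmp(const Tid *a, const Tid *b)
{
    if (a->block != b->block)
        return a->block < b->block ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Append one TID.  Grows by whole segments; existing segments are
 * never copied.  Error handling omitted for brevity. */
static void
dead_tid_append(DeadTidStore *store, Tid tid)
{
    size_t pos = store->ntids % SEG_TIDS;

    if (pos == 0)   /* current segment full, or none yet: add one */
    {
        store->segs = realloc(store->segs, (store->nsegs + 1) * sizeof(Tid *));
        store->segs[store->nsegs++] = malloc(SEG_BYTES);
    }
    store->segs[store->nsegs - 1][pos] = tid;
    store->ntids++;
}

/* Membership test in the style of lazy_tid_reap(): binary search over
 * segments by their first entry, then within the chosen segment. */
static bool
dead_tid_lookup(const DeadTidStore *store, Tid tid)
{
    int     lo = 0, hi = store->nsegs - 1, seg = -1;
    size_t  l, h, nfill;

    /* outer search: last segment whose first TID is <= tid */
    while (lo <= hi)
    {
        int mid = (lo + hi) / 2;

        if (tid_cmp(&store->segs[mid][0], &tid) <= 0)
        {
            seg = mid;
            lo = mid + 1;
        }
        else
            hi = mid - 1;
    }
    if (seg < 0)
        return false;

    /* inner search: only the last segment can be partially filled */
    nfill = (seg == store->nsegs - 1)
        ? store->ntids - (size_t) seg * SEG_TIDS
        : SEG_TIDS;
    l = 0;
    h = nfill;
    while (l < h)
    {
        size_t m = l + (h - l) / 2;
        int    c = tid_cmp(&store->segs[seg][m], &tid);

        if (c == 0)
            return true;
        if (c < 0)
            l = m + 1;
        else
            h = m;
    }
    return false;
}

int
main(void)
{
    DeadTidStore store = {0};
    Tid          t = {42, 7};

    dead_tid_append(&store, t);
    return dead_tid_lookup(&store, t) ? 0 : 1;
}

Under these assumptions, appending never memcpy()s existing data, and a
lookup costs about log2(nsegs) + log2(SEG_TIDS) comparisons, which is
the same order as one binary search over a flat array of the same total
size, so the lookup need not get measurably slower as chunks are added.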