Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum
Date |
Msg-id | CAD21AoAnQnZPJ63-vd_sUJgnc7L818x1oCnaf7vupt7J3JLo+w@mail.gmail.com
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
Responses | Re: [PoC] Improve dead tuple storage for lazy vacuum
List | pgsql-hackers
On Thu, Dec 22, 2022 at 7:24 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Dec 20, 2022 at 3:09 PM John Naylor
> > <john.naylor@enterprisedb.com> wrote:
> >
> > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> > >
> > > I'm guessing the hash join case can afford to be precise about memory because it must spill to disk when exceeding work_mem. We don't have that design constraint.
> >
> > You mean that the memory used by the radix tree should be limited not
> > by the amount of memory actually used, but by the amount of memory
> > allocated? In other words, it checks by MemoryContextMemAllocated() in
> > the local cases and by dsa_get_total_size() in the shared case.
>
> I mean, if this patch set uses 10x less memory than v15 (not always, but easy to find cases where it does), and if it's also expensive to track memory use precisely, then we don't have an incentive to track memory precisely. Even if we did, we don't want to assume that every future caller of radix tree is willing to incur that cost.

Understood.

> > The idea of using up to half of maintenance_work_mem might be a good
> > idea compared to the current flat-array solution. But since it only
> > uses half, I'm concerned that there will be users who double their
> > maintenance_work_mem. When it is improved, the user needs to restore
> > maintenance_work_mem again.
>
> I find it useful to step back and look at the usage patterns:
>
> Autovacuum: Limiting the memory allocated by vacuum is important, since there are multiple workers and they can run at any time (possibly most of the time). This case will not use parallel index vacuum, so will use slab, where the quick estimation of memory taken by the context is not terribly far off, so we can afford to be more optimistic here.
>
> Manual vacuum: The default configuration assumes we want to finish as soon as possible (vacuum_cost_delay is zero). Parallel index vacuum can be used. My experience leads me to believe users are willing to use a lot of memory to make manual vacuum finish as quickly as possible, and are disappointed to learn that even if maintenance work mem is 10GB, vacuum can only use 1GB.

Agreed.

> So I don't believe anyone will have to double maintenance work mem after upgrading (even with pessimistic accounting) because we'll be both
> - much more efficient with memory on average
> - free from the 1GB cap

Makes sense.

> That said, it's possible 50% is too pessimistic -- a 75% threshold will bring us very close to powers of two for example:
>
> 2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
> 766 + 256 = 1022MB -> stop
>
> I'm not sure if that calculation could cause going over the limit, or how common that would be.

If the value is a power of 2, it seems to work perfectly fine. But for
example if it's 700MB, the total memory exceeds the limit:

2*(1+2+4+8+16+32+64+128) = 510MB (72.8% of 700MB) -> keep going
510 + 256 = 766MB -> stop, but it exceeds the limit.

In a bigger case, if it's 11000MB:

2*(1+2+...+2048) = 8190MB (74.4%) -> keep going
8190 + 4096 = 12286MB -> stop, but it exceeds the limit.

That being said, I don't think these are common cases. So the 75%
threshold seems to work fine in most cases.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
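[Editor's note: the following illustration is not part of the thread or the patch.] To make the arithmetic above easier to check, here is a hypothetical, self-contained sketch that simulates the proposed "stop once allocated memory crosses 75% of the limit" rule. It assumes the same growth pattern as the figures quoted above (two segments at each size before the size doubles, with no cap); the simulate() helper and its parameters are illustrative only, not the patch's actual allocator behavior.

#include <stdio.h>

/*
 * Simulate allocating doubling segments (two at each size, as in the
 * figures above) until the running total reaches 75% of limit_mb, then
 * report where we stopped and whether the limit was exceeded.
 */
static void
simulate(int limit_mb)
{
	int			total_mb = 0;
	int			seg_mb = 1;
	int			at_this_size = 0;

	/* keep allocating while we are still below 75% of the limit */
	while (total_mb < limit_mb * 0.75)
	{
		total_mb += seg_mb;
		if (++at_this_size == 2)	/* two segments per size, then double */
		{
			seg_mb *= 2;
			at_this_size = 0;
		}
	}

	printf("limit %5dMB: stopped at %5dMB (%.1f%% of limit)%s\n",
		   limit_mb, total_mb, 100.0 * total_mb / limit_mb,
		   total_mb > limit_mb ? " -> exceeds the limit" : "");
}

int
main(void)
{
	simulate(1024);				/* power of two: stops at 1022MB, under the limit */
	simulate(700);				/* stops at 766MB, over the limit */
	simulate(11000);			/* stops at 12286MB, over the limit */
	return 0;
}

Running this reproduces the three cases discussed above: a 1GB limit stops at 1022MB, while 700MB and 11000MB limits overshoot to 766MB and 12286MB respectively, because the last (large) segment is allocated while the total is still just under the 75% threshold.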