Re: Treating work_mem as a shared resource (Was: Parallel Hash take II) - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
Msg-id: CAH2-Wz=bXm-mc_vJ9q42Yns2PcG2dSBPG4H+30eELuDLV93Yfw@mail.gmail.com
In response to: Re: Treating work_mem as a shared resource (Was: Parallel Hash take II) (Robert Haas <robertmhaas@gmail.com>)
Responses:
  Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
  Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
List: pgsql-hackers
On Fri, Nov 17, 2017 at 7:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 16, 2017 at 11:50 AM, Serge Rielau <serge@rielau.com> wrote:
>> Just as you have, we have also considered holistic provisioning of work_mem across all consumers, but we find that to be too complex.
>> Having an "emergency fund" in shared memory is also an option, but I find it too limiting.
>
> I agree.

Yeah. I suspect that that idea is not ambitious enough to do a lot of what we want, and yet is too ambitious to justify working on given its limited shelf life.

> I think this is basically a planning problem. For example, say we wanted to have work_mem_per_query instead of work_mem_per_node. There is an obvious design: consider memory use as an independent dimension of merit during path generation and comparison (less is better). Discard candidate paths whose memory use exceeds the work_mem_per_query budget unless there are no other alternatives. At the end of planning, pick the cheapest path that survived the memory-budget filter. Now, this has the problem that it would make planning more expensive (because we'd hang on to more paths for longer) but it solves a lot of other problems. If there's no memory pressure, we can use memory like mad even when it doesn't save much, but when we have to pick between using more memory for one part of the plan and using more memory for another part of the plan, the choice that does the best job reducing overall execution time will win.

Awesome. I'd like to hear some opinions on the feasibility of this approach. Does David have anything to say about it, for example?

> We could also do more localized variants of this that don't provide hard guarantees but do tend to avoid squandering resources.

That sounds like independent work, though it could be very useful.

> Yet another thing we could do is to try to get nodes to voluntarily use less than work_mem when possible. This is particularly an issue for sorts.
> A 2-batch hash join is so much more expensive than a single-batch hash join that it's almost never going to make sense unless we have no realistic alternative, although I suppose a 64-batch hash join might be not that different from a 32-batch hash join. But for sorts, given all Peter's work in this area, I bet there are a lot of sorts that could budget a quarter or less of work_mem and really not be hurt very much. It depends somewhat on how fast and how contended your I/O is, though, which we don't have an especially good way to model. I'm starting to wonder if that sort_mem GUC might be a good idea... use that for sorts, and keep work_mem for everything else.

Right. The ability for sorts to do well with less memory is really striking these days. And though I didn't mean to seriously suggest it, a hash_mem GUC does seem like it solves some significant problems without much risk. I think it should be hash_mem, not sort_mem, because hashing seems more like the special case among operations that consume work_mem, because sort_mem is already the old name for work_mem that is still accepted as a work_mem alias, and because hash_mem avoids any confusion about whether or not CREATE INDEX uses the new GUC (it clearly does not).

Since I am primarily concerned about the difference in sensitivity to the availability of memory that exists when comparing sorting and hashing, and since a new GUC seems like it would noticeably improve matters, I am beginning to take the idea of writing a hash_mem patch for v11 seriously.

--
Peter Geoghegan
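To make Robert's memory-budget path filter concrete, here is a toy sketch of the selection rule he describes: treat memory use as an independent dimension of merit, discard candidate paths that exceed the per-query budget unless nothing else survives, then pick the cheapest survivor. All names, costs, and memory figures below are illustrative only; this is not PostgreSQL planner code.

```python
# Toy model of a memory-budget path filter (illustrative only,
# not PostgreSQL internals). Costs and memory figures are made up.
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    cost: float      # estimated execution cost (lower is better)
    mem_kb: int      # estimated peak memory use

def choose_path(paths, work_mem_per_query_kb):
    """Pick the cheapest path that fits the budget; if no path
    fits, fall back to the cheapest path overall (there must
    always be some plan)."""
    within = [p for p in paths if p.mem_kb <= work_mem_per_query_kb]
    candidates = within if within else paths
    return min(candidates, key=lambda p: p.cost)

paths = [
    Path("hash join, 1 batch", cost=100.0, mem_kb=65536),
    Path("hash join, 2 batches", cost=180.0, mem_kb=32768),
    Path("merge join", cost=220.0, mem_kb=4096),
]

# Plenty of memory: the cheap but memory-hungry plan wins.
print(choose_path(paths, 131072).name)  # hash join, 1 batch
# Tight budget: a pricier but leaner plan survives the filter.
print(choose_path(paths, 16384).name)   # merge join
```

This also illustrates the trade-off noted in the thread: under no memory pressure the planner can "use memory like mad", and only under pressure does the leaner plan win.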
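The hash_mem idea sketched above can be illustrated with a toy budget-dispatch rule: hash-based nodes would draw on the new GUC, while sorts, CREATE INDEX, and everything else would stay on work_mem. The function and node names here are hypothetical, chosen only to illustrate the split; no such GUC existed at the time of this thread.

```python
# Hypothetical sketch of how a hash_mem GUC might be consulted
# (illustrative only; these names are not PostgreSQL internals).

def node_mem_budget_kb(node_type, work_mem_kb, hash_mem_kb=None):
    """Return the memory budget for an executor node. Hash-based
    nodes use hash_mem when set; all other consumers of work_mem
    (sorts, CREATE INDEX, tuplestores) are unaffected."""
    hash_nodes = {"hash_join", "hash_aggregate", "hash_setop"}
    if node_type in hash_nodes and hash_mem_kb is not None:
        return hash_mem_kb
    return work_mem_kb

# A hash join gets the larger hash budget; a sort keeps work_mem.
print(node_mem_budget_kb("hash_join", 4096, hash_mem_kb=65536))  # 65536
print(node_mem_budget_kb("sort", 4096, hash_mem_kb=65536))       # 4096
```

The asymmetry this encodes is the one the email argues for: hashing degrades sharply when it spills (extra batches), while sorting degrades gracefully, so giving hashing its own, larger budget is low-risk.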