Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers
From: Stephen Frost
Subject: Re: Default setting for enable_hashagg_disk
Msg-id: 20200710141714.GI12375@tamriel.snowman.net
In response to: Re: Default setting for enable_hashagg_disk (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
Greetings,

* Peter Geoghegan (pg@bowt.ie) wrote:
> On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost <sfrost@snowman.net> wrote:
> > I didn't, and don't, think it particularly relevant to the discussion,
> > but if you don't like the comparison to Sort then we could compare it to
> > a HashJoin instead- the point is that, yes, if you are willing to give
> > more memory to a given operation, it's going to go faster, and the way
> > users tell us that they'd like the query to use more memory is already
> > well-defined and understood to be through work_mem.  We do not do them a
> > service by ignoring that.
>
> The hash_mem design (as it stands) would affect both hash join and
> hash aggregate. I believe that it makes most sense to have hash-based
> nodes under the control of a single GUC. I believe that this
> granularity will cause the least problems. It certainly represents a
> trade-off.

So, now this has moved from being a hack to deal with a possible
regression for a small number of users due to new behavior in one node,
to a change that has impacts on other nodes that hadn't been changed,
all happening during beta.

No, I don't agree with this.  Now is not the time for designing new
features for v13.

> work_mem is less of a problem with hash join, primarily because join
> estimates are usually a lot better than grouping estimates. But it is
> nevertheless something that it makes sense to put in the same
> conceptual bucket as hash aggregate, pending a future root and branch
> redesign of work_mem.

I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
the wrong direction, because what's actually needed is a way to properly
consider and track overall memory usage- a redesign of work_mem (or some
new parameter, but it wouldn't be 'hash_mem') as you say, but all of
this discussion should be targeting v14.

> Like I said, the escape hatch GUC is not my preferred solution. But at
> least it acknowledges the problem. I don't think that anyone (or
> anyone else) believes that work_mem doesn't have serious limitations.

work_mem obviously has serious limitations, but that's not novel or new
or unexpected by anyone.

> > We have a parameter which already drives this and which users are
> > welcome to (and quite often do) tune.  I disagree that anything further
> > is either essential or particularly desirable.
>
> This is a user hostile attitude.

I don't find that argument convincing, at all.

> > I'm really rather astounded at the direction this has been going in.
>
> Why?

Due to the fact that we're in beta and now is not the time to be
redesigning this feature.

What Jeff implemented was done in a way that follows the existing
structure for how all of the other nodes work and how HashAgg was
*intended* to work (as in- if we thought the HashAgg would go over
work_mem, we wouldn't pick it and would do a GroupAgg instead).  If
there are bugs in his implementation (which I doubt, but it can happen,
of course) then that'd be useful to discuss and look at fixing, but this
discussion isn't appropriate for beta.

Thanks,

Stephen
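[Editor's note: for readers following the thread, the controls under discussion are ordinary PostgreSQL GUCs and can be exercised per-session. A minimal sketch of the two approaches being contrasted — the existing work_mem knob versus an "escape hatch" planner setting; the table, column, and memory value are hypothetical, chosen only for illustration:]

```sql
-- Existing, well-understood control: give this session's operations more
-- memory. work_mem applies per sort/hash node, not per query, so several
-- nodes in one plan can each use up to this amount. Value is illustrative.
SET work_mem = '256MB';

-- "Escape hatch" style of control: disable hash aggregation outright,
-- steering the planner toward GroupAggregate for grouping queries.
SET enable_hashagg = off;

-- Hypothetical grouping query; EXPLAIN shows which aggregate strategy
-- the planner chose under the settings above.
EXPLAIN SELECT some_col, count(*) FROM some_table GROUP BY some_col;
```

Both settings can also be scoped more narrowly (per-transaction with SET LOCAL, or per-role/per-database), which is part of why work_mem tuning is the established way users grant a query more memory.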