Re: DBT-3 with SF=20 got failed - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: DBT-3 with SF=20 got failed |
Date | |
Msg-id | 55F30B80.3090804@2ndquadrant.com |
In response to | Re: DBT-3 with SF=20 got failed (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: DBT-3 with SF=20 got failed
|
List | pgsql-hackers |
On 09/11/2015 06:55 PM, Robert Haas wrote:
> On Wed, Sep 9, 2015 at 11:54 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Secondly, we limit the number of buckets to INT_MAX, so about 16GB (because
>> buckets are just pointers). No matter how awful estimate you get (or how
>> insanely high you set work_mem) you can't exceed this.
>
> OK, so this is an interesting point, and I think it clarifies things.
> Essentially, we're arguing about whether a 16GB limit is as good as a
> 512MB limit. Right now, if we would have allocated more than 512MB,
> we instead fail. There are two possible solutions:
>
> 1. I'm arguing for maintaining the 512MB limit, but by clamping the
> allocation to 512MB (and the number of buckets accordingly) so that it
> works with fewer buckets instead of failing.
>
> 2. You're arguing for removing the 512MB limit, allowing an initial
> allocation of up to 16GB.

I'm arguing for fixing the existing bug, and then addressing the case of
over-estimation separately, with proper analysis.

>
> My judgement is that #2 could give some people a nasty surprise, in
> that such a large initial allocation might cause problems, especially
> if driven by a bad estimate. Your judgement is that this is unlikely
> to be a problem, and that the performance consequences of limiting a
> hash join to an initial allocation of 64 million buckets rather than 2
> billion buckets are the thing to worry about.

Not quite. My judgment is that:

- We shouldn't address this in this particular bugfix, because it's a
  separate problem (even if we limit the initial allocation, we still
  have to fix the repalloc after we build the Hash).

- I assume the "might cause problems" refers to malloc() issues on some
  platforms. In that case we still have to apply the limit to both places,
  not just to the initial allocation. I don't know if this is a problem
  (I haven't heard any such reports until now), but if it is, we'd better
  address it consistently everywhere, not just in this one place.

- I'm not really sure about the impact of the additional resize. I surely
  don't want to significantly penalize the well-estimated cases, so I'd
  like to see some numbers first.

>
> I guess we'll need to wait for some other opinions.
>

OK

--
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
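For readers following the numbers in the message above, here is a rough sketch of the arithmetic behind the 512MB and 16GB figures, and of what "clamping the number of buckets accordingly" (option 1) could look like. It is not PostgreSQL code: the 8-byte pointer size, the power-of-two rounding, and the helper names `bucket_array_bytes` and `clamp_nbuckets` are assumptions made for illustration only.

```python
# Back-of-the-envelope arithmetic for the limits discussed in the thread.
# Assumptions (not stated in the mail): bucket pointers are 8 bytes, and
# the bucket count is kept at a power of two.

POINTER_SIZE = 8                   # bytes per bucket pointer (assumed 64-bit build)
INT_MAX = 2**31 - 1                # upper bound on the number of buckets
ALLOC_LIMIT = 512 * 1024 * 1024    # the 512MB limit mentioned in the thread


def bucket_array_bytes(nbuckets: int) -> int:
    """Size of a bucket array that holds one pointer per bucket."""
    return nbuckets * POINTER_SIZE


def clamp_nbuckets(estimated: int, limit_bytes: int = ALLOC_LIMIT) -> int:
    """Hypothetical helper: largest power-of-two bucket count that is
    <= estimated and whose pointer array fits within limit_bytes."""
    max_fit = limit_bytes // POINTER_SIZE
    capped = max(1, min(estimated, max_fit))
    # Round down to a power of two.
    return 1 << (capped.bit_length() - 1)


if __name__ == "__main__":
    # INT_MAX buckets of 8 bytes each is just under 16GB.
    print(bucket_array_bytes(INT_MAX) / 2**30)
    # Clamped to 512MB, even a wildly high estimate yields 2**26
    # (about 64 million) buckets.
    print(clamp_nbuckets(INT_MAX))
```

Under these assumptions the 512MB cap corresponds to the "64 million buckets" quoted in the thread, and INT_MAX buckets to the "2 billion buckets" and roughly 16GB.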