On 09/11/2015 06:55 PM, Robert Haas wrote:
> On Wed, Sep 9, 2015 at 11:54 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Secondly, we limit the number of buckets to INT_MAX, so about 16GB (because
>> buckets are just pointers). No matter how awful estimate you get (or how
>> insanely high you set work_mem) you can't exceed this.
>
> OK, so this is an interesting point, and I think it clarifies things.
> Essentially, we're arguing about whether a 16GB limit is as good as a
> 512MB limit. Right now, if we would have allocated more than 512MB,
> we instead fail. There are two possible solutions:
>
> 1. I'm arguing for maintaining the 512MB limit, but by clamping the
> allocation to 512MB (and the number of buckets accordingly) so that it
> works with fewer buckets instead of failing.
>
> 2. You're arguing for removing the 512MB limit, allowing an initial
> allocation of up to 16GB.

I'm arguing for fixing the existing bug, and then addressing the case of
over-estimation separately, with proper analysis.

>
> My judgement is that #2 could give some people a nasty surprise, in
> that such a large initial allocation might cause problems, especially
> if driven by a bad estimate. Your judgement is that this is unlikely
> to be a problem, and that the performance consequences of limiting a
> hash join to an initial allocation of 64 million buckets rather than 2
> billion buckets are the thing to worry about.

Not quite. My judgment is that:

- We shouldn't address this in this particular bugfix, because it's a
  separate problem (even if we limit the initial allocation, we still
  have to fix the repalloc after we build the Hash).

- I assume the "might cause problems" refers to malloc() issues on some
  platforms. In that case we still have to apply it to both places, not
  just to the initial allocation. I don't know if this is a problem (I
  haven't heard any such reports until now), but if it is, we had better
  address this consistently everywhere, not just in this one place.

- I'm not really sure about the impact of the additional resize. I
  surely don't want to significantly penalize the well-estimated cases,
  so I'd like to see some numbers first.

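
To make the numbers in this thread concrete, here is a minimal sketch of
the arithmetic, assuming 8-byte bucket pointers (a 64-bit platform). The
names BUCKET_PTR_SIZE, bucket_array_bytes() and clamp_buckets() are made
up for illustration; this is not the actual PostgreSQL code, only the
"clamp instead of fail" idea from option 1 written out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: each bucket is a single 8-byte pointer. */
#define BUCKET_PTR_SIZE ((uint64_t) 8)

/* Bytes needed for an array of nbuckets bucket pointers. */
static uint64_t
bucket_array_bytes(uint64_t nbuckets)
{
    return nbuckets * BUCKET_PTR_SIZE;
}

/*
 * Hypothetical helper: instead of failing when the bucket array would
 * exceed limit_bytes, reduce the bucket count so that it fits.
 */
static uint64_t
clamp_buckets(uint64_t wanted, uint64_t limit_bytes)
{
    uint64_t max_buckets = limit_bytes / BUCKET_PTR_SIZE;

    /* The limit must leave room for at least one bucket. */
    assert(max_buckets >= 1);

    return (wanted > max_buckets) ? max_buckets : wanted;
}
```

With these assumptions, bucket_array_bytes(INT_MAX) is just under 16GB,
and clamp_buckets(INT_MAX, 512MB) yields 64 million (2^26) buckets, which
are the two figures being compared above.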
>
> I guess we'll need to wait for some other opinions.
>
OK
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services