Thread: Whether to back-patch fix for aggregate transtype width estimates
While fixing the recent unpleasantness over parallel polymorphic
aggregates, I discovered that count_agg_clauses_walker's consideration
of an aggregate argument's typmod in estimating transition space
consumption has been broken since commit 34d26872e (which changed
Aggref.args from a simple expression list to a list of TargetEntry).
It had been looking at "exprTypmod((Node *) linitial(aggref->args))",
and nobody taught it that there was now a TargetEntry in the way that
it needed to look through to get a correct answer.

Ordinarily I'd just summarily back-patch a fix, but that commit shipped
in 9.0, which means it's been broken a long time.  I'm worried that
back-patching a change might be more likely to destabilize plan choices
than to do anything anybody's happy about.

Thoughts?

			regards, tom lane
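[For concreteness, a minimal sketch of what a TargetEntry-aware lookup might
look like.  The helper name is invented for illustration and this is not the
committed patch; it only shows the "look through the TargetEntry" idea the
message describes.]

    #include "postgres.h"
    #include "nodes/nodeFuncs.h"    /* exprTypmod() */
    #include "nodes/pg_list.h"      /* linitial() */
    #include "nodes/primnodes.h"    /* Aggref, TargetEntry */

    /*
     * Hypothetical helper: since 34d26872e, aggref->args holds TargetEntry
     * nodes, so asking the list cell itself for a typmod yields -1.  Look
     * through the TargetEntry to the contained expression instead.
     */
    static int32
    agg_first_arg_typmod(Aggref *aggref)
    {
        TargetEntry *tle = (TargetEntry *) linitial(aggref->args);

        return exprTypmod((Node *) tle->expr);
    }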
On Fri, Jun 17, 2016 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While fixing the recent unpleasantness over parallel polymorphic
> aggregates, I discovered that count_agg_clauses_walker's consideration
> of an aggregate argument's typmod in estimating transition space
> consumption has been broken since commit 34d26872e (which changed
> Aggref.args from a simple expression list to a list of TargetEntry).
> It had been looking at "exprTypmod((Node *) linitial(aggref->args))",
> and nobody taught it that there was now a TargetEntry in the way that
> it needed to look through to get a correct answer.
>
> Ordinarily I'd just summarily back-patch a fix, but that commit shipped
> in 9.0, which means it's been broken a long time.  I'm worried that
> back-patching a change might be more likely to destabilize plan choices
> than to do anything anybody's happy about.
>
> Thoughts?

I suspect the consequences here aren't too bad, or someone would have
noticed by now.  So I would be tempted to leave it alone in
back-branches.  But I might change my mind if it's actually awful...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 17, 2016 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ordinarily I'd just summarily back-patch a fix, but that commit shipped
>> in 9.0, which means it's been broken a long time.  I'm worried that
>> back-patching a change might be more likely to destabilize plan choices
>> than to do anything anybody's happy about.

> I suspect the consequences here aren't too bad, or someone would have
> noticed by now.  So I would be tempted to leave it alone in
> back-branches.  But I might change my mind if it's actually awful...

Well, you can construct scenarios where it would cause failures.
Consider "SELECT max(varchar_col) FROM tab GROUP BY foo".  The planner
will need to estimate the size of the hash table to decide whether
hash-style aggregation is OK.  In all 8.x releases, it would use the
varchar_col's typmod (max width) to determine the per-aggregate trans
value space requirement.  In 9.x, that's broken and it falls back to
get_typavgwidth's default guess of 32 bytes.  If what you've actually
got is, say, varchar(255) and most of the entries actually approach
that length, this could result in a drastic underestimate, possibly
leading to OOM from hash table growth.

However, I can't recall many field reports that seem to match that
theory, so in practice it's probably pretty rare.  It's certainly not
going to help people who declare their wide columns as "text"
not "varchar(n)".

Thinking about this more, I wonder why we delegate to get_typavgwidth
at all; if the input is a Var, it seems like we should go to pg_statistic
for a column width estimate.  But that's definitely not something to
back-patch.

			regards, tom lane
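[To make the fallback concrete, a hedged sketch of where the 32-byte guess
enters the picture.  The helper name is invented; this is not the actual
count_agg_clauses_walker code, just an illustration of the calls named above.]

    #include "postgres.h"
    #include "nodes/nodeFuncs.h"    /* exprType(), exprTypmod() */
    #include "utils/lsyscache.h"    /* get_typavgwidth() */

    /*
     * Hypothetical helper: the per-value width that feeds the hash-table
     * size estimate.  When the typmod is lost (reported as -1 because the
     * TargetEntry isn't looked through), get_typavgwidth() can do no better
     * than its generic default of 32 bytes, even for a varchar(255) column
     * whose values mostly run close to 255 bytes.
     */
    static int32
    estimated_trans_value_width(Node *agg_input)
    {
        Oid     argtype = exprType(agg_input);
        int32   argtypmod = exprTypmod(agg_input);

        return get_typavgwidth(argtype, argtypmod);
    }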
On Sat, Jun 18, 2016 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In 9.x, that's broken and it falls back to
> get_typavgwidth's default guess of 32 bytes.  If what you've actually
> got is, say, varchar(255) and most of the entries actually approach
> that length, this could result in a drastic underestimate, possibly
> leading to OOM from hash table growth.

This seems more likely to result in the converse.  At 32 bytes per value,
given enough distinct values I imagine the planner is going to avoid hash
aggregation anyway.  (And in any case, if you have a varchar(n) where
n > 32, it's probably a bad bet to assume n tells you much about the
typical length of the strings.)

On the other hand, if what you've actually got is a varchar(1) or
something like that, then indeed hash aggregation might have been a good
choice.

-- 
greg
On 06/18/2016 06:14 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jun 17, 2016 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Ordinarily I'd just summarily back-patch a fix, but that commit shipped
>>> in 9.0, which means it's been broken a long time.  I'm worried that
>>> back-patching a change might be more likely to destabilize plan choices
>>> than to do anything anybody's happy about.
>
>> I suspect the consequences here aren't too bad, or someone would have
>> noticed by now.  So I would be tempted to leave it alone in
>> back-branches.  But I might change my mind if it's actually awful...
>
> Well, you can construct scenarios where it would cause failures.
> Consider "SELECT max(varchar_col) FROM tab GROUP BY foo".  The planner
> will need to estimate the size of the hash table to decide whether
> hash-style aggregation is OK.  In all 8.x releases, it would use the
> varchar_col's typmod (max width) to determine the per-aggregate trans
> value space requirement.  In 9.x, that's broken and it falls back to
> get_typavgwidth's default guess of 32 bytes.  If what you've actually
> got is, say, varchar(255) and most of the entries actually approach
> that length, this could result in a drastic underestimate, possibly
> leading to OOM from hash table growth.
>
> However, I can't recall many field reports that seem to match that
> theory, so in practice it's probably pretty rare.  It's certainly not
> going to help people who declare their wide columns as "text"
> not "varchar(n)".

All the HashAgg + OOM reports I can recall (both from the community and
through support) were caused by poor cardinality estimates, i.e. not
related to this at all.  The only exception was the array_agg() thing we
fixed a while ago, and that was primarily due to using per-group memory
contexts, so also unrelated to this.

So while I'm a fan of improving our planning, I'd lean towards not
back-patching this particular bit.

regards

-- 
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services