Thread: description of Aggregate Expressions

description of Aggregate Expressions

From

John Lumby

Date:

05 December 2019, 22:18:42

In PostgreSQL 12.1 Documentation chapter 4.2.7. Aggregate Expressions  it says


The syntax of an aggregate expression is one of the following:
  ...
aggregate_name (DISTINCT expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]
...

I believe this is incorrect in the case where the DISTINCT is on a comma-separated list of expressions.
It would imply that this is legal

select count(DISTINCT parent_id , name) from  mytable

but that is rejected with
ERROR:  function count(bigint, text) does not exist

whereas

select count(DISTINCT ( parent_id , name) ) from mytable

is accepted.

So I think to handle all cases the line in the doc should read

aggregate_name (DISTINCT ( expression [ , ... ] ) [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

I don't know how to indicate that those extra parentheses can be omitted if the list has only one expression.

Cheers,  John Lumby

Re: description of Aggregate Expressions

From

"David G. Johnston"

Date:

05 December 2019, 23:06:12

On Thu, Dec 5, 2019 at 3:18 PM John Lumby <johnlumby@hotmail.com> wrote:

In PostgreSQL 12.1 Documentation chapter 4.2.7. Aggregate Expressions it says

The syntax of an aggregate expression is one of the following:
...
aggregate_name (DISTINCT expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]
...

I believe this is incorrect in the case where the DISTINCT is on a comma-separated list of expressions.
It would imply that this is legal

It is...you didn't get a syntax error.

select count(DISTINCT parent_id , name) from mytable

but that is rejected with
ERROR: function count(bigint, text) does not exist

The error is that while the query is syntactically correct in order to execute it as written a function would need to exist that does not. As far as a general syntax diagram goes it has correctly communicated what is legal.

whereas

select count(DISTINCT ( parent_id , name) ) from mytable

is accepted.

Correct, converting the two individual columns into a "tuple" allows the default tuple distinct-making infrastructure to be used to execute the query.

So I think to handle all cases the line in the doc should read

aggregate_name (DISTINCT ( expression [ , ... ] ) [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

I don't know how to indicate that those extra parentheses can be omitted if the list has only one expression.

Then I would have to say the proposed solution to this edge case is worse than the problem. I also don't expect there to be a clean solution to dealing with the complexities of expressions at the syntax diagram level.

David J.

Re: description of Aggregate Expressions

From

Tom Lane

Date:

05 December 2019, 23:29:34

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thu, Dec 5, 2019 at 3:18 PM John Lumby <johnlumby@hotmail.com> wrote:
>> whereas
>> select count(DISTINCT ( parent_id , name) ) from mytable
>> is accepted.

> Correct, converting the two individual columns into a "tuple" allows the
> default tuple distinct-making infrastructure to be used to execute the
> query.

Yeah.  This might be more intelligible if it were written

select count(DISTINCT ROW(parent_id, name) ) from mytable

However, the SQL committee in their finite wisdom have decreed that
the ROW keyword is optional.  (As long as there's more than one
column expression; the need for that special case is another reason
why omitting ROW isn't really a nice thing to do.)

            regards, tom lane

Re: description of Aggregate Expressions

From

John Lumby

Date:

06 December 2019, 18:43:39

On 12/05/19 18:06, David G. Johnston wrote:

On Thu, Dec 5, 2019 at 3:18 PM John Lumby <johnlumby@hotmail.com> wrote:
In PostgreSQL 12.1 Documentation chapter 4.2.7. Aggregate Expressions it says

The syntax of an aggregate expression is one of the following:
...
aggregate_name (DISTINCT expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]
...

I believe this is incorrect in the case where the DISTINCT is on a comma-separated list of expressions.
It would imply that this is legal

It is...you didn't get a syntax error.

Hmm, even though true, I think this is unhelpful.
If a reference document states that the syntax for a something-or-other construct is one of the following diagrams,
then I expect that the diagrams are valid for *every* kind of something-or-other, not just some.
Yet the diagram I quote always results in being rejected in the case of COUNT -
which I consider to be as good as saying it is invalid syntax.

select count(DISTINCT parent_id , name) from mytable

but that is rejected with
ERROR: function count(bigint, text) does not exist

The error is that while the query is syntactically correct in order to execute it as written a function would need to exist that does not. As far as a general syntax diagram goes it has correctly communicated what is legal.

whereas

select count(DISTINCT ( parent_id , name) ) from mytable

is accepted.

Correct, converting the two individual columns into a "tuple" allows the default tuple distinct-making infrastructure to be used to execute the query.

So I think to handle all cases the line in the doc should read

aggregate_name (DISTINCT ( expression [ , ... ] ) [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

I don't know how to indicate that those extra parentheses can be omitted if the list has only one expression.

Then I would have to say the proposed solution to this edge case is worse than the problem. I also don't expect there to be a clean solution to dealing with the complexities of expressions at the syntax diagram level.

Yes, I see what I suggested is not ideal either. But I think something needs to be changed.

How about replacing "expression [ , ... ]" by "parameter_list" in the description, and then stating that parameter_list can be either a single expression or , if the particular aggregate function accepts it (for which, consult that function's reference), a comma-separated list of expressions.

David J.

Re: description of Aggregate Expressions

From

Tom Lane

Date:

06 December 2019, 19:17:35

John Lumby <johnlumby@hotmail.com> writes:
> On 12/05/19 18:06, David G. Johnston wrote:
>> Then I would have to say the proposed solution to this edge case is worse than the problem.  I also don't expect
thereto be a clean solution to dealing with the complexities of expressions at the syntax diagram level. 

> Yes,  I see what I suggested is not ideal either.     But I think something needs to be changed.

> How about replacing "expression [ , ... ]"  by "parameter_list" in the description, and then stating that
parameter_listcan be either a single expression or ,   if the particular aggregate function accepts it (for which,
consultthat function's reference),    a comma-separated list of expressions. 

That's just as wrong.  As we tried to explain before, the
parenthesized-list syntax is a row constructor, so it only works
in cases where the aggregate function can accept a composite
argument type.  Most don't.

Moreover, the very same thing holds in *any* expression context,
not only aggregate arguments.  So if we took this seriously there
would have to be a lot of other places plastered with equally
confusing/unhelpful addenda.

            regards, tom lane

Re: description of Aggregate Expressions

From

John Lumby

Date:

06 December 2019, 20:01:14

Tom Lane wrote

> > How about replacing "expression [ , ... ]"  by "parameter_list" in the description,
> > and then stating that parameter_list can be either a single expression or ,
> >  if the particular aggregate function accepts it (for which,   consult that function's reference),
> >   a comma-separated list of expressions.
>
> That's just as wrong.  As we tried to explain before, the
> parenthesized-list syntax is a row constructor, so it only works
> in cases where the aggregate function can accept a composite
> argument type.  Most don't.
>

But surely not *all* cases of a multi-expression parameter list of an aggregate function are row constructors are
they?   What about 

select parent_id, name, GROUPING(parent_id , name), count(*) FROM mytable GROUP BY ROLLUP(parent_id , name);

In the piece "GROUPING(parent_id , name)" ,
  is "(parent_id , name)"  a row constructor or a list of two expressions?

Or are you saying those are one and the same thing?

Cheers,   John

Re: description of Aggregate Expressions

From

John Lumby

Date:

06 December 2019, 21:43:59

John Lumby wrote:
<
< Tom Lane wrote
<
< > > How about replacing "expression [ , ... ]"  by "parameter_list" in the description,
< > > and then stating that parameter_list can be either a single expression or ,
< > >  if the particular aggregate function accepts it (for which,   consult that function's reference),
< > >   a comma-separated list of expressions.
< >
< > That's just as wrong.  As we tried to explain before, the
< > parenthesized-list syntax is a row constructor, so it only works
< > in cases where the aggregate function can accept a composite
< > argument type.  Most don't.
< >
<
< But surely not *all* cases of a multi-expression parameter list of an aggregate function are row constructors are
they?   What about 
<
< select parent_id, name, GROUPING(parent_id , name), count(*) FROM mytable GROUP BY ROLLUP(parent_id , name);
<
< In the piece "GROUPING(parent_id , name)" ,
<   is "(parent_id , name)"  a row constructor or a list of two expressions?
<
< Or are you saying those are one and the same thing?
<

I think I can answer my own question   -  No they are not the same  -   because :

select parent_id, name, GROUPING(ROW(parent_id , name)), count(*) FROM mytable GROUP BY ROLLUP(parent_id , name);
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level
LINE 1: select parent_id, name, GROUPING(ROW(parent_id , name)), cou...
                                         ^