Thread: Group By and wildcards...

Group By and wildcards...

From

Jon Lapham

Date:

19 February 2005, 14:07:38

When using queries with aggregate functions, is there any way to not
have to have to explicitly write all the columns names after the GROUP
BY ?  I would like to use a wildcard "*".

Imagine tables a, b, c, d each with hundreds of columns.

As an example, I would like to write:

SELECT a.*, b.*, c.*, SUM(d.blah)
FROM a, b, c, d
WHERE <some join conditions>
GROUP BY a.*, b.*, c.*

Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
explicit column names of all the column in a, b, and c.

This becomes a maintenance nightmare as you add/drop column in these
tables...

Thanks for any advice on how to handle this,
-Jon

PS: I'm using postgresql v7.4.x

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
  Jon Lapham  <lapham@jandr.org>                Rio de Janeiro, Brasil
  Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------

Re: Group By and wildcards...

From

Bruno Wolff III

Date:

19 February 2005, 16:49:43

On Sat, Feb 19, 2005 at 12:07:12 -0200,
  Jon Lapham <lapham@jandr.org> wrote:
> When using queries with aggregate functions, is there any way to not
> have to have to explicitly write all the columns names after the GROUP
> BY ?  I would like to use a wildcard "*".
>
> Imagine tables a, b, c, d each with hundreds of columns.
>
> As an example, I would like to write:
>
> SELECT a.*, b.*, c.*, SUM(d.blah)
> FROM a, b, c, d
> WHERE <some join conditions>
> GROUP BY a.*, b.*, c.*
>
> Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
> explicit column names of all the column in a, b, and c.
>
> This becomes a maintenance nightmare as you add/drop column in these
> tables...
>
> Thanks for any advice on how to handle this,
> -Jon

Don't those tables have primary keys? Grouping by the primay key of each
table will produce the same result set as grouping by all of the columns.

Re: Group By and wildcards...

From

Tom Lane

Date:

19 February 2005, 17:40:48

Bruno Wolff III <bruno@wolff.to> writes:
>   Jon Lapham <lapham@jandr.org> wrote:
>> When using queries with aggregate functions, is there any way to not
>> have to have to explicitly write all the columns names after the GROUP
>> BY ?  I would like to use a wildcard "*".

> Don't those tables have primary keys? Grouping by the primay key of each
> table will produce the same result set as grouping by all of the columns.

Unfortunately, PG will still make him GROUP BY everything he wants to
use as a non-aggregated output column.  This behavior is per SQL92
spec.  SQL99 added some verbiage to the effect that you only need to
GROUP BY columns that the rest are functionally dependent on (this
covers primary keys and some other cases); but we haven't got round
to implementing that extension.

            regards, tom lane

Re: Group By and wildcards...

From

Bruno Wolff III

Date:

19 February 2005, 17:52:05

On Sat, Feb 19, 2005 at 12:40:40 -0500,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
> >   Jon Lapham <lapham@jandr.org> wrote:
> >> When using queries with aggregate functions, is there any way to not
> >> have to have to explicitly write all the columns names after the GROUP
> >> BY ?  I would like to use a wildcard "*".
>
> > Don't those tables have primary keys? Grouping by the primay key of each
> > table will produce the same result set as grouping by all of the columns.
>
> Unfortunately, PG will still make him GROUP BY everything he wants to
> use as a non-aggregated output column.  This behavior is per SQL92
> spec.  SQL99 added some verbiage to the effect that you only need to
> GROUP BY columns that the rest are functionally dependent on (this
> covers primary keys and some other cases); but we haven't got round
> to implementing that extension.

I forgot about that. However, if maintainance is the most important
consideration, then it is possible to use only the primary keys
to do the grouping and then join that result back to the original
tables to pick up the other columns. It should be possible to do
this without explicitly naming all of the columns. It will be slower
and more complicated, but this might be an acceptable trade off.

Re: Group By and wildcards...

From

Jon Lapham

Date:

19 February 2005, 18:00:48

Tom Lane wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
>
>>  Jon Lapham <lapham@jandr.org> wrote:
>>
>>>When using queries with aggregate functions, is there any way to not
>>>have to have to explicitly write all the columns names after the GROUP
>>>BY ?  I would like to use a wildcard "*".
>
>
>>Don't those tables have primary keys? Grouping by the primay key of each
>>table will produce the same result set as grouping by all of the columns.

Bruno, this is true, but I want all the columns to appear in the output.

> Unfortunately, PG will still make him GROUP BY everything he wants to
> use as a non-aggregated output column.  This behavior is per SQL92
> spec.  SQL99 added some verbiage to the effect that you only need to
> GROUP BY columns that the rest are functionally dependent on (this
> covers primary keys and some other cases); but we haven't got round
> to implementing that extension.

Ugh.

Since I do not want to have to re-write all my aggregate function
containing queries upon modifications to the table definitions (and I do
not want to write multi-thousand character long SELECT statements),
maybe it is easier to use a temp table intermediary?

SELECT a.id AS aid, SUM(d.blah) AS sum_blah
INTO TEMPORARY TABLE foo
FROM a, b, c, d
WHERE <some join conditions linking a,b,c,d>

followed by

SELECT *
FROM a, b, c, foo
WHERE <some join conditions linking a,b,c>
   AND foo.aid=a.id

Ugly... ugly... any other ideas on how to do this?  My table definitions
LITERALLY have hundreds of columns, and I need access to them all.

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
  Jon Lapham  <lapham@jandr.org>                Rio de Janeiro, Brasil
  Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------

Re: Group By and wildcards...

From

Bruno Wolff III

Date:

19 February 2005, 18:21:53

On Sat, Feb 19, 2005 at 15:59:52 -0200,
  Jon Lapham <lapham@jandr.org> wrote:
>
> Since I do not want to have to re-write all my aggregate function
> containing queries upon modifications to the table definitions (and I do
> not want to write multi-thousand character long SELECT statements),
> maybe it is easier to use a temp table intermediary?
>
> SELECT a.id AS aid, SUM(d.blah) AS sum_blah
> INTO TEMPORARY TABLE foo
> FROM a, b, c, d
> WHERE <some join conditions linking a,b,c,d>
>
> followed by
>
> SELECT *
> FROM a, b, c, foo
> WHERE <some join conditions linking a,b,c>
>   AND foo.aid=a.id
>
> Ugly... ugly... any other ideas on how to do this?  My table definitions
> LITERALLY have hundreds of columns, and I need access to them all.

Well if you are thinking about the above than you might be interested in
seeing a more sketched out example of what I was suggesting in my
followup after Tom's correction.

SELECT a.*, b.*, c.*, e.d1
  FROM a, b, c,
    (SELECT a.id AS a1, b.id AS b1 , c.id AS c1, sum(d) AS d1
       FROM a, b, c, d
       WHERE <some join conditions linking a,b,c,d>
       GROUP BY a1, b1, c1) AS e
  WHERE
    a.id = e.a1 AND
    b.id = e.b1 AND
    c.id = e.c1
;

Re: Group By and wildcards...

From

Russ Brown

Date:

19 February 2005, 18:34:09

Jon Lapham wrote:
>
> Ugh.
>
> Since I do not want to have to re-write all my aggregate function
> containing queries upon modifications to the table definitions (and I do
> not want to write multi-thousand character long SELECT statements),
> maybe it is easier to use a temp table intermediary?
>

> Ugly... ugly... any other ideas on how to do this?  My table definitions
> LITERALLY have hundreds of columns, and I need access to them all.
>

How about using a view? Create a view that contains only the ID columns,
and then a second view encorporating the first view which joins to the
appropriate tables and fetches the columns you want in the output of
your query.

Just a thought.

--

Russ.

Re: Group By and wildcards...

From

Greg Stark

Date:

19 February 2005, 19:04:50

Bruno Wolff III <bruno@wolff.to> writes:

> On Sat, Feb 19, 2005 at 12:07:12 -0200,
>   Jon Lapham <lapham@jandr.org> wrote:
> >
> > SELECT a.*, b.*, c.*, SUM(d.blah)
> > FROM a, b, c, d
> > WHERE <some join conditions>
> > GROUP BY a.*, b.*, c.*
> >
> > Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
> > explicit column names of all the column in a, b, and c.
> >
> > This becomes a maintenance nightmare as you add/drop column in these
> > tables...
>
> Don't those tables have primary keys? Grouping by the primay key of each
> table will produce the same result set as grouping by all of the columns.

Actually it would be kind of nice to have this as a feature. Or mysql's
feature of treating any unnamed columns as something like DISTINCT ON.

However there are a few approaches for dealing with it. None of which are
perfect but if they match your needs they work well.

In the query above you could turn SUM(d.blah) into a subquery expression. This
works well as long as you don't have multiple aggregate queries on the same
table.

SELECT a.*,b.*,c.*,
       (SELECT sum(blah) FROM d WHERE ...) AS d_sum
  FROM a,b,c

This doesn't require a GROUP BY step which means it'll probably be faster. On
the other hand it effectively forces a nested loop scan on d which is not
necessarily the fastest. And if you have multiple aggregates postgres it
forces separate lookups for the same data. It would be nice to have some
feature for breaking out subquery expressions that return multiple rows into
multiple output columns. Something like:

SELECT a.*,b.*,c.*,
       (SELECT sum(blah),avg(blah) FROM d WHERE ...) AS (d_sum,d_avg)
  FROM a,b,c

You could also turn the above into a more complex join like:

SELECT *
  FROM a,b,c,
       (SELECT groupname, SUM(blah) as d_sum FROM d GROUP BY groupname) AS d
 WHERE ...
   AND c.groupname = d.groupname

This works well as long as you didn't have the aggregate function applying to
overlapping subsets of d before. (eg, it won't work for sum(product.price) if
multiple invoices can contain the same product).

alternatively you can do something like

SELECT *
  FROM a,b,c,
       (select a.id as a_id, b.id as b_id, c.id as c_id,
               sum(blah) as d_sum
          from a,b,c,d
         where ...
         group by a.id,b.id,c.id
       ) AS sub
 WHERE a.id = a_id
   AND b.id = b_id
   AND c.id = c_id

But that's pretty silly and not usually necessary.

--
greg

Re: Group By and wildcards...

From

Bruno Wolff III

Date:

19 February 2005, 19:05:32

On Sat, Feb 19, 2005 at 14:02:34 -0500,
  Oisin Glynn <me@oisinglynn.com> wrote:
>
> But the where clause defines the result of the aggregate function (in this
> case the SUM)?

Not really.

> Is the only reason for needing the GROUP BY CLAUSE is because the aggregate
> function demands it?

Note that there is also a join to a table d. So that values in d are
being summed up based on some connection from d to the other 3 tables.

> If so could something like the following work where we pass the where clause
> conditions into the function and it performs the aggregate function and
> returns..  I am guessing this would be extremely inefficient?
>
> select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c,
> where some conditions;

If that function did a select from d, you could make this work, but it
would likely be much slower than doing it in one SQL statement.

Re: Group By and wildcards...

From

"Oisin Glynn"

Date:

19 February 2005, 19:10:10

This is a very NEWBIE suggestion.  I am fully prepared to be laughed out of
town...

But the where clause defines the result of the aggregate function (in this
case the SUM)?

Is the only reason for needing the GROUP BY CLAUSE is because the aggregate
function demands it?

If so could something like the following work where we pass the where clause
conditions into the function and it performs the aggregate function and
returns..  I am guessing this would be extremely inefficient?

select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c,
where some conditions;

--  Warning complete gibberish pseudo code now follows

function my_cheating_sum(a.id,b.id,c.id)
select SUM(xxx) from a,b,c where some conditions;
end function;

----- Original Message -----
From: "Bruno Wolff III" <bruno@wolff.to>
To: "Jon Lapham" <lapham@jandr.org>
Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; <pgsql-general@postgresql.org>
Sent: Saturday, February 19, 2005 13:36
Subject: Re: [GENERAL] Group By and wildcards...


> On Sat, Feb 19, 2005 at 15:59:52 -0200,
>   Jon Lapham <lapham@jandr.org> wrote:
> >
> > Since I do not want to have to re-write all my aggregate function
> > containing queries upon modifications to the table definitions (and I do
> > not want to write multi-thousand character long SELECT statements),
> > maybe it is easier to use a temp table intermediary?
> >
> > SELECT a.id AS aid, SUM(d.blah) AS sum_blah
> > INTO TEMPORARY TABLE foo
> > FROM a, b, c, d
> > WHERE <some join conditions linking a,b,c,d>
> >
> > followed by
> >
> > SELECT *
> > FROM a, b, c, foo
> > WHERE <some join conditions linking a,b,c>
> >   AND foo.aid=a.id
> >
> > Ugly... ugly... any other ideas on how to do this?  My table definitions
> > LITERALLY have hundreds of columns, and I need access to them all.
>
> Well if you are thinking about the above than you might be interested in
> seeing a more sketched out example of what I was suggesting in my
> followup after Tom's correction.
>
> SELECT a.*, b.*, c.*, e.d1
>   FROM a, b, c,
>     (SELECT a.id AS a1, b.id AS b1 , c.id AS c1, sum(d) AS d1
>        FROM a, b, c, d
>        WHERE <some join conditions linking a,b,c,d>
>        GROUP BY a1, b1, c1) AS e
>   WHERE
>     a.id = e.a1 AND
>     b.id = e.b1 AND
>     c.id = e.c1
> ;
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

Re: Group By and wildcards...

From

"Sim Zacks"

Date:

21 February 2005, 06:09:39

Even if the function did a select from d, it could still have plenty of
duplicates. To remove that possibility you would have to use the distinct
clause which is also generally less efficient then a group by.

"Bruno Wolff III" <bruno@wolff.to> wrote in message
news:20050219192033.GA24244@wolff.to...
> On Sat, Feb 19, 2005 at 14:02:34 -0500,
>   Oisin Glynn <me@oisinglynn.com> wrote:
> >
> > But the where clause defines the result of the aggregate function (in
this
> > case the SUM)?
>
> Not really.
>
> > Is the only reason for needing the GROUP BY CLAUSE is because the
aggregate
> > function demands it?
>
> Note that there is also a join to a table d. So that values in d are
> being summed up based on some connection from d to the other 3 tables.
>
> > If so could something like the following work where we pass the where
clause
> > conditions into the function and it performs the aggregate function and
> > returns..  I am guessing this would be extremely inefficient?
> >
> > select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c,
> > where some conditions;
>
> If that function did a select from d, you could make this work, but it
> would likely be much slower than doing it in one SQL statement.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>