Thread: Group By and wildcards...
When using queries with aggregate functions, is there any way to not have to have to explicitly write all the columns names after the GROUP BY ? I would like to use a wildcard "*". Imagine tables a, b, c, d each with hundreds of columns. As an example, I would like to write: SELECT a.*, b.*, c.*, SUM(d.blah) FROM a, b, c, d WHERE <some join conditions> GROUP BY a.*, b.*, c.* Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the explicit column names of all the column in a, b, and c. This becomes a maintenance nightmare as you add/drop column in these tables... Thanks for any advice on how to handle this, -Jon PS: I'm using postgresql v7.4.x -- -**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*--- Jon Lapham <lapham@jandr.org> Rio de Janeiro, Brasil Personal: http://www.jandr.org/ ***-*--*----*-------*------------*--------------------*---------------
On Sat, Feb 19, 2005 at 12:07:12 -0200, Jon Lapham <lapham@jandr.org> wrote: > When using queries with aggregate functions, is there any way to not > have to have to explicitly write all the columns names after the GROUP > BY ? I would like to use a wildcard "*". > > Imagine tables a, b, c, d each with hundreds of columns. > > As an example, I would like to write: > > SELECT a.*, b.*, c.*, SUM(d.blah) > FROM a, b, c, d > WHERE <some join conditions> > GROUP BY a.*, b.*, c.* > > Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the > explicit column names of all the column in a, b, and c. > > This becomes a maintenance nightmare as you add/drop column in these > tables... > > Thanks for any advice on how to handle this, > -Jon Don't those tables have primary keys? Grouping by the primay key of each table will produce the same result set as grouping by all of the columns.
Bruno Wolff III <bruno@wolff.to> writes: > Jon Lapham <lapham@jandr.org> wrote: >> When using queries with aggregate functions, is there any way to not >> have to have to explicitly write all the columns names after the GROUP >> BY ? I would like to use a wildcard "*". > Don't those tables have primary keys? Grouping by the primay key of each > table will produce the same result set as grouping by all of the columns. Unfortunately, PG will still make him GROUP BY everything he wants to use as a non-aggregated output column. This behavior is per SQL92 spec. SQL99 added some verbiage to the effect that you only need to GROUP BY columns that the rest are functionally dependent on (this covers primary keys and some other cases); but we haven't got round to implementing that extension. regards, tom lane
On Sat, Feb 19, 2005 at 12:40:40 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruno Wolff III <bruno@wolff.to> writes: > > Jon Lapham <lapham@jandr.org> wrote: > >> When using queries with aggregate functions, is there any way to not > >> have to have to explicitly write all the columns names after the GROUP > >> BY ? I would like to use a wildcard "*". > > > Don't those tables have primary keys? Grouping by the primay key of each > > table will produce the same result set as grouping by all of the columns. > > Unfortunately, PG will still make him GROUP BY everything he wants to > use as a non-aggregated output column. This behavior is per SQL92 > spec. SQL99 added some verbiage to the effect that you only need to > GROUP BY columns that the rest are functionally dependent on (this > covers primary keys and some other cases); but we haven't got round > to implementing that extension. I forgot about that. However, if maintainance is the most important consideration, then it is possible to use only the primary keys to do the grouping and then join that result back to the original tables to pick up the other columns. It should be possible to do this without explicitly naming all of the columns. It will be slower and more complicated, but this might be an acceptable trade off.
Tom Lane wrote: > Bruno Wolff III <bruno@wolff.to> writes: > >> Jon Lapham <lapham@jandr.org> wrote: >> >>>When using queries with aggregate functions, is there any way to not >>>have to have to explicitly write all the columns names after the GROUP >>>BY ? I would like to use a wildcard "*". > > >>Don't those tables have primary keys? Grouping by the primay key of each >>table will produce the same result set as grouping by all of the columns. Bruno, this is true, but I want all the columns to appear in the output. > Unfortunately, PG will still make him GROUP BY everything he wants to > use as a non-aggregated output column. This behavior is per SQL92 > spec. SQL99 added some verbiage to the effect that you only need to > GROUP BY columns that the rest are functionally dependent on (this > covers primary keys and some other cases); but we haven't got round > to implementing that extension. Ugh. Since I do not want to have to re-write all my aggregate function containing queries upon modifications to the table definitions (and I do not want to write multi-thousand character long SELECT statements), maybe it is easier to use a temp table intermediary? SELECT a.id AS aid, SUM(d.blah) AS sum_blah INTO TEMPORARY TABLE foo FROM a, b, c, d WHERE <some join conditions linking a,b,c,d> followed by SELECT * FROM a, b, c, foo WHERE <some join conditions linking a,b,c> AND foo.aid=a.id Ugly... ugly... any other ideas on how to do this? My table definitions LITERALLY have hundreds of columns, and I need access to them all. -- -**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*--- Jon Lapham <lapham@jandr.org> Rio de Janeiro, Brasil Personal: http://www.jandr.org/ ***-*--*----*-------*------------*--------------------*---------------
On Sat, Feb 19, 2005 at 15:59:52 -0200, Jon Lapham <lapham@jandr.org> wrote: > > Since I do not want to have to re-write all my aggregate function > containing queries upon modifications to the table definitions (and I do > not want to write multi-thousand character long SELECT statements), > maybe it is easier to use a temp table intermediary? > > SELECT a.id AS aid, SUM(d.blah) AS sum_blah > INTO TEMPORARY TABLE foo > FROM a, b, c, d > WHERE <some join conditions linking a,b,c,d> > > followed by > > SELECT * > FROM a, b, c, foo > WHERE <some join conditions linking a,b,c> > AND foo.aid=a.id > > Ugly... ugly... any other ideas on how to do this? My table definitions > LITERALLY have hundreds of columns, and I need access to them all. Well if you are thinking about the above than you might be interested in seeing a more sketched out example of what I was suggesting in my followup after Tom's correction. SELECT a.*, b.*, c.*, e.d1 FROM a, b, c, (SELECT a.id AS a1, b.id AS b1 , c.id AS c1, sum(d) AS d1 FROM a, b, c, d WHERE <some join conditions linking a,b,c,d> GROUP BY a1, b1, c1) AS e WHERE a.id = e.a1 AND b.id = e.b1 AND c.id = e.c1 ;
Jon Lapham wrote: > > Ugh. > > Since I do not want to have to re-write all my aggregate function > containing queries upon modifications to the table definitions (and I do > not want to write multi-thousand character long SELECT statements), > maybe it is easier to use a temp table intermediary? > > Ugly... ugly... any other ideas on how to do this? My table definitions > LITERALLY have hundreds of columns, and I need access to them all. > How about using a view? Create a view that contains only the ID columns, and then a second view encorporating the first view which joins to the appropriate tables and fetches the columns you want in the output of your query. Just a thought. -- Russ.
Bruno Wolff III <bruno@wolff.to> writes: > On Sat, Feb 19, 2005 at 12:07:12 -0200, > Jon Lapham <lapham@jandr.org> wrote: > > > > SELECT a.*, b.*, c.*, SUM(d.blah) > > FROM a, b, c, d > > WHERE <some join conditions> > > GROUP BY a.*, b.*, c.* > > > > Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the > > explicit column names of all the column in a, b, and c. > > > > This becomes a maintenance nightmare as you add/drop column in these > > tables... > > Don't those tables have primary keys? Grouping by the primay key of each > table will produce the same result set as grouping by all of the columns. Actually it would be kind of nice to have this as a feature. Or mysql's feature of treating any unnamed columns as something like DISTINCT ON. However there are a few approaches for dealing with it. None of which are perfect but if they match your needs they work well. In the query above you could turn SUM(d.blah) into a subquery expression. This works well as long as you don't have multiple aggregate queries on the same table. SELECT a.*,b.*,c.*, (SELECT sum(blah) FROM d WHERE ...) AS d_sum FROM a,b,c This doesn't require a GROUP BY step which means it'll probably be faster. On the other hand it effectively forces a nested loop scan on d which is not necessarily the fastest. And if you have multiple aggregates postgres it forces separate lookups for the same data. It would be nice to have some feature for breaking out subquery expressions that return multiple rows into multiple output columns. Something like: SELECT a.*,b.*,c.*, (SELECT sum(blah),avg(blah) FROM d WHERE ...) AS (d_sum,d_avg) FROM a,b,c You could also turn the above into a more complex join like: SELECT * FROM a,b,c, (SELECT groupname, SUM(blah) as d_sum FROM d GROUP BY groupname) AS d WHERE ... AND c.groupname = d.groupname This works well as long as you didn't have the aggregate function applying to overlapping subsets of d before. (eg, it won't work for sum(product.price) if multiple invoices can contain the same product). alternatively you can do something like SELECT * FROM a,b,c, (select a.id as a_id, b.id as b_id, c.id as c_id, sum(blah) as d_sum from a,b,c,d where ... group by a.id,b.id,c.id ) AS sub WHERE a.id = a_id AND b.id = b_id AND c.id = c_id But that's pretty silly and not usually necessary. -- greg
On Sat, Feb 19, 2005 at 14:02:34 -0500, Oisin Glynn <me@oisinglynn.com> wrote: > > But the where clause defines the result of the aggregate function (in this > case the SUM)? Not really. > Is the only reason for needing the GROUP BY CLAUSE is because the aggregate > function demands it? Note that there is also a join to a table d. So that values in d are being summed up based on some connection from d to the other 3 tables. > If so could something like the following work where we pass the where clause > conditions into the function and it performs the aggregate function and > returns.. I am guessing this would be extremely inefficient? > > select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c, > where some conditions; If that function did a select from d, you could make this work, but it would likely be much slower than doing it in one SQL statement.
This is a very NEWBIE suggestion. I am fully prepared to be laughed out of town... But the where clause defines the result of the aggregate function (in this case the SUM)? Is the only reason for needing the GROUP BY CLAUSE is because the aggregate function demands it? If so could something like the following work where we pass the where clause conditions into the function and it performs the aggregate function and returns.. I am guessing this would be extremely inefficient? select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c, where some conditions; -- Warning complete gibberish pseudo code now follows function my_cheating_sum(a.id,b.id,c.id) select SUM(xxx) from a,b,c where some conditions; end function; ----- Original Message ----- From: "Bruno Wolff III" <bruno@wolff.to> To: "Jon Lapham" <lapham@jandr.org> Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; <pgsql-general@postgresql.org> Sent: Saturday, February 19, 2005 13:36 Subject: Re: [GENERAL] Group By and wildcards... > On Sat, Feb 19, 2005 at 15:59:52 -0200, > Jon Lapham <lapham@jandr.org> wrote: > > > > Since I do not want to have to re-write all my aggregate function > > containing queries upon modifications to the table definitions (and I do > > not want to write multi-thousand character long SELECT statements), > > maybe it is easier to use a temp table intermediary? > > > > SELECT a.id AS aid, SUM(d.blah) AS sum_blah > > INTO TEMPORARY TABLE foo > > FROM a, b, c, d > > WHERE <some join conditions linking a,b,c,d> > > > > followed by > > > > SELECT * > > FROM a, b, c, foo > > WHERE <some join conditions linking a,b,c> > > AND foo.aid=a.id > > > > Ugly... ugly... any other ideas on how to do this? My table definitions > > LITERALLY have hundreds of columns, and I need access to them all. > > Well if you are thinking about the above than you might be interested in > seeing a more sketched out example of what I was suggesting in my > followup after Tom's correction. > > SELECT a.*, b.*, c.*, e.d1 > FROM a, b, c, > (SELECT a.id AS a1, b.id AS b1 , c.id AS c1, sum(d) AS d1 > FROM a, b, c, d > WHERE <some join conditions linking a,b,c,d> > GROUP BY a1, b1, c1) AS e > WHERE > a.id = e.a1 AND > b.id = e.b1 AND > c.id = e.c1 > ; > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Even if the function did a select from d, it could still have plenty of duplicates. To remove that possibility you would have to use the distinct clause which is also generally less efficient then a group by. "Bruno Wolff III" <bruno@wolff.to> wrote in message news:20050219192033.GA24244@wolff.to... > On Sat, Feb 19, 2005 at 14:02:34 -0500, > Oisin Glynn <me@oisinglynn.com> wrote: > > > > But the where clause defines the result of the aggregate function (in this > > case the SUM)? > > Not really. > > > Is the only reason for needing the GROUP BY CLAUSE is because the aggregate > > function demands it? > > Note that there is also a join to a table d. So that values in d are > being summed up based on some connection from d to the other 3 tables. > > > If so could something like the following work where we pass the where clause > > conditions into the function and it performs the aggregate function and > > returns.. I am guessing this would be extremely inefficient? > > > > select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c, > > where some conditions; > > If that function did a select from d, you could make this work, but it > would likely be much slower than doing it in one SQL statement. > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >