Thread: Why does "group by" need to match select fields?
Sorry if this isn't exactly postgresql specific. I periodically run into this problem, and I'm running into it now. I'm wondering if there's something about "group by" that I don't understand. As an example what I'd want to do is return the "id" value for the check to each payee that has the highest amount. It seems like there's no problem with ambiguity in logic, but postgresql + other sql servers balk at it. The group by fields need to explicitly match the select fields with the exception of the aggregate function(s?). create table checks { id serial, payee text, amount double }; select max(amount), payee, id from checks group by payee; Why won't the above work? Is there another way to get the id for the record with the highest amount for each payee? Thanks.
Omar Eljumaily <omar2@omnicode.com> wrote: > > Sorry if this isn't exactly postgresql specific. I periodically run > into this problem, and I'm running into it now. I'm wondering if > there's something about "group by" that I don't understand. As an > example what I'd want to do is return the "id" value for the check to > each payee that has the highest amount. It seems like there's no > problem with ambiguity in logic, but postgresql + other sql servers balk > at it. The group by fields need to explicitly match the select fields > with the exception of the aggregate function(s?). > > create table checks > { > id serial, > payee text, > amount double > }; > > select max(amount), payee, id from checks group by payee; > > Why won't the above work? Is there another way to get the id for the > record with the highest amount for each payee? Because it's ambiguous. If you're grabbing max() for amount, which id tuple do you want? Perhaps the way you're storing your data, those answers aren't ambiguous, but the database doesn't know that. Take this query as an example: select max(amount), max(checknumber), payee from checks group by payee; In that case, the highest checknumber and the highest check amount probably won't come from the same tuple. If you were to throw in there: select max(amount), max(checknumber), payee, id from checks group by payee; Which id does it give you? The one that matches max(amount) or the one that matches max(checknumber)? -- Bill Moran Collaborative Fusion Inc.
OK, I see what's going on. I can have more than one max(amount) with the same amount and payee. Thanks so much. Like I said, it's sort of dogged me off and on many times. Thanks. Bill Moran wrote: > Omar Eljumaily <omar2@omnicode.com> wrote: > >> Sorry if this isn't exactly postgresql specific. I periodically run >> into this problem, and I'm running into it now. I'm wondering if >> there's something about "group by" that I don't understand. As an >> example what I'd want to do is return the "id" value for the check to >> each payee that has the highest amount. It seems like there's no >> problem with ambiguity in logic, but postgresql + other sql servers balk >> at it. The group by fields need to explicitly match the select fields >> with the exception of the aggregate function(s?). >> >> create table checks >> { >> id serial, >> payee text, >> amount double >> }; >> >> select max(amount), payee, id from checks group by payee; >> >> Why won't the above work? Is there another way to get the id for the >> record with the highest amount for each payee? >> > > Because it's ambiguous. If you're grabbing max() for amount, which > id tuple do you want? > > Perhaps the way you're storing your data, those answers aren't ambiguous, > but the database doesn't know that. Take this query as an example: > > select max(amount), max(checknumber), payee from checks group by payee; > > In that case, the highest checknumber and the highest check amount > probably won't come from the same tuple. If you were to throw in > there: > > select max(amount), max(checknumber), payee, id from checks group by payee; > > Which id does it give you? The one that matches max(amount) or the one > that matches max(checknumber)? > >
On Feb 28, 2007, at 6:57 PM, Omar Eljumaily wrote:
OK, I see what's going on. I can have more than one max(amount) with the same amount and payee. Thanks so much. Like I said, it's sort of dogged me off and on many times.Thanks.Bill Moran wrote:Omar Eljumaily <omar2@omnicode.com> wrote:
Sorry if this isn't exactly postgresql specific. I periodically run into this problem, and I'm running into it now. I'm wondering if there's something about "group by" that I don't understand. As an example what I'd want to do is return the "id" value for the check to each payee that has the highest amount. It seems like there's no problem with ambiguity in logic, but postgresql + other sql servers balk at it. The group by fields need to explicitly match the select fields with the exception of the aggregate function(s?).create table checks{id serial,payee text,amount double};select max(amount), payee, id from checks group by payee;Why won't the above work? Is there another way to get the id for the record with the highest amount for each payee?
Because it's ambiguous. If you're grabbing max() for amount, whichid tuple do you want?Perhaps the way you're storing your data, those answers aren't ambiguous,but the database doesn't know that. Take this query as an example:select max(amount), max(checknumber), payee from checks group by payee;In that case, the highest checknumber and the highest check amountprobably won't come from the same tuple. If you were to throw inthere:select max(amount), max(checknumber), payee, id from checks group by payee;Which id does it give you? The one that matches max(amount) or the onethat matches max(checknumber)?
Omar, note that in many cases you can do:
select c.amount, c.payee, c.id
from checks c
where c.amount = (select max(amount)
from checks
where payee=c.payee);
erik jones <erik@myemma.com>
sofware developer
615-296-0838
emma(r)
On Wed, Feb 28, 2007 at 04:57:08PM -0800, Omar Eljumaily wrote: > OK, I see what's going on. I can have more than one max(amount) with > the same amount and payee. Thanks so much. Like I said, it's sort of > dogged me off and on many times. Note that in the special case of max/min, you can use things like ORDER BY/LIMIT to acheive the effect you want. And DISTINCT ON () is also useful in such situations. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
On Wed, Feb 28, 2007 at 04:57:08PM -0800, Omar Eljumaily wrote: > OK, I see what's going on. I can have more than one max(amount) with > the same amount and payee. Thanks so much. Like I said, it's sort of > dogged me off and on many times. Note that in the case of min/max you can use ORDER BY/LIMIT or DISTINCT ON () to acheive the effect you want. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
On Wed, Feb 28, 2007 at 16:19:02 -0800, Omar Eljumaily <omar2@omnicode.com> wrote: > select max(amount), payee, id from checks group by payee; > > Why won't the above work? Is there another way to get the id for the > record with the highest amount for each payee? While the DISTINCT ON approach is probably best if you can live with a Postgres specific solution, the general way to do this is use the group by query to get a set of primary keys with aggregates and then you join this back to the original table to get the other data. Some databases will also recognize that you are grouping by a candidate key and allow you to specify normal columns since they must all have the same value for rows with the same candidate key value. Unfortunately Postgres doesn't do that now.