Thread: aliases, &c in HAVING clause?
Kind people, I bumped across this several times, and am wondering what SQL99 and SQL200x have to say about column numbers or aliases in HAVING. SQL92 is fairly clear (no) but also somewhat out of date. Here's a scenario: SELECT foo_name, count(*) as foo_count FROM foo GROUP BY foo_name HAVING foo_count > 2 ORDER BY foo_count DESC, foo_name; Another: SELECT foo_name, count(*) as foo_count FROM foo GROUP BY foo_name HAVING 2 > 2 -- OK, so this is a little weird, but it makes sense in context. ORDER BY 2 DESC, 1; I know pg throws some kind of parse error when I ask for that sort of thing, and that replacing the alias/number with its referent clears this up, but what do later standards have to say about doing or not doing the above? If they're ambiguous, are there good reasons why the above shouldn't work? TIA for any pointers on this :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Transported to a surreal landscape, a young girl kills the first woman she meets and then teams up with three complete strangers to kill again. Marin County newspaper's TV listing for The Wizard of Oz
david@fetter.org (David Fetter) writes: > I bumped across this several times, and am wondering what SQL99 and > SQL200x have to say about column numbers or aliases in HAVING. SQL99 not only does not allow them in GROUP BY or HAVING, but it doesn't allow them in ORDER BY either, thereby eliminating the entire wart from the language. I doubt that SQL200x will reverse field on this decision. In retrospect it was an error for us to allow aliases in GROUP BY, as this has caused so much confusion about where they are legal. If it weren't for backwards-compatibility concerns, I'd vote for adopting the SQL99 definition (no aliases in any of these clauses). > are there good reasons why the above shouldn't work? Well, you showed one: interpreting "2 > 2" as anything other than a constant expression is just plain weird. But the real reason why this is bogus is that it violates the fundamental conceptual model of how SELECT works. The SELECT output list is not supposed to be computed until after all the other steps are complete, and therefore it's improper to assume its results are available in GROUP BY or HAVING. regards, tom lane
On Mon, Feb 23, 2004 at 06:36:48PM -0500, Tom Lane wrote: > david@fetter.org (David Fetter) writes: > > I bumped across this several times, and am wondering what SQL99 > > and SQL200x have to say about column numbers or aliases in HAVING. > > SQL99 not only does not allow them in GROUP BY or HAVING, but it > doesn't allow them in ORDER BY either, thereby eliminating the > entire wart from the language. I doubt that SQL200x will reverse > field on this decision. Right. > In retrospect it was an error for us to allow aliases in GROUP BY, > as this has caused so much confusion about where they are legal. If > it weren't for backwards-compatibility concerns, I'd vote for > adopting the SQL99 definition (no aliases in any of these clauses). Hmm. If I were going to flog for the other side, it would be on the grounds of error prevention with aliases. Just in general, isn't it better to write a piece of code (here, a possibly-complicated aggregate) just once and refer to it elsewhere rather than have to write a separate copy of it everywhere it's used? That's one of the fundamental design principles on which structured programming, subroutines and objects are based. > > are there good reasons why the above shouldn't work? > > Well, you showed one: interpreting "2 > 2" as anything other than a > constant expression is just plain weird. Um, and the rest of SQL is...normal?!? ;) > But the real reason why this is bogus is that it violates the > fundamental conceptual model of how SELECT works. The SELECT output > list is not supposed to be computed until after all the other steps > are complete, and therefore it's improper to assume its results are > available in GROUP BY or HAVING. Maybe I'm missing something, but if I recall right, the "fundamental principle" of a SELECT is that it's a PROJECTion & RESTRICTion of the full cross-product of the relations in question. I haven't checked, but I'm pretty certain that PostgreSQL doesn't do things that way at the implementation level. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
David Fetter <david@fetter.org> writes: > Just in general, isn't it better to write a piece of code (here, a > possibly-complicated aggregate) just once and refer to it elsewhere > rather than have to write a separate copy of it everywhere it's used? In general, you do that with subselects. Having inconsistent scoping rules for ORDER BY is a much less desirable way to attack it. SELECT * FROM (SELECT foobar(baz) AS x FROM ...) ssGROUP BY x HAVING x > 44; >> But the real reason why this is bogus is that it violates the >> fundamental conceptual model of how SELECT works. The SELECT output >> list is not supposed to be computed until after all the other steps >> are complete, and therefore it's improper to assume its results are >> available in GROUP BY or HAVING. > but I'm pretty certain that PostgreSQL doesn't do things that way at > the implementation level. It does anywhere that you can tell the difference. Try a SELECT with side-effect-producing output expressions. As an example, would you be happy if the following were prone to getting divide-by-zero errors? SELECT x, 1.0 / sum(x) FROM t GROUP BY x HAVING sum(x) != 0; regards, tom lane