Re: Warts with SELECT DISTINCT - Mailing list pgsql-hackers

From Bruno Wolff III
Subject Re: Warts with SELECT DISTINCT
Date
Msg-id 20060504140611.GA19321@wolff.to
In response to Re: Warts with SELECT DISTINCT  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Warts with SELECT DISTINCT
List pgsql-hackers
On Thu, May 04, 2006 at 02:39:33 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
> > ... it would be OK to rewrite
> > SELECT DISTINCT x ORDER BY foo(x)
> > as
> > SELECT DISTINCT ON (foo(x), x) x ORDER BY foo(x)
> 
> This assumes that x = y implies foo(x) = foo(y), which is something
> that's not necessarily the case, mainly because a datatype's "="
> function need not have a lot to do with the behavior of arbitrary
> functions foo(), especially if foo() yields a different datatype.
> The citext datatype is an easy counterexample: it thinks "foo" = "Foo",
> but md5() of those values will not yield the same answers.
> 
> The bottom line here is that this sort of deduction requires more
> understanding of the properties of datatypes and functions than
> our existing catalogs allow the planner to obtain.

Thanks for pointing that out. I should have realized that this was the same
issue (or at least a closely related one) that I initially expected to be a
problem. I then talked myself into believing that '=' promises more than it
does, and assumed that x = y implies foo(x) = foo(y), which, as you point
out, isn't always true.
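
To make the counterexample concrete, here is a rough sketch in Python rather
than SQL. It is only a model of the semantics, not PostgreSQL code: the CIText
class is a hypothetical stand-in for citext (case-insensitive "="), md5_hex()
plays the role of foo(), and distinct() imitates DISTINCT by keeping the first
row of each "="-equivalence class.

```python
import hashlib


class CIText:
    """Toy stand-in for citext: equality and hashing ignore case."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        return isinstance(other, CIText) and self.value.lower() == other.value.lower()

    def __hash__(self) -> int:
        # Must agree with __eq__, so hash the case-folded form.
        return hash(self.value.lower())

    def __repr__(self) -> str:
        return f"CIText({self.value!r})"


def md5_hex(x: CIText) -> str:
    # Plays the role of foo(): it hashes the raw bytes, so it is case-sensitive.
    return hashlib.md5(x.value.encode("utf-8")).hexdigest()


def distinct(rows):
    # Keep the first representative of each "="-equivalence class.
    seen = set()
    out = []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out


a, b = CIText("foo"), CIText("Foo")

equal_inputs = (a == b)                      # x = y holds
equal_outputs = (md5_hex(a) == md5_hex(b))   # foo(x) = foo(y) does not

# SELECT DISTINCT x collapses the two rows into one...
distinct_x = distinct([a, b])

# ...while DISTINCT ON (foo(x), x) keeps both, because foo(x) differs.
distinct_on_foo_x = list({(md5_hex(r), r): r for r in [a, b]}.values())

print(equal_inputs, equal_outputs, len(distinct_x), len(distinct_on_foo_x))
```

In this model the two forms return different numbers of rows, so the proposed
rewrite changes the result whenever foo() can tell apart values that "="
treats as equal.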

