Fixing the representation of ORDER BY/GROUP BY/DISTINCT - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Fixing the representation of ORDER BY/GROUP BY/DISTINCT |
Date | |
Msg-id | 25811.1217552089@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Fixing the representation of ORDER BY/GROUP BY/DISTINCT
Re: Fixing the representation of ORDER BY/GROUP BY/DISTINCT Re: Fixing the representation of ORDER BY/GROUP BY/DISTINCT |
List | pgsql-hackers |
So while I was fooling with Steve Midgley's problem I got a bit of a bee in my bonnet about the way that the parser emits ORDER BY, GROUP BY, and DISTINCT lists. * Currently, ORDER BY and DISTINCT use lists of SortClause, while GROUP BY is a list of GroupClause --- but these are actually the same struct, and there's a fair amount of code that relies on that. The advantage of them being the same is that it's easy to compare them when considering sort-and-uniq-style grouping plans. Except it's not easy enough: I tried to use a list_member test in one place, and of course it didn't work because equal() never sees distinct node tags as equal. So I'm thinking we ought to unify the two node types logically as well as physically, and just have one node type (SortGroupClause, maybe). * The current representation is fine for ORDER BY but leaves something to be desired for GROUP BY and DISTINCT: there isn't anyplace to specify the equality operator for a hash-based grouping operation. This results in repeat lookups in the planner to fetch information that was readily available at parse time. But what's worse IMHO is that we simply cannot represent a grouping query for a datatype that hasn't got a btree sort opclass --- even though we could implement it, by means of hashing, if the type has a hashable equality operator. (This isn't academic: it's easy to imagine datatypes that have equality but no natural linear sort order. A practical example is XID, which in fact has a hash opclass but not a btree opclass, because it violates the law of transitivity.) So what I'm thinking we want is something like typedef struct SortGroupClause { NodeTag type; Index tleSortGroupRef; /* reference into targetlist */ Oid eqop; /* the equality operator ('=' op) */ Oid sortop; /* the ordering operator ('<' op), or 0 */ bool nulls_first; /* do NULLs come before normal values? */ } SortGroupClause; In an ORDER BY item the sortop and nulls_first flag *must* be supplied. The eqop isn't really useful for ORDER BY, but it's trivial to get when we get the sortop, and always including it simplifies comparisons to GROUP/DISTINCT items. In GROUP BY/DISTINCT items, the eqop *must* be supplied, and if it is a btree equality operator then the associated sortop should be given as well. We'd continue the current practice of duplicating the ordering properties of any ORDER BY clause given for the same targetlist item, so that compatible ordering and grouping items are just equal(). * Another thing I've gotten tired of is the difficulty of telling DISTINCT from DISTINCT ON in the parsed representation. Surely we can afford to stick another bool field into Query to make that distinction unambiguous. This is important for making the world safe for hashed DISTINCT, since AFAICS we probably can't ever use hashing for DISTINCT ON --- its definition is too dependent on the assumption of sorting. Barring objections, I'm going to go off and make this happen. It won't immediately result in supporting hashed DISTINCT, but it's another necessary step on the road to that. regards, tom lane
pgsql-hackers by date: