IN, BETWEEN, spec compliance, and odd operator names - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | IN, BETWEEN, spec compliance, and odd operator names |
Date | |
Msg-id | 655.1219627463@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: IN, BETWEEN, spec compliance, and odd operator names
Re: IN, BETWEEN, spec compliance, and odd operator names |
List | pgsql-hackers |
I was looking just now at gram.y's handling of various peculiar SQL constructs, and was reminded of a point that's bothered me before, but I don't recall if it's ever been discussed explicitly on -hackers. As an example, take the production for BETWEEN ASYMMETRIC: a_expr BETWEEN opt_asymmetric b_expr AND b_expr { $$ = (Node *) makeA_Expr(AEXPR_AND,NIL, (Node *) makeSimpleA_Expr(AEXPR_OP, ">=", $1, $4, @2), (Node *) makeSimpleA_Expr(AEXPR_OP, "<=", $1, $6, @2), @2); } Okay, this is a pretty direct implementation of how SQL99 defines the construct: 8.3 <between predicate> syntax rule 6 saith "X BETWEEN ASYMMETRIC Y AND Z" is equivalent to "X>=Y AND X<=Z". But it leaves me feeling dissatisfied. What if the datatype has standard comparison operators (as identified by a default btree opclass) but they're not named ">=" and "<=" ? Perhaps more plausibly, what if those operators exist but aren't in the current search path? The production for NOT BETWEEN is even more troubling: a_expr NOT BETWEEN opt_asymmetric b_expr AND b_expr { $$ = (Node *) makeA_Expr(AEXPR_OR,NIL, (Node *) makeSimpleA_Expr(AEXPR_OP, "<", $1, $5, @2), (Node *) makeSimpleA_Expr(AEXPR_OP, ">", $1, $7, @2), @2); } I can't object too much to the hardwired application of DeMorgan's law (NOT (A AND B) => (NOT A) OR (NOT B)) but what this also has is a hardwired assumption that "<" and ">" exist and are the negators of ">=" and "<=" respectively. Probably true, but let's see you find chapter and verse in the SQL spec to support that... Seems to me that what this boils down to is whether we want to read the spec literally ("it says the construct is defined in terms of operators named >= and <=, therefore we should do that") or by intent (obviously what they *want* is a construct that behaves sensibly in terms of the datatype's semantics). We are more than a bit schizophrenic on this point --- in different parts of the system you can find these things being done both ways. There is plenty of code that insists on finding a default btree opclass to define notions of "less" or "greater"; but we have these purely name-based transformations in gram.y, and I think there are some other parts of the parser that do similar things. I'm not particularly eager to start changing things in this area right now, but it seems to me that it'd be a good idea to establish a project policy about what we consider to be the preferred behavior, with an eye to eventually migrating the parts of the system that don't conform. My own feeling is that we should avoid imputing particular semantics to particular operator names, and so these constructs should always be defined by reference to operators found in a default opclass for the datatype, rather than by specific operator names. However, that way will likely take more code and cycles to implement than purely name-based definitions; and there is also the argument that it violates the in-so-many-words definitions given by the spec. Comments? regards, tom lane PS: there are some other issues here, like whether BETWEEN should be allowed to cause double evaluation of its left-hand argument, and whether we wouldn't like it to get reverse-listed by ruleutils.c in the original BETWEEN format rather than as an expanded version. However, what I'd like to focus on in this particular thread is the narrow issue of defining the constructs in terms of operator names vs operator semantics.
pgsql-hackers by date: