Nulls, arrays, records, IS NULL, IS DISTINCT FROM - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Nulls, arrays, records, IS NULL, IS DISTINCT FROM |
Date | |
Msg-id | 17311.1159548799@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Nulls, arrays, records, IS NULL, IS DISTINCT FROM
|
List | pgsql-hackers |
Following up yesterday's discussion, I've been studying the SQL spec for <null predicate> and <distinct predicate>, and it seems a bit inconsistent. The rules for <distinct predicate> make it clear that you are supposed to "drill down" into row and array values to determine distinctness. SQL99 has a) If the declared type of X or Y is an array type, then "X IS DISTINCT FROM Y" is effectively computedas follows: i) Let NX be the number of elements in X; let NY be the number of elements in Y. ii) Let EX(i) be the i-th element of X; let EY(i) be the i-th element of Y. iii) Case: 1) If NX is not equal to NY, then "X IS DISTINCT FROM Y" is true. 2) If NX equals zero and NY equals zero, then "X IS DISTINCT FROM Y" is false. 3) If "EX(i) IS DISTINCT FROM EY(i)" is false for all i between 1 (one) and NX, then "XIS DISTINCT FROM Y" is false. 4) Otherwise, "X IS DISTINCT FROM Y" is true. SQL2003 has completely rewritten the text but the meaning seems the same. I suppose we want to generalize the NX/NY business to say "if the array bounds are not identical then the arrays are distinct". We are clearly getting this wrong since the introduction of nulls in arrays, but I'll go fix that. Similarly, given two row expressions, distinctness is determined field-wise: X and Y are distinct if any two corresponding fields are distinct. We are currently getting this correct only for the case of parse-time ROW expressions, ieROW(x,y,z) IS [NOT] DISTINCT FROM ROW(xx,yy,zz) This is pretty much analogous to the case Teodor noted yesterday for IS NULL: it's not being done in gram.y but it's still being done much too early. We need to be able to do it in the executor to handle situations where a row value is coming from a function or some other source that's not disassemblable at parse time. What's bothering me is that for "foo IS [NOT] NULL", the spec clearly prescribes drilling down into a rowtype value to examine the individual fields, but I can't find any language that prescribes the same for arrays. Is this intentional, or an oversight? In particular, the spec saysROW(1,2,NULL) IS NOT NULL is false, because the row fields must be *all* not null to make it true. But it's very unclear whetherARRAY[1,2,NULL] IS NOT NULL should be false on the same reasoning. Right now, we respond "true" on the grounds that the array object as-a-whole isn't null, without examining its contents. Comments? Does anyone see any guidance in the spec? If there is none, which behavior do we think is most useful/consistent? regards, tom lane
pgsql-hackers by date: