
Nulls, arrays, records, IS NULL, IS DISTINCT FROM - Mailing list pgsql-hackers

From	Tom Lane
Subject	Nulls, arrays, records, IS NULL, IS DISTINCT FROM
Date	September 29, 2006 13:53:25
Msg-id	17311.1159548799@sss.pgh.pa.us
Responses	Re: Nulls, arrays, records, IS NULL, IS DISTINCT FROM
List	pgsql-hackers


Following up yesterday's discussion, I've been studying the SQL spec for
<null predicate> and <distinct predicate>, and it seems a bit
inconsistent.

The rules for <distinct predicate> make it clear that you are supposed
to "drill down" into row and array values to determine distinctness.
SQL99 has
    a) If the declared type of X or Y is an array type, then
       "X IS DISTINCT FROM Y" is effectively computed as follows:

       i) Let NX be the number of elements in X; let NY be the
          number of elements in Y.
       ii) Let EX(i) be the i-th element of X; let EY(i) be the
           i-th element of Y.
       iii) Case:
            1) If NX is not equal to NY, then "X IS DISTINCT FROM Y"
               is true.
            2) If NX equals zero and NY equals zero, then
               "X IS DISTINCT FROM Y" is false.
            3) If "EX(i) IS DISTINCT FROM EY(i)" is false for all i
               between 1 (one) and NX, then "X IS DISTINCT FROM Y"
               is false.
            4) Otherwise, "X IS DISTINCT FROM Y" is true.
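
As a rough illustration of the rule quoted above, here is a minimal Python sketch. It is not PostgreSQL code: None stands in for SQL NULL, Python lists stand in for one-dimensional arrays, both arrays are assumed to be non-null as a whole, and the function names are made up for this example.

```python
def scalar_is_distinct(x, y):
    """IS DISTINCT FROM for scalars; None models SQL NULL (an assumption)."""
    if x is None or y is None:
        # Two nulls are not distinct; a null and a non-null are distinct.
        return (x is None) != (y is None)
    return x != y


def array_is_distinct(xs, ys):
    """Element-wise IS DISTINCT FROM for two non-null one-dimensional arrays."""
    if len(xs) != len(ys):   # case 1: NX is not equal to NY
        return True
    if len(xs) == 0:         # case 2: NX and NY are both zero
        return False
    # cases 3 and 4: distinct exactly when some pair of elements is distinct
    return any(scalar_is_distinct(x, y) for x, y in zip(xs, ys))
```

Under this reading, two arrays holding nulls in the same positions are not distinct, while a null facing a non-null element makes them distinct.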

SQL2003 has completely rewritten the text but the meaning seems the
same. I suppose we want to generalize the NX/NY business to say
"if the array bounds are not identical then the arrays are distinct".
We are clearly getting this wrong since the introduction of nulls in
arrays, but I'll go fix that.
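
One way to picture the proposed generalization, again as a Python sketch under assumptions: each array is modeled as a list of (lower, upper) bound pairs, one per dimension, plus a flat list of elements, with None standing in for NULL. This representation and the function name are hypothetical and are not how the backend stores arrays.

```python
def is_distinct_with_bounds(x_bounds, x_elems, y_bounds, y_elems):
    """Arrays are distinct if their bounds differ or any element pair is distinct."""
    # Non-identical bounds make the arrays distinct without looking at elements.
    if x_bounds != y_bounds:
        return True
    # Otherwise compare element by element, treating two nulls as not distinct.
    return any(
        (a is None) != (b is None) or (a is not None and a != b)
        for a, b in zip(x_elems, y_elems)
    )
```

With this rule, two arrays with the same elements but different lower bounds come out distinct.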

Similarly, given two row expressions, distinctness is determined
field-wise: X and Y are distinct if any two corresponding fields
are distinct. We are currently getting this correct only for
the case of parse-time ROW expressions, ie
	ROW(x,y,z) IS [NOT] DISTINCT FROM ROW(xx,yy,zz)
This is pretty much analogous to the case Teodor noted yesterday
for IS NULL: it's not being done in gram.y but it's still being
done much too early. We need to be able to do it in the executor
to handle situations where a row value is coming from a function
or some other source that's not disassemblable at parse time.
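
To make the field-wise rule concrete, here is a minimal Python sketch of a comparison done on row values at run time rather than on the syntactic form of the expression. Tuples stand in for row values and None for NULL; the function names are hypothetical and this is not the executor's actual code.

```python
def field_is_distinct(x, y):
    """IS DISTINCT FROM for a single field; None models SQL NULL."""
    if x is None or y is None:
        return (x is None) != (y is None)
    return x != y


def row_is_distinct(rx, ry):
    """Rows are distinct if any pair of corresponding fields is distinct."""
    if len(rx) != len(ry):
        raise ValueError("rows must have the same number of fields")
    return any(field_is_distinct(x, y) for x, y in zip(rx, ry))


def make_row():
    """Stand-in for a function returning a row that cannot be taken apart at parse time."""
    return (1, None, "a")
```

Because row_is_distinct only needs the values, it works the same whether the row came from a literal or from a call such as make_row().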

What's bothering me is that for "foo IS [NOT] NULL", the spec clearly
prescribes drilling down into a rowtype value to examine the individual
fields, but I can't find any language that prescribes the same for
arrays. Is this intentional, or an oversight? In particular, the
spec says
	ROW(1,2,NULL) IS NOT NULL
is false, because the row fields must be *all* not null to make it true.
But it's very unclear whether
	ARRAY[1,2,NULL] IS NOT NULL
should be false on the same reasoning. Right now, we respond "true" on
the grounds that the array object as-a-whole isn't null, without
examining its contents.
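
The two candidate readings can be put side by side in a small Python sketch. Tuples model non-null row values, lists model arrays, and None models NULL; all function names are made up for illustration, and the "drill down" array variant is only one possible interpretation, not something the spec is known to require.

```python
def row_is_null(row):
    """Row IS NULL: true only when every field is null."""
    return all(f is None for f in row)


def row_is_not_null(row):
    """Row IS NOT NULL: true only when every field is not null."""
    return all(f is not None for f in row)


def array_is_not_null_whole(arr):
    """Current behavior described above: only the array as a whole is tested."""
    return arr is not None


def array_is_not_null_drilldown(arr):
    """Hypothetical alternative: also require every element to be not null."""
    return arr is not None and all(e is not None for e in arr)
```

Note that for a row such as (1, 2, NULL) both row_is_null and row_is_not_null are false, so IS NOT NULL is not simply the negation of IS NULL for rows. The open question is whether arrays should behave the same way.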

Comments? Does anyone see any guidance in the spec? If there is none,
which behavior do we think is most useful/consistent?

			regards, tom lane
