Thread: Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011
Hello, Hackers! TODO list has an entry "Move NOT NULL constraint information to pg_constraint" with four links and without two with the newest work[1][2]. I rebased the patch from [2] (in attachment). At least it applies cleanly on top of c477e84fe2471cb675234fce75cd6bb4bc2cf481 and does not generate a core dump during "make check". There are no tests for it and it fails "make check" (by difference) which leads inability to run "make check-world". But before starting working on it I had a look at the SQL-2011 standard (ISO/IEC 9075-2)[3] and found that: 1. A name for a "NOT NULL" constraint <NNC> can be given by a table definition (subcl. 11.4, "Format"->"column constraint definition"). 2. The standard splits NNC and CHECK constraints (subcl. 11.4, "Format"-> "column constraint") 3. A _descriptor_ of a column must include a "nullability characteristic" (subcl. 11.4 GR 4.d and 4.L). 4. At the same time the _descriptor_ of the column must include an _indication_ whether the column is defined as "NOT NULL" or not (near subcl. 4.13 Note 41) and a name of the constraint. 5. "Nullability characteristic" is set only by nondeferrable constraints (subcl. 4.13) but there can be several constraints which have influence on it (e.g. PK + CHECK but without NNC => is_nullable). 6. If an SQL-implementation can deduce CHECK constraint and/or DOMAIN constraint to "column IS NULL" can never be TRUE it can claim support for the feature "T101. Enhanced nullability determination" (subcl. 4.13, Note 38). See also (subcl. 6.35 SR 4.b) 7. NNC can be "deferrable" via "constraint characteristics" which are set in addition to "column constraint" (i.e. the column can be "NOT NULL" _and_ not "is_nullable"). 8. NNC is an _equivalent_ to a table CHECK constraint (subcl. 11.4 SR 17.a). 9. There is no way to set NNC for a table except "... ALTER COLUMN ... SET NOT NULL" clause (subcl. 11.6). 10. "... ALTER COLUMN ... SET NOT NULL" clause doesn't allow to specify name of the constraint (subcl. 11.15). 11. There is no way to specify more than one NNC per column via "SET NOT NULL" (subcl. 11.15 GR 1) 12. At the same time in (subcl. 4.13) mentioned there can be "at least one NNC" (may be via inheritance?). 13. "... ALTER COLUMN ... SET NOT NULL" _must_ add a table CHECK constraint (subcl. 11.15 GR 1.d). 14. "DROP NOT NULL" clause must drop cascading all NNC, but leave other constraints that may affect "nullability characteristic" (subcl. 11.16 GR 1.*). 15. PK can have NULL values if its "constraint characteristics" is "deferrable". 16. There is no mention of "NOT NULL" constraints in the (ISO/IEC 9075-11) at all. === Shortcuts: subcl: Subclause GR: General Rule SR: Syntax Rule Conclusion: I. NNC implies CHECK constraints (p.13) but it is not a constraint itself (p.4, p.9, p.13, p.16; opposite to p.2) II. CHECK constraint does not imply NNC but it has influence on "attnotnull" (deep check of CHECK constraints allows to move the feature "T101" to the supporting features list). III. "pg_attribute" table should have an "attnotnullid oid" as an indicator of "NOT NULL" (p.4) and points to a CHECK constraint; It is in addition to a "Nullability characteristic" "attnotnull" (p.3). IV. "pg_constraint" should have a column "connotnullkey int2[]" as a "list of the nullable columns" which references to "pg_attribute.attnum" for fast checking whether a column is still nullable after deleting/updating constraints or not. Array is necessary for cases like "CHECK ((col1 IS NOT NULL) AND (col2 IS NOT NULL))" and for nondeferrable PKs. V. Inherited tables inherit CHECK constraints (from p.I), but that constraints are not written to the "attnotnullid" (from p.II) even if they have NULL values. VI. "pg_constraint" _can_ have a column "connotnullpure BOOLEAN" to skip CHECK constraints which define "NOT NULL" only (for one or several columns) because a row has already checked for NULLs via "attnotnull" just before ExecRelCheck is executed. VII. "connotnullkey" is NULL for deferrable and "NOT VALID" constraints. VIII. "connotnullkey" is recalculated after "VALIDATE CONSTRAINT" is done. IX. "attnotnull" is recalculated if a constraint with nonempty "connotnullkey" is inserted, deleted or "connotnullkey" is changed. X. Pure CHECK constraint doesn't do full scan if the appropriate table's column(s) has(ve) "attnotnull" as "TRUE". XI. pg_dump shows "NOT NULL" iff "attnotnullid IS NOT NULL" and skip CHECK statement with oid matched with attnotnullid. What do you think about that design? P.S.: Since the SQL standard defines that "col NOT NULL" as an equivalent to "CHECK (col IS NOT NULL)" (p.8) what to do with that behavior: postgres=# create type t as (x int); CREATE TYPE postgres=# SELECT v, v IS NOT NULL AS should_be_in_table FROM (VALUES('(1)'::t),('()'),(NULL)) AS x(v); v | should_be_in_table -----+-------------------- (1) | t () | f | f (3 rows) "attnotnull" in such case is stricter, like "CHECK (col IS DISTINCT FROM NULL)". Should such values (with NULL in each attribute of a composite type) violate NOT NULL constraints? === [1]http://www.postgresql.org/message-id/flat/1343682669-sup-2532@alvh.no-ip.org [2]http://www.postgresql.org/message-id/20160109030002.GA671800@alvherre.pgsql [3]http://www.wiscorp.com/sql20nn.zip
Attachment
Re: Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011
From
Vitaly Burovoy
Date:
On 2/7/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > Hello, Hackers! > > TODO list has an entry "Move NOT NULL constraint information to > pg_constraint" with four links and without two with the newest > work[1][2]. > > I rebased the patch from [2] (in attachment). At least it applies > cleanly on top of c477e84fe2471cb675234fce75cd6bb4bc2cf481 and does > not generate a core dump during "make check". There are no tests for > it and it fails "make check" (by difference) which leads inability to > run "make check-world". It seems the file I attached has more than necessary changes. Please, find a correct patch attached. > === > [1]http://www.postgresql.org/message-id/flat/1343682669-sup-2532@alvh.no-ip.org > [2]http://www.postgresql.org/message-id/20160109030002.GA671800@alvherre.pgsql -- Best regards, Vitaly Burovoy
Attachment
Re: Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011
From
Alvaro Herrera
Date:
Vitaly Burovoy wrote: Hi, > But before starting working on it I had a look at the SQL-2011 > standard (ISO/IEC 9075-2)[3] and found that: > > 1. A name for a "NOT NULL" constraint <NNC> can be given by a table > definition (subcl. 11.4, "Format"->"column constraint definition"). > 2. The standard splits NNC and CHECK constraints (subcl. 11.4, > "Format"-> "column constraint") Point 2 is where things differ from what I remember; my (possibly flawed) understanding was that there's no difference between those things. Many (maybe all) of the things from this point on are probably fallout from that one change. > III. "pg_attribute" table should have an "attnotnullid oid" as an > indicator of "NOT NULL" (p.4) and points to a CHECK constraint; It is > in addition to a "Nullability characteristic" "attnotnull" (p.3). > IV. "pg_constraint" should have a column "connotnullkey int2[]" as a > "list of the nullable columns" which references to > "pg_attribute.attnum" for fast checking whether a column is still > nullable after deleting/updating constraints or not. Array is > necessary for cases like "CHECK ((col1 IS NOT NULL) AND (col2 IS NOT > NULL))" and for nondeferrable PKs. I think these points warrant some more consideration. I don't like the idea that pg_attribute and pg_constraint are both getting considerably bloated to support this. > P.S.: > Since the SQL standard defines that "col NOT NULL" as an equivalent to > "CHECK (col IS NOT NULL)" (p.8) what to do with that behavior: > > postgres=# create type t as (x int); > CREATE TYPE > postgres=# SELECT v, v IS NOT NULL AS should_be_in_table FROM > (VALUES('(1)'::t),('()'),(NULL)) AS x(v); > v | should_be_in_table > -----+-------------------- > (1) | t > () | f > | f > (3 rows) > > "attnotnull" in such case is stricter, like "CHECK (col IS DISTINCT FROM NULL)". > > Should such values (with NULL in each attribute of a composite type) > violate NOT NULL constraints? I wonder if the standard has a concept of null composite values. If not, then there is no difference between IS NOT NULL and IS DISTINCT FROM NULL, which explains why they define NNC in terms of the former. I think your email was too hard to read because of excessive density, which would explain the complete lack of response. I haven't had the chance to work on this topic again, but I encourage you to, if you have the resources. (TBH I haven't had the chance to study your proposed design in detail, either). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011
From
Vitaly Burovoy
Date:
I'm sorry for the late answer. On 4/27/16, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Vitaly Burovoy wrote: > > Hi, > >> But before starting working on it I had a look at the SQL-2011 >> standard (ISO/IEC 9075-2)[3] and found that: >> >> 1. A name for a "NOT NULL" constraint <NNC> can be given by a table >> definition (subcl. 11.4, "Format"->"column constraint definition"). >> 2. The standard splits NNC and CHECK constraints (subcl. 11.4, >> "Format"-> "column constraint") > > Point 2 is where things differ from what I remember; my (possibly > flawed) understanding was that there's no difference between those > things. Many (maybe all) of the things from this point on are probably > fallout from that one change. It is just mentioning that CHECK constraints have influence on nullability characteristic, but it differs from NNC. NNC creates CHECK constraint, but not vice versa. You can create several CHECK "col IS NOT NULL" constraints, but only one NNC (several ones by inheritance only?). And DROP NOT NULL should drop only those CHECK that is linked with NNC (and inherited), but no more (full explanation is in my initial letter). >> III. "pg_attribute" table should have an "attnotnullid oid" as an >> indicator of "NOT NULL" (p.4) and points to a CHECK constraint; It is >> in addition to a "Nullability characteristic" "attnotnull" (p.3). >> IV. "pg_constraint" should have a column "connotnullkey int2[]" as a >> "list of the nullable columns" which references to >> "pg_attribute.attnum" for fast checking whether a column is still >> nullable after deleting/updating constraints or not. Array is >> necessary for cases like "CHECK ((col1 IS NOT NULL) AND (col2 IS NOT >> NULL))" and for nondeferrable PKs. > > I think these points warrant some more consideration. I don't like the > idea that pg_attribute and pg_constraint are both getting considerably > bloated to support this. Ok, I'm ready for a discussion. Two additional columns are necessary: one for pointing to an underlying CHECK constraint (or boolean column whether current CHECK is NNC or not) and second for fast computation of "attnotnull" (which means "nullable characteristic") and ability to skip check if "attnotnull" is set but not triggered (I think it'll improve performance for inherited tables). I think placing the first column (attnotnullid) to pg_attribute is better because you can't have more than one value in it. The second is obviously should be placed in pg_constraint. >> P.S.: >> Since the SQL standard defines that "col NOT NULL" as an equivalent to >> "CHECK (col IS NOT NULL)" (p.8) what to do with that behavior: >> >> postgres=# create type t as (x int); >> CREATE TYPE >> postgres=# SELECT v, v IS NOT NULL AS should_be_in_table FROM >> (VALUES('(1)'::t),('()'),(NULL)) AS x(v); >> v | should_be_in_table >> -----+-------------------- >> (1) | t >> () | f >> | f >> (3 rows) >> >> "attnotnull" in such case is stricter, like "CHECK (col IS DISTINCT FROM >> NULL)". >> >> Should such values (with NULL in each attribute of a composite type) >> violate NOT NULL constraints? > > I wonder if the standard has a concept of null composite values. If > not, then there is no difference between IS NOT NULL and IS DISTINCT > FROM NULL, which explains why they define NNC in terms of the former. Yes, it has. The PG's composite type is "Row types" (subcl.4.8) in the standard. The standard also differentiates IS [NOT] NULL and IS [NOT] DISTINCT FROM: >>> Subcl. 8.8 <null predicate>: >>> ... >>> 1) Let R be the <row value predicand> and let V be the value of R. >>> 2) Case: >>> a) If V is the null value, then “R IS NULL” is True and >>> the value of “R IS NOT NULL” is False. >>> b) Otherwise: >>> i) The value of “R IS NULL” is >>> Case: >>> 1) If the value of every field of V is the null value, then True. >>> 2) Otherwise, False. >>> ... >>> >>> Subcl. 8.15 <distinct predicate> >>> ... >>> 1) Let V1 be the value of <row value predicand 3> and let V2 be the value of <row value predicand 4>. >>> ... >>> b) If V1 is the null value and V2 is not the null value, or if V1 is not the null value and V2 is the null >>> value, then the result is True. >>> ... In subcl.8.8 "each column" is mentioned, in 8.15 if one of value is the null value and the other is not then nothing more is checked and True is returned. > I think your email was too hard to read because of excessive density, > which would explain the complete lack of response. Hmm. I decided it was "silently approved". =) > I haven't had the chance to work on this topic again, but I encourage you to, > if you have the resources. Thank you, I think I'll find a time for it no earlier than the summer. > (TBH I haven't had the chance to study your proposed design in detail, either). I hope somebody find a time to study it before someone sends a proposal. -- Best regards, Vitaly Burovoy
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: > On 4/27/16, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Point 2 is where things differ from what I remember; my (possibly >> flawed) understanding was that there's no difference between those >> things. Many (maybe all) of the things from this point on are probably >> fallout from that one change. > It is just mentioning that CHECK constraints have influence on > nullability characteristic, but it differs from NNC. > NNC creates CHECK constraint, but not vice versa. You can create > several CHECK "col IS NOT NULL" constraints, but only one NNC (several > ones by inheritance only?). And DROP NOT NULL should drop only those > CHECK that is linked with NNC (and inherited), but no more (full > explanation is in my initial letter). This seems to me to be a most curious reading of the standard. SQL:2011 11.4 <column definition> syntax rule 17a says If a <column constraint definition> is specified that contains the <column constraint> NOT NULL, then it is equivalent tothe following <table constraint definition>: CND CHECK ( C IS NOT NULL ) CA As a rule, when the SQL spec says "equivalent", they do not mean "it's sort of like this", they mean the effects are indistinguishable. In particular, I see nothing whatsoever saying that you're not allowed to write more than one per column. So I don't like the proposal to add an attnotnullid column to pg_attribute. What we'd talked about earlier was converting attnotnull into, effectively, a hint flag saying that there's at least one NOT NULL constraint attached to the column. That still seems like a good approach to me. When we're actually ready to throw an error for a null value, we could root through the table's constraint list for a not-null constraint name to report. It doesn't matter which one we select, because constraint application order has never been promised to be deterministic; and a few extra cycles at that point don't seem like a big problem to me. regards, tom lane
Re: Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011
From
"David G. Johnston"
Date:
Quick flyby here...
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
> On 4/27/16, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Point 2 is where things differ from what I remember; my (possibly
>> flawed) understanding was that there's no difference between those
>> things. Many (maybe all) of the things from this point on are probably
>> fallout from that one change.
> It is just mentioning that CHECK constraints have influence on
> nullability characteristic, but it differs from NNC.
> NNC creates CHECK constraint, but not vice versa. You can create
> several CHECK "col IS NOT NULL" constraints, but only one NNC (several
> ones by inheritance only?). And DROP NOT NULL should drop only those
> CHECK that is linked with NNC (and inherited), but no more (full
> explanation is in my initial letter).
Either it's one, or it's not...
This seems to me to be a most curious reading of the standard.
SQL:2011 11.4 <column definition> syntax rule 17a says
If a <column constraint definition> is specified that contains
the <column constraint> NOT NULL, then it is equivalent to the
following <table constraint definition>:
CND CHECK ( C IS NOT NULL ) CA
As a rule, when the SQL spec says "equivalent", they do not mean "it's
sort of like this", they mean the effects are indistinguishable. In
particular, I see nothing whatsoever saying that you're not allowed to
write more than one per column.
Does it define how DROP NOT NULL is supposed to behave?
I agree that the behavior of a column NNC is identical to a similar constraint defined on the table: but if drop not null doesn't impact table constraints then the concept of perfect equality is already lost.
So I don't like the proposal to add an attnotnullid column to
pg_attribute. What we'd talked about earlier was converting attnotnull
into, effectively, a hint flag saying that there's at least one NOT NULL
constraint attached to the column.
Have we considered making it a table constraint and giving it a name? We already handle that case without difficulty.
Not looking for a detailed explanation.
David J.
Re: Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011
From
"David G. Johnston"
Date:
On Monday, February 8, 2016, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
12. At the same time in (subcl. 4.13) mentioned there can be "at least
one NNC" (may be via inheritance?).
This is a bit hard to reason about given that our implementation of inheritance is non-standard.
Are we close to the standard semantics with regard to this particular dynamic?
David J.
Re: Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011
From
Vitaly Burovoy
Date:
On 5/3/16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: >> On 4/27/16, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> Point 2 is where things differ from what I remember; my (possibly >>> flawed) understanding was that there's no difference between those >>> things. Many (maybe all) of the things from this point on are probably >>> fallout from that one change. > >> It is just mentioning that CHECK constraints have influence on >> nullability characteristic, but it differs from NNC. >> NNC creates CHECK constraint, but not vice versa. You can create >> several CHECK "col IS NOT NULL" constraints, but only one NNC (several >> ones by inheritance only?). And DROP NOT NULL should drop only those >> CHECK that is linked with NNC (and inherited), but no more (full >> explanation is in my initial letter). > > This seems to me to be a most curious reading of the standard. > SQL:2011 11.4 <column definition> syntax rule 17a says > > If a <column constraint definition> is specified that contains > the <column constraint> NOT NULL, then it is equivalent to the > following <table constraint definition>: > > CND CHECK ( C IS NOT NULL ) CA > > As a rule, when the SQL spec says "equivalent", they do not mean "it's > sort of like this", they mean the effects are indistinguishable. In > particular, I see nothing whatsoever saying that you're not allowed to > write more than one per column. 1. SQL:2011 4.13 <Columns, fields, and attributes>: — If C is a column of a base table, then an indication of whether it is defined as NOT NULL and, if so, the constraint nameof the associated table constraint definition. NOTE 41 — This indication and the associated constraint name existfor definitional purposes only and are not exposed through the COLUMNS view in the Information Schema. There is only "constraint name", not "constraint names". 2. SQL:2011 11.15 <set column not null clause> General Rule 1: ... If the column descriptor of C does not contain an indication that C is defined as NOT NULL, then: And there is no rule 2. I.e. if the column is already set as NOT NULL you can't specify it as NOT NULL again. 3. SQL:2011 11.15 <set column not null clause> General Rule 1.d: The following <alter table statement> is executed without further Access Rule checking: ALTER TABLE TN ADD CONSTRAINT IDCN CHECK ( CN IS NOT NULL ) > So I don't like the proposal to add an attnotnullid column to > pg_attribute. Why and where to place it? > What we'd talked about earlier was converting attnotnull > into, effectively, a hint flag saying that there's at least one NOT NULL > constraint attached to the column. That still seems like a good approach > to me. Ok. But not only NOT NULL constraint, but also non-deferrable PK, CHECK, domains, may be the strictest FK. > When we're actually ready to throw an error for a null value, > we could root through the table's constraint list for a not-null > constraint name to report. attnotnullid is not for reporting, it is for DROP NOT NULL and recreating "CREATE TABLE" statements via pg_dump. > It doesn't matter which one we select, because > constraint application order has never been promised to be deterministic; > and a few extra cycles at that point don't seem like a big problem to me. > > regards, tom lane -- Best regards, Vitaly Burovoy
"David G. Johnston" <david.g.johnston@gmail.com> writes: > This is a bit hard to reason about given that our implementation of > inheritance is non-standard. Yeah, that's a fairly key point. We've solved those problems with respect to inherited CHECK constraints, and it seems like what we ought to do with NOT NULL is make it work the same as CHECK, rather than invent some new concepts. regards, tom lane