Thread: Domains as Subtypes

Domains as Subtypes

From

elein

Date:

24 March 2006, 15:01:22

Background:

Domains lay the groundwork for inherited basetypes
or subtypes.  By defining a domain and overriding
operators and possibly creating an operator class, then a domain
can be created which inherits the storage method
and all of the functions of a basetype.  The domain
constraint enables a psuedo input function for
the subtype.

Domains enable people to create basetype subtypes using SQL
and procedural languages only.  Current belief is that
this "doesn't work."  However, all of this has worked
since domains were implemented with three exceptions.

1. PRIMARY KEY uses a type's base type opclass indiscriminantly.
   Workaround: for this is to create an opclass for
    the subtype and create a UNIQUE index on that column.

2. Operators which take two different types are not found.
    The key example for this is a LIKE or ~~ type for a
    subtype of text which has as its operands (basetype, parenttype).
    Workaround: use the function instead of the operator.

3. ORDER BY uses the parent type's sort operators.
    Workaround: Use ORDER BY ... USING 'operator'

Proposal:

This proposal only addresses issue #2.  I will look into the
other two as time permits.  But there are sensible workarounds
to #1 and #3 in the meanwhile.

This patch passes regression as well as my domain tests.

Attached is a patch to parse_oper.c which essentially does the
following.  The major change is in binary_oper_exact().
Instead of checking only one level of the basetype it checks
all possible combinations of type and parent types for
an exact match (only).  This favors first the passed in type
and then the basetype for exact matches.  The second part of
this change is to lightly loosen the assumption that any operator
has same type operands.  If an exact match is not made, the UNKNOWNOID
setting for the original operator is re-instated so that
the function selection mechanism can give it a try.

Tests:

Also attached are two files to create a domain/subtype and
to test it.

Issues/Questions:

1) Code review please.
2) Are there any test cases that were not covered by regression
   and my tests?
3) Should I add my test case to the regression tests?
4) Eventually this feature should be added to the docs.  Should
   I wait until all of the pieces are in place or go ahead and
   document the feature as it stands?

elein
--------------------------------------------------------------
elein@varlena.com        Varlena, LLC        www.varlena.com
          PostgreSQL Consulting, Support & Training
PostgreSQL General Bits   http://www.varlena.com/GeneralBits/
--------------------------------------------------------------
I have always depended on the [QA] of strangers.

Attachment

Re: Domains as Subtypes

From

Peter Eisentraut

Date:

24 March 2006, 15:33:56

elein wrote:
> Domains lay the groundwork for inherited basetypes
> or subtypes.

Semantically, a domain and a subtype are completely different things.  A 
domain restricts the possible values of a type but behaves exactly like 
that type in all other respects.  (The fact that PostgreSQL allows you 
to define functions that take domains as arguments undermines that 
concept but if you apply it carefully it can still work.)  Subtypes or 
inherited types on the other hand are defined exactly for the purpose 
of overriding some of their methods (while keeping others unchanged, 
hence the inheritance).  Mixing the two concepts might be convenient 
from an implementation point of view but makes no sense in the data 
model.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Domains as Subtypes

From

Tom Lane

Date:

24 March 2006, 16:47:15

elein <elein@varlena.com> writes:
> Attached is a patch to parse_oper.c which essentially does the
> following.  The major change is in binary_oper_exact().
> Instead of checking only one level of the basetype it checks
> all possible combinations of type and parent types for
> an exact match (only).

I'm going to object to this just on the grounds of the extent to which
it will slow down parsing.  I also think it completely destroys the
logical structure of the lookup code: binary_operator_exact is supposed
to find exact matches, nothing else.  Approximate matches should be
sought only after that's failed.  Also, why aren't the unary-operator
cases handled?  And why are you making the semantics of operator lookup
different from function lookup?

The correct place to be fooling with this is in func_select_candidate(),
whose initial smashing of domains to base types is the proximate cause
of the problems you are complaining of.  I think what you'd need is to
get rid of that blunt instrument and instead put in some kind of logic
to prefer matches to "higher up" domains over matches to the base type,
while not entirely excluding the latter.  func_select_candidate()
already has a lot of heuristics about preferring some matches over
others, and should be able to deal with one more.
        regards, tom lane

Re: Domains as Subtypes

From

elein

Date:

24 March 2006, 17:43:57

On Fri, Mar 24, 2006 at 03:47:13PM -0500, Tom Lane wrote:
> elein <elein@varlena.com> writes:
> > Attached is a patch to parse_oper.c which essentially does the
> > following.  The major change is in binary_oper_exact().
> > Instead of checking only one level of the basetype it checks
> > all possible combinations of type and parent types for
> > an exact match (only).
> 
> I'm going to object to this just on the grounds of the extent to which
> it will slow down parsing.  I also think it completely destroys the
> logical structure of the lookup code: binary_operator_exact is supposed
> to find exact matches, nothing else.  Approximate matches should be
> sought only after that's failed.  Also, why aren't the unary-operator
> cases handled?  And why are you making the semantics of operator lookup
> different from function lookup?

I chose to mess with binary_exact_oper() primarily because this is where
you put in domain checking.  I was following the logical structure that was there.

Also, prior to my patch, it reset the argument types causing the func sel check 
to choose the wrong function.  Without at least part of my change, arguments
to an operator which have two different types as operands do not get 
kicked out to func sel properly (with the type id set to unknown id).

Operators have the single distinction from functions in that when one argument
has an unknown type, then an exact match is tried with the unknown arg
type set to the known type.  This code has always been in there.  I believe
it was introduced to catch most cases before having to go through func sel check.
My code does expand it to do parent type checking, though.

> 
> The correct place to be fooling with this is in func_select_candidate(),
> whose initial smashing of domains to base types is the proximate cause
> of the problems you are complaining of.  I think what you'd need is to
> get rid of that blunt instrument and instead put in some kind of logic
> to prefer matches to "higher up" domains over matches to the base type,
> while not entirely excluding the latter.  func_select_candidate()
> already has a lot of heuristics about preferring some matches over
> others, and should be able to deal with one more.

I think it was binary_oper_exact which smashed the types into basetypes.  
Perhaps both did.  I can look into moving the type hierarchy arg checking over 
to func_select if you are sure this is the way to go.  

> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 
elein
elein@varlena.com

Re: Domains as Subtypes

From

elein

Date:

24 March 2006, 17:59:07

On Fri, Mar 24, 2006 at 08:33:51PM +0100, Peter Eisentraut wrote:
> elein wrote:
> > Domains lay the groundwork for inherited basetypes
> > or subtypes.
> 
> Semantically, a domain and a subtype are completely different things.  A 
> domain restricts the possible values of a type but behaves exactly like 
> that type in all other respects.  (The fact that PostgreSQL allows you 
> to define functions that take domains as arguments undermines that 
> concept but if you apply it carefully it can still work.)  Subtypes or 
> inherited types on the other hand are defined exactly for the purpose 
> of overriding some of their methods (while keeping others unchanged, 
> hence the inheritance).  Mixing the two concepts might be convenient 
> from an implementation point of view but makes no sense in the data 
> model.
> 
> -- 
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
> 

I have no argument with your point in theory.  You make valid arguments.  

But in practice, in postgres, we have a strong and orderly type system.  
This has been the case since it was designed to have all types be first 
class types.  Illustra used this to support create type under--mostly for free.  
And it was implemented in Informix 9.

It is a fortunate side-effect of this design that domains actually implement 
create type under type.  When/If we choose to implement subtypes, 
everything but the SQL and writing to the catalogs should already work.  
In the meanwhile domains work, too.

What I see as the difference between postgresql's implementation of
domains and proper subtypes is the constraint.  But I like the constraint.  
It makes the subtyping easier to use. 

So in theory, you are right.  In practice, you can override a domain's
functions and operators and assign it an opclass, making it also
a subtype. (Modulo the three issues I've raised.)

--elein
elein@varlena.com

Re: Domains as Subtypes

From

Tom Lane

Date:

24 March 2006, 19:27:13

elein <elein@varlena.com> writes:
> Operators have the single distinction from functions in that when one argument
> has an unknown type, then an exact match is tried with the unknown arg
> type set to the known type.  This code has always been in there.

Yeah, but it's just a fast special case of the generic UNKNOWN handling
in func_select_candidate.  The domain special case in binary_oper_exact
is a wart --- ideally it should be excised not enlarged ;-).  It's
mainly there because we didn't want to complicate func_select_candidate
to deal with domains.  If you're going to do the latter anyway, it may
be possible to remove the domain special case in binary_oper_exact.

In any case, the patch is very poorly thought through: it will not
behave reasonably if there are multiple levels of domains with different
candidate operators attached to each domain.  (I can tell you have not
tested this: getBaseType drills all the way down, so mk_oper_arg_list
isn't doing what you think.)  You need some kind of policy as to which
candidate is more preferable if some are "closer" types on one input and
some are "closer" on another, and binary_oper_exact is not the place for
that kind of decision.  It's only supposed to be a fast short-circuit
for the cases of exact matches and exact-after-substituting-other-type-
for-UNKNOWN matches.  If you make it do more you'll be upsetting some
careful compromises in the type resolution rules.
        regards, tom lane

Re: Domains as Subtypes

From

elein

Date:

24 March 2006, 20:14:18

On Fri, Mar 24, 2006 at 06:27:13PM -0500, Tom Lane wrote:
> elein <elein@varlena.com> writes:
> > Operators have the single distinction from functions in that when one argument
> > has an unknown type, then an exact match is tried with the unknown arg
> > type set to the known type.  This code has always been in there.
> 
> Yeah, but it's just a fast special case of the generic UNKNOWN handling
> in func_select_candidate.  The domain special case in binary_oper_exact
> is a wart --- ideally it should be excised not enlarged ;-).  It's
> mainly there because we didn't want to complicate func_select_candidate
> to deal with domains.  If you're going to do the latter anyway, it may
> be possible to remove the domain special case in binary_oper_exact.

OK.  I'll see what I can do.

> 
> In any case, the patch is very poorly thought through: it will not
> behave reasonably if there are multiple levels of domains with different
> candidate operators attached to each domain.  (I can tell you have not
> tested this: getBaseType drills all the way down, so mk_oper_arg_list
> isn't doing what you think.)  

I did test this. But maybe not deep enough :(

> You need some kind of policy as to which
> candidate is more preferable if some are "closer" types on one input and
> some are "closer" on another, and binary_oper_exact is not the place for
> that kind of decision.  

This policy was my intent, however, it need not be in binary_oper_exact.

> It's only supposed to be a fast short-circuit
> for the cases of exact matches and exact-after-substituting-other-type-
> for-UNKNOWN matches.  

OK.
> If you make it do more you'll be upsetting some
> careful compromises in the type resolution rules.

If you know of any cases that are not tested in regression let me
know and I'll include them in my test cases.  So far, I have 
broken nothing that I know about. (Well, maybe during development
I, like, broke everything, but that was fixed :)

OK.  Back to the code in the func sel context.

> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
> 

elein
elein@varlena.com

Re: Domains as Subtypes

From

Josh Berkus

Date:

24 March 2006, 20:47:29

Elein,

> Domains enable people to create basetype subtypes using SQL
> and procedural languages only.  Current belief is that
> this "doesn't work."  However, all of this has worked
> since domains were implemented with three exceptions.

<grin> you missed one.  Domains as parameters to functions are not
enforced.

I'd love to see the remaining Domain "holes" closed.  Hopefully you'll have
time to follow up on Tom's suggestions.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Domains as Subtypes

From

Tom Lane

Date:

24 March 2006, 23:49:00

Josh Berkus <josh@agliodbs.com> writes:
> <grin> you missed one.  Domains as parameters to functions are not 
> enforced.

I think we've got that one actually.  It's domains as PL-function output
types that aren't checked.  Also plpgsql fails to enforce domain checks
on its local variables.
        regards, tom lane

Re: Domains as Subtypes

From

"Jim C. Nasby"

Date:

25 March 2006, 09:02:22

On Fri, Mar 24, 2006 at 10:49:00PM -0500, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > <grin> you missed one.  Domains as parameters to functions are not 
> > enforced.
> 
> I think we've got that one actually.  It's domains as PL-function output
> types that aren't checked.  Also plpgsql fails to enforce domain checks
> on its local variables.

So is this the complete list? Can we document it somewhere?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Domains as Subtypes

From

Tom Lane

Date:

25 March 2006, 11:14:30

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Fri, Mar 24, 2006 at 10:49:00PM -0500, Tom Lane wrote:
>> I think we've got that one actually.  It's domains as PL-function output
>> types that aren't checked.  Also plpgsql fails to enforce domain checks
>> on its local variables.

> So is this the complete list?

No, I don't think so.  IIRC we're also missing domain checks on
parameter values in Bind messages, and there might be some other
holes too.  See the archives.

I made a suggestion about closing all these holes at once by
integrating domain checking into the I/O functions for domains,
but it's not clear how to do that without a big performance hit.
        regards, tom lane

Re: Domains as Subtypes

From

Jim Nasby

Date:

25 March 2006, 14:16:49

On Mar 25, 2006, at 4:14 PM, Tom Lane wrote:

> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>> On Fri, Mar 24, 2006 at 10:49:00PM -0500, Tom Lane wrote:
>>> I think we've got that one actually.  It's domains as PL-function  
>>> output
>>> types that aren't checked.  Also plpgsql fails to enforce domain  
>>> checks
>>> on its local variables.
>
>> So is this the complete list?
>
> No, I don't think so.  IIRC we're also missing domain checks on
> parameter values in Bind messages, and there might be some other
> holes too.  See the archives.
>
> I made a suggestion about closing all these holes at once by
> integrating domain checking into the I/O functions for domains,
> but it's not clear how to do that without a big performance hit.

Performance hit on just domain handling or overall? Personally, I'd  
rather see a hit on domain handling that we can work on later rather  
than the current state of things which seems to smack of MySQL (Get  
the feature 'checked off the list' first, then worry about doing it  
the right way).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Domains as Subtypes

From

elein

Date:

25 March 2006, 21:13:25

On Sat, Mar 25, 2006 at 07:16:13PM +0100, Jim Nasby wrote:
> On Mar 25, 2006, at 4:14 PM, Tom Lane wrote:
> 
> >"Jim C. Nasby" <jnasby@pervasive.com> writes:
> >>On Fri, Mar 24, 2006 at 10:49:00PM -0500, Tom Lane wrote:
> >>>I think we've got that one actually.  It's domains as PL-function  
> >>>output
> >>>types that aren't checked.  Also plpgsql fails to enforce domain  
> >>>checks
> >>>on its local variables.
> >
> >>So is this the complete list?
> >
> >No, I don't think so.  IIRC we're also missing domain checks on
> >parameter values in Bind messages, and there might be some other
> >holes too.  See the archives.
> >
> >I made a suggestion about closing all these holes at once by
> >integrating domain checking into the I/O functions for domains,
> >but it's not clear how to do that without a big performance hit.
> 
> Performance hit on just domain handling or overall? Personally, I'd  
> rather see a hit on domain handling that we can work on later rather  
> than the current state of things which seems to smack of MySQL (Get  
> the feature 'checked off the list' first, then worry about doing it  
> the right way).

The three issues I've raised regard the type behavior of domains with
operators and are completely independent of the input/output checks issues.

But I like the idea of centralizing the check in the input/output
functions.  It seems clearer and cleaner.  The procedural language
checks are harder, but may be easier to implement if there were
a centralized check domain functionality.

--elein


> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>               http://archives.postgresql.org
>

Re: Domains as Subtypes

From

Tom Lane

Date:

27 March 2006, 12:41:36

elein <elein@varlena.com> writes:
> But I like the idea of centralizing the check in the input/output
> functions.  It seems clearer and cleaner.

I remembered the problem with doing it that way: an input function can't
enforce a domain NOTNULL constraint, because it won't even get invoked
for a null input value.  So there seems no way around having a special
case for domains in all places where I/O conversion is done.
        regards, tom lane

Re: Domains as Subtypes

From

elein

Date:

27 March 2006, 18:12:37

On Mon, Mar 27, 2006 at 11:41:30AM -0500, Tom Lane wrote:
> elein <elein@varlena.com> writes:
> > But I like the idea of centralizing the check in the input/output
> > functions.  It seems clearer and cleaner.
> 
> I remembered the problem with doing it that way: an input function can't
> enforce a domain NOTNULL constraint, because it won't even get invoked
> for a null input value.  So there seems no way around having a special
> case for domains in all places where I/O conversion is done.
> 

The notnull attribute of the pg_type table should be set to not null
in the case of a not null constraint on a domain (type).
You should not have to invoke the input function to check for that.  
Or perhaps I'm missing the details.

>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 

elein
elein@varlena.com

Re: Domains as Subtypes

From

Tom Lane

Date:

27 March 2006, 18:45:57

elein <elein@varlena.com> writes:
> On Mon, Mar 27, 2006 at 11:41:30AM -0500, Tom Lane wrote:
>> I remembered the problem with doing it that way: an input function can't
>> enforce a domain NOTNULL constraint, because it won't even get invoked
>> for a null input value.  So there seems no way around having a special
>> case for domains in all places where I/O conversion is done.

> The notnull attribute of the pg_type table should be set to not null
> in the case of a not null constraint on a domain (type).
> You should not have to invoke the input function to check for that.  
> Or perhaps I'm missing the details.

Well, I can see two problems:

1. If we have to add code to everyplace that calls an input function to
do that, then we've failed to achieve the hoped-for goal of solving the
problem in just one place.

2. NOTNULL is just the most obvious form of the problem.  There could be
domain CHECK constraints that fail on null input --- CHECK(VALUE IS NOT
NULL) for example, or something more subtle.  If we don't run the input
function then this means the CHECK constraints also have to be done
out-of-band, and then we've lost any leverage whatsoever.

We could push the problem into a domain input function if we abandoned
the current rule that input functions are never invoked for nulls (we
could check their strictness flag to decide whether to do it).  This
sort of change seems distinctly cleaner than pushing explicit knowledge
about domains into all the places that use input functions, but it's
still pretty ugly:

A. We still have to touch everyplace that uses an input function; any
code not changed will simply do the Wrong Thing for nulls, which is not
a very friendly failure mode.  (And we know there are places outside the
core that use this stuff, for instance non-core PLs.)

B. C-language input functions for most datatypes will need to be
declared strict, else they'll crash on null input, which is an even
less friendly behavior.  Again, we can't be sure that non-core datatypes
get this right at present.
        regards, tom lane

Re: Domains as Subtypes

From

Bruce Momjian

Date:

10 April 2006, 18:02:34

TODO has:
* Allow user-defined functions retuning a domain value to enforce domain  constraints

Is there something we should add to this?

---------------------------------------------------------------------------

Tom Lane wrote:
> elein <elein@varlena.com> writes:
> > On Mon, Mar 27, 2006 at 11:41:30AM -0500, Tom Lane wrote:
> >> I remembered the problem with doing it that way: an input function can't
> >> enforce a domain NOTNULL constraint, because it won't even get invoked
> >> for a null input value.  So there seems no way around having a special
> >> case for domains in all places where I/O conversion is done.
> 
> > The notnull attribute of the pg_type table should be set to not null
> > in the case of a not null constraint on a domain (type).
> > You should not have to invoke the input function to check for that.  
> > Or perhaps I'm missing the details.
> 
> Well, I can see two problems:
> 
> 1. If we have to add code to everyplace that calls an input function to
> do that, then we've failed to achieve the hoped-for goal of solving the
> problem in just one place.
> 
> 2. NOTNULL is just the most obvious form of the problem.  There could be
> domain CHECK constraints that fail on null input --- CHECK(VALUE IS NOT
> NULL) for example, or something more subtle.  If we don't run the input
> function then this means the CHECK constraints also have to be done
> out-of-band, and then we've lost any leverage whatsoever.
> 
> 
> We could push the problem into a domain input function if we abandoned
> the current rule that input functions are never invoked for nulls (we
> could check their strictness flag to decide whether to do it).  This
> sort of change seems distinctly cleaner than pushing explicit knowledge
> about domains into all the places that use input functions, but it's
> still pretty ugly:
> 
> A. We still have to touch everyplace that uses an input function; any
> code not changed will simply do the Wrong Thing for nulls, which is not
> a very friendly failure mode.  (And we know there are places outside the
> core that use this stuff, for instance non-core PLs.)
> 
> B. C-language input functions for most datatypes will need to be
> declared strict, else they'll crash on null input, which is an even
> less friendly behavior.  Again, we can't be sure that non-core datatypes
> get this right at present.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Domains as Subtypes

From

Tom Lane

Date:

10 April 2006, 18:42:33

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> TODO has:
>     * Allow user-defined functions retuning a domain value to enforce domain
>       constraints

> Is there something we should add to this?

Yeah, a DONE marker ;-)
        regards, tom lane

Re: Domains as Subtypes

From

Bruce Momjian

Date:

10 April 2006, 23:36:55

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > TODO has:
> >     * Allow user-defined functions retuning a domain value to enforce domain
> >       constraints
> 
> > Is there something we should add to this?
> 
> Yeah, a DONE marker ;-)

OK, marked as done.  I assume that's what you mean, or are you saying it
just _needs_ to be completed?

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +