Thread: Domains as Subtypes
Background: Domains lay the groundwork for inherited basetypes or subtypes. By defining a domain and overriding operators and possibly creating an operator class, then a domain can be created which inherits the storage method and all of the functions of a basetype. The domain constraint enables a psuedo input function for the subtype. Domains enable people to create basetype subtypes using SQL and procedural languages only. Current belief is that this "doesn't work." However, all of this has worked since domains were implemented with three exceptions. 1. PRIMARY KEY uses a type's base type opclass indiscriminantly. Workaround: for this is to create an opclass for the subtype and create a UNIQUE index on that column. 2. Operators which take two different types are not found. The key example for this is a LIKE or ~~ type for a subtype of text which has as its operands (basetype, parenttype). Workaround: use the function instead of the operator. 3. ORDER BY uses the parent type's sort operators. Workaround: Use ORDER BY ... USING 'operator' Proposal: This proposal only addresses issue #2. I will look into the other two as time permits. But there are sensible workarounds to #1 and #3 in the meanwhile. This patch passes regression as well as my domain tests. Attached is a patch to parse_oper.c which essentially does the following. The major change is in binary_oper_exact(). Instead of checking only one level of the basetype it checks all possible combinations of type and parent types for an exact match (only). This favors first the passed in type and then the basetype for exact matches. The second part of this change is to lightly loosen the assumption that any operator has same type operands. If an exact match is not made, the UNKNOWNOID setting for the original operator is re-instated so that the function selection mechanism can give it a try. Tests: Also attached are two files to create a domain/subtype and to test it. Issues/Questions: 1) Code review please. 2) Are there any test cases that were not covered by regression and my tests? 3) Should I add my test case to the regression tests? 4) Eventually this feature should be added to the docs. Should I wait until all of the pieces are in place or go ahead and document the feature as it stands? elein -------------------------------------------------------------- elein@varlena.com Varlena, LLC www.varlena.com PostgreSQL Consulting, Support & Training PostgreSQL General Bits http://www.varlena.com/GeneralBits/ -------------------------------------------------------------- I have always depended on the [QA] of strangers.
Attachment
elein wrote: > Domains lay the groundwork for inherited basetypes > or subtypes. Semantically, a domain and a subtype are completely different things. A domain restricts the possible values of a type but behaves exactly like that type in all other respects. (The fact that PostgreSQL allows you to define functions that take domains as arguments undermines that concept but if you apply it carefully it can still work.) Subtypes or inherited types on the other hand are defined exactly for the purpose of overriding some of their methods (while keeping others unchanged, hence the inheritance). Mixing the two concepts might be convenient from an implementation point of view but makes no sense in the data model. -- Peter Eisentraut http://developer.postgresql.org/~petere/
elein <elein@varlena.com> writes: > Attached is a patch to parse_oper.c which essentially does the > following. The major change is in binary_oper_exact(). > Instead of checking only one level of the basetype it checks > all possible combinations of type and parent types for > an exact match (only). I'm going to object to this just on the grounds of the extent to which it will slow down parsing. I also think it completely destroys the logical structure of the lookup code: binary_operator_exact is supposed to find exact matches, nothing else. Approximate matches should be sought only after that's failed. Also, why aren't the unary-operator cases handled? And why are you making the semantics of operator lookup different from function lookup? The correct place to be fooling with this is in func_select_candidate(), whose initial smashing of domains to base types is the proximate cause of the problems you are complaining of. I think what you'd need is to get rid of that blunt instrument and instead put in some kind of logic to prefer matches to "higher up" domains over matches to the base type, while not entirely excluding the latter. func_select_candidate() already has a lot of heuristics about preferring some matches over others, and should be able to deal with one more. regards, tom lane
On Fri, Mar 24, 2006 at 03:47:13PM -0500, Tom Lane wrote: > elein <elein@varlena.com> writes: > > Attached is a patch to parse_oper.c which essentially does the > > following. The major change is in binary_oper_exact(). > > Instead of checking only one level of the basetype it checks > > all possible combinations of type and parent types for > > an exact match (only). > > I'm going to object to this just on the grounds of the extent to which > it will slow down parsing. I also think it completely destroys the > logical structure of the lookup code: binary_operator_exact is supposed > to find exact matches, nothing else. Approximate matches should be > sought only after that's failed. Also, why aren't the unary-operator > cases handled? And why are you making the semantics of operator lookup > different from function lookup? I chose to mess with binary_exact_oper() primarily because this is where you put in domain checking. I was following the logical structure that was there. Also, prior to my patch, it reset the argument types causing the func sel check to choose the wrong function. Without at least part of my change, arguments to an operator which have two different types as operands do not get kicked out to func sel properly (with the type id set to unknown id). Operators have the single distinction from functions in that when one argument has an unknown type, then an exact match is tried with the unknown arg type set to the known type. This code has always been in there. I believe it was introduced to catch most cases before having to go through func sel check. My code does expand it to do parent type checking, though. > > The correct place to be fooling with this is in func_select_candidate(), > whose initial smashing of domains to base types is the proximate cause > of the problems you are complaining of. I think what you'd need is to > get rid of that blunt instrument and instead put in some kind of logic > to prefer matches to "higher up" domains over matches to the base type, > while not entirely excluding the latter. func_select_candidate() > already has a lot of heuristics about preferring some matches over > others, and should be able to deal with one more. I think it was binary_oper_exact which smashed the types into basetypes. Perhaps both did. I can look into moving the type hierarchy arg checking over to func_select if you are sure this is the way to go. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > elein elein@varlena.com
On Fri, Mar 24, 2006 at 08:33:51PM +0100, Peter Eisentraut wrote: > elein wrote: > > Domains lay the groundwork for inherited basetypes > > or subtypes. > > Semantically, a domain and a subtype are completely different things. A > domain restricts the possible values of a type but behaves exactly like > that type in all other respects. (The fact that PostgreSQL allows you > to define functions that take domains as arguments undermines that > concept but if you apply it carefully it can still work.) Subtypes or > inherited types on the other hand are defined exactly for the purpose > of overriding some of their methods (while keeping others unchanged, > hence the inheritance). Mixing the two concepts might be convenient > from an implementation point of view but makes no sense in the data > model. > > -- > Peter Eisentraut > http://developer.postgresql.org/~petere/ > I have no argument with your point in theory. You make valid arguments. But in practice, in postgres, we have a strong and orderly type system. This has been the case since it was designed to have all types be first class types. Illustra used this to support create type under--mostly for free. And it was implemented in Informix 9. It is a fortunate side-effect of this design that domains actually implement create type under type. When/If we choose to implement subtypes, everything but the SQL and writing to the catalogs should already work. In the meanwhile domains work, too. What I see as the difference between postgresql's implementation of domains and proper subtypes is the constraint. But I like the constraint. It makes the subtyping easier to use. So in theory, you are right. In practice, you can override a domain's functions and operators and assign it an opclass, making it also a subtype. (Modulo the three issues I've raised.) --elein elein@varlena.com
elein <elein@varlena.com> writes: > Operators have the single distinction from functions in that when one argument > has an unknown type, then an exact match is tried with the unknown arg > type set to the known type. This code has always been in there. Yeah, but it's just a fast special case of the generic UNKNOWN handling in func_select_candidate. The domain special case in binary_oper_exact is a wart --- ideally it should be excised not enlarged ;-). It's mainly there because we didn't want to complicate func_select_candidate to deal with domains. If you're going to do the latter anyway, it may be possible to remove the domain special case in binary_oper_exact. In any case, the patch is very poorly thought through: it will not behave reasonably if there are multiple levels of domains with different candidate operators attached to each domain. (I can tell you have not tested this: getBaseType drills all the way down, so mk_oper_arg_list isn't doing what you think.) You need some kind of policy as to which candidate is more preferable if some are "closer" types on one input and some are "closer" on another, and binary_oper_exact is not the place for that kind of decision. It's only supposed to be a fast short-circuit for the cases of exact matches and exact-after-substituting-other-type- for-UNKNOWN matches. If you make it do more you'll be upsetting some careful compromises in the type resolution rules. regards, tom lane
On Fri, Mar 24, 2006 at 06:27:13PM -0500, Tom Lane wrote: > elein <elein@varlena.com> writes: > > Operators have the single distinction from functions in that when one argument > > has an unknown type, then an exact match is tried with the unknown arg > > type set to the known type. This code has always been in there. > > Yeah, but it's just a fast special case of the generic UNKNOWN handling > in func_select_candidate. The domain special case in binary_oper_exact > is a wart --- ideally it should be excised not enlarged ;-). It's > mainly there because we didn't want to complicate func_select_candidate > to deal with domains. If you're going to do the latter anyway, it may > be possible to remove the domain special case in binary_oper_exact. OK. I'll see what I can do. > > In any case, the patch is very poorly thought through: it will not > behave reasonably if there are multiple levels of domains with different > candidate operators attached to each domain. (I can tell you have not > tested this: getBaseType drills all the way down, so mk_oper_arg_list > isn't doing what you think.) I did test this. But maybe not deep enough :( > You need some kind of policy as to which > candidate is more preferable if some are "closer" types on one input and > some are "closer" on another, and binary_oper_exact is not the place for > that kind of decision. This policy was my intent, however, it need not be in binary_oper_exact. > It's only supposed to be a fast short-circuit > for the cases of exact matches and exact-after-substituting-other-type- > for-UNKNOWN matches. OK. > If you make it do more you'll be upsetting some > careful compromises in the type resolution rules. If you know of any cases that are not tested in regression let me know and I'll include them in my test cases. So far, I have broken nothing that I know about. (Well, maybe during development I, like, broke everything, but that was fixed :) OK. Back to the code in the func sel context. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > elein elein@varlena.com
Elein, > Domains enable people to create basetype subtypes using SQL > and procedural languages only. Current belief is that > this "doesn't work." However, all of this has worked > since domains were implemented with three exceptions. <grin> you missed one. Domains as parameters to functions are not enforced. I'd love to see the remaining Domain "holes" closed. Hopefully you'll have time to follow up on Tom's suggestions. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > <grin> you missed one. Domains as parameters to functions are not > enforced. I think we've got that one actually. It's domains as PL-function output types that aren't checked. Also plpgsql fails to enforce domain checks on its local variables. regards, tom lane
On Fri, Mar 24, 2006 at 10:49:00PM -0500, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > <grin> you missed one. Domains as parameters to functions are not > > enforced. > > I think we've got that one actually. It's domains as PL-function output > types that aren't checked. Also plpgsql fails to enforce domain checks > on its local variables. So is this the complete list? Can we document it somewhere? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Fri, Mar 24, 2006 at 10:49:00PM -0500, Tom Lane wrote: >> I think we've got that one actually. It's domains as PL-function output >> types that aren't checked. Also plpgsql fails to enforce domain checks >> on its local variables. > So is this the complete list? No, I don't think so. IIRC we're also missing domain checks on parameter values in Bind messages, and there might be some other holes too. See the archives. I made a suggestion about closing all these holes at once by integrating domain checking into the I/O functions for domains, but it's not clear how to do that without a big performance hit. regards, tom lane
On Mar 25, 2006, at 4:14 PM, Tom Lane wrote: > "Jim C. Nasby" <jnasby@pervasive.com> writes: >> On Fri, Mar 24, 2006 at 10:49:00PM -0500, Tom Lane wrote: >>> I think we've got that one actually. It's domains as PL-function >>> output >>> types that aren't checked. Also plpgsql fails to enforce domain >>> checks >>> on its local variables. > >> So is this the complete list? > > No, I don't think so. IIRC we're also missing domain checks on > parameter values in Bind messages, and there might be some other > holes too. See the archives. > > I made a suggestion about closing all these holes at once by > integrating domain checking into the I/O functions for domains, > but it's not clear how to do that without a big performance hit. Performance hit on just domain handling or overall? Personally, I'd rather see a hit on domain handling that we can work on later rather than the current state of things which seems to smack of MySQL (Get the feature 'checked off the list' first, then worry about doing it the right way). -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Sat, Mar 25, 2006 at 07:16:13PM +0100, Jim Nasby wrote: > On Mar 25, 2006, at 4:14 PM, Tom Lane wrote: > > >"Jim C. Nasby" <jnasby@pervasive.com> writes: > >>On Fri, Mar 24, 2006 at 10:49:00PM -0500, Tom Lane wrote: > >>>I think we've got that one actually. It's domains as PL-function > >>>output > >>>types that aren't checked. Also plpgsql fails to enforce domain > >>>checks > >>>on its local variables. > > > >>So is this the complete list? > > > >No, I don't think so. IIRC we're also missing domain checks on > >parameter values in Bind messages, and there might be some other > >holes too. See the archives. > > > >I made a suggestion about closing all these holes at once by > >integrating domain checking into the I/O functions for domains, > >but it's not clear how to do that without a big performance hit. > > Performance hit on just domain handling or overall? Personally, I'd > rather see a hit on domain handling that we can work on later rather > than the current state of things which seems to smack of MySQL (Get > the feature 'checked off the list' first, then worry about doing it > the right way). The three issues I've raised regard the type behavior of domains with operators and are completely independent of the input/output checks issues. But I like the idea of centralizing the check in the input/output functions. It seems clearer and cleaner. The procedural language checks are harder, but may be easier to implement if there were a centralized check domain functionality. --elein > -- > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > Pervasive Software http://pervasive.com work: 512-231-6117 > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
elein <elein@varlena.com> writes: > But I like the idea of centralizing the check in the input/output > functions. It seems clearer and cleaner. I remembered the problem with doing it that way: an input function can't enforce a domain NOTNULL constraint, because it won't even get invoked for a null input value. So there seems no way around having a special case for domains in all places where I/O conversion is done. regards, tom lane
On Mon, Mar 27, 2006 at 11:41:30AM -0500, Tom Lane wrote: > elein <elein@varlena.com> writes: > > But I like the idea of centralizing the check in the input/output > > functions. It seems clearer and cleaner. > > I remembered the problem with doing it that way: an input function can't > enforce a domain NOTNULL constraint, because it won't even get invoked > for a null input value. So there seems no way around having a special > case for domains in all places where I/O conversion is done. > The notnull attribute of the pg_type table should be set to not null in the case of a not null constraint on a domain (type). You should not have to invoke the input function to check for that. Or perhaps I'm missing the details. > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > elein elein@varlena.com
elein <elein@varlena.com> writes: > On Mon, Mar 27, 2006 at 11:41:30AM -0500, Tom Lane wrote: >> I remembered the problem with doing it that way: an input function can't >> enforce a domain NOTNULL constraint, because it won't even get invoked >> for a null input value. So there seems no way around having a special >> case for domains in all places where I/O conversion is done. > The notnull attribute of the pg_type table should be set to not null > in the case of a not null constraint on a domain (type). > You should not have to invoke the input function to check for that. > Or perhaps I'm missing the details. Well, I can see two problems: 1. If we have to add code to everyplace that calls an input function to do that, then we've failed to achieve the hoped-for goal of solving the problem in just one place. 2. NOTNULL is just the most obvious form of the problem. There could be domain CHECK constraints that fail on null input --- CHECK(VALUE IS NOT NULL) for example, or something more subtle. If we don't run the input function then this means the CHECK constraints also have to be done out-of-band, and then we've lost any leverage whatsoever. We could push the problem into a domain input function if we abandoned the current rule that input functions are never invoked for nulls (we could check their strictness flag to decide whether to do it). This sort of change seems distinctly cleaner than pushing explicit knowledge about domains into all the places that use input functions, but it's still pretty ugly: A. We still have to touch everyplace that uses an input function; any code not changed will simply do the Wrong Thing for nulls, which is not a very friendly failure mode. (And we know there are places outside the core that use this stuff, for instance non-core PLs.) B. C-language input functions for most datatypes will need to be declared strict, else they'll crash on null input, which is an even less friendly behavior. Again, we can't be sure that non-core datatypes get this right at present. regards, tom lane
TODO has: * Allow user-defined functions retuning a domain value to enforce domain constraints Is there something we should add to this? --------------------------------------------------------------------------- Tom Lane wrote: > elein <elein@varlena.com> writes: > > On Mon, Mar 27, 2006 at 11:41:30AM -0500, Tom Lane wrote: > >> I remembered the problem with doing it that way: an input function can't > >> enforce a domain NOTNULL constraint, because it won't even get invoked > >> for a null input value. So there seems no way around having a special > >> case for domains in all places where I/O conversion is done. > > > The notnull attribute of the pg_type table should be set to not null > > in the case of a not null constraint on a domain (type). > > You should not have to invoke the input function to check for that. > > Or perhaps I'm missing the details. > > Well, I can see two problems: > > 1. If we have to add code to everyplace that calls an input function to > do that, then we've failed to achieve the hoped-for goal of solving the > problem in just one place. > > 2. NOTNULL is just the most obvious form of the problem. There could be > domain CHECK constraints that fail on null input --- CHECK(VALUE IS NOT > NULL) for example, or something more subtle. If we don't run the input > function then this means the CHECK constraints also have to be done > out-of-band, and then we've lost any leverage whatsoever. > > > We could push the problem into a domain input function if we abandoned > the current rule that input functions are never invoked for nulls (we > could check their strictness flag to decide whether to do it). This > sort of change seems distinctly cleaner than pushing explicit knowledge > about domains into all the places that use input functions, but it's > still pretty ugly: > > A. We still have to touch everyplace that uses an input function; any > code not changed will simply do the Wrong Thing for nulls, which is not > a very friendly failure mode. (And we know there are places outside the > core that use this stuff, for instance non-core PLs.) > > B. C-language input functions for most datatypes will need to be > declared strict, else they'll crash on null input, which is an even > less friendly behavior. Again, we can't be sure that non-core datatypes > get this right at present. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > TODO has: > * Allow user-defined functions retuning a domain value to enforce domain > constraints > Is there something we should add to this? Yeah, a DONE marker ;-) regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > TODO has: > > * Allow user-defined functions retuning a domain value to enforce domain > > constraints > > > Is there something we should add to this? > > Yeah, a DONE marker ;-) OK, marked as done. I assume that's what you mean, or are you saying it just _needs_ to be completed? -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +