Thread: challenging constraint situation - how do I make it real in SQL?
I have a business rule which gives me rahter big challenges to "melt in SQL": At one "point in time" only one version may be active. Simplified: CREATE TABLE sample ( id_field int4, value text, validfrom timestamp, validto timestamp ) within one timespan, every id_field has to be unique. so, this is a set of legal data: 1, 'fire', -infinity , 2005-09-01 21:02:15.078 1, 'water', 2005-09-01 21:02:15.078, infinity this is an illegal set of data: 1, 'fire', -infinity , 2005-09-01 21:02:15.078 1, 'water', 2005-05-01 12:15:15.078, infinity because between 2005-05-01 12:15:15.078 and 2005-09-01 21:02:15.078 there are 2 values for id_field 1 As a first measure I have made a unique key on (id_field , validfrom), because the "timespans" are created by update rules of a view, so that I am quite sure that there are no overlapping ... as long as nobody touches the table by himself. But it is not really fitting; and manual editing of the table can disturb it. So, ist there some constraint creatable to make sure that "in each point of time" each id only exists once? Harald -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Reinsburgstraße 202b 70197 Stuttgart 0173/9409607 - PostgreSQL - supported by a community that does not put you on hold
Harald Armin Massa wrote: > I have a business rule which gives me rahter big challenges to "melt in > SQL": > But it is not really fitting; and manual editing of the table can > disturb it. So, ist there some constraint creatable to make sure that > "in each point of time" each id only exists once? You probably need to write a constraint trigger, using OVERLAPS and EXISTS operators. Writing one is rather trivial, but depends on your exact business rules. From your example it is not clear to me if you can have, for example, two records with different id's within the same or overlapping timespans. When encountering this problem I usually wonder why there isn't a data type that can store a timestamp and can be used to create a UNIQUE INDEX over it's values. That'd be wonderful. Well, maybe one day I'll actually have time to create one... -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys wrote: > > When encountering this problem I usually wonder why there isn't a data > type that can store a timestamp and can be used to create a UNIQUE > INDEX over it's values. That'd be wonderful. > Well, maybe one day I'll actually have time to create one... > I tried this at trigger level. The real bear is in the fact that there are two columns, not one. It is trivial to write an exclusion constraint that disallows overlapping (including nested) values. What was hard was determining the meta-data structure, how do you have two columns that are sometimes treated as one and sometimes as two? Possible, but devilish in the details.
Attachment
Kenneth Downs wrote: > Alban Hertroys wrote: > >> When encountering this problem I usually wonder why there isn't a data >> type that can store a timestamp and can be used to create a UNIQUE >> INDEX over it's values. That'd be wonderful. >> Well, maybe one day I'll actually have time to create one... >> > I tried this at trigger level. The real bear is in the fact that there > are two columns, not one. It is trivial to write an exclusion > constraint that disallows overlapping (including nested) values. What > was hard was determining the meta-data structure, how do you have two > columns that are sometimes treated as one and sometimes as two? Are you refering to a 'timespan' data type that can be determined to be unique? I can see some problems there, as both value and range matter; it'd be similar to determining the uniqueness of an area in a rectangle (though 1 dimensional only, of course). I've never really dug into this, so I don't know what possibilities PostgreSQL offers in this field. Basing this on faith :) Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys wrote: > Kenneth Downs wrote: > >> Alban Hertroys wrote: >> >>> When encountering this problem I usually wonder why there isn't a >>> data type that can store a timestamp and can be used to create a >>> UNIQUE INDEX over it's values. That'd be wonderful. >>> Well, maybe one day I'll actually have time to create one... >>> >> I tried this at trigger level. The real bear is in the fact that >> there are two columns, not one. It is trivial to write an exclusion >> constraint that disallows overlapping (including nested) values. >> What was hard was determining the meta-data structure, how do you >> have two columns that are sometimes treated as one and sometimes as two? > > > Are you refering to a 'timespan' data type that can be determined to > be unique? > > I can see some problems there, as both value and range matter; it'd be > similar to determining the uniqueness of an area in a rectangle > (though 1 dimensional only, of course). > > I've never really dug into this, so I don't know what possibilities > PostgreSQL offers in this field. Basing this on faith :) > Yes. I use a heavily dictionary-based toolset. I write out database specs in CSS-like syntax and it diff's and builds the databases and writes all triggers, indexes and so forth. The approach I tried was to have a "range" or "interval" type. You place a column into a table named "resv_date" or whatever and it would expand the definition into two columns, you'd get resv_date_beg and resv_date_end. If you declared the "resv_date" column a primary key column, it would build trigger code to detect overlaps and nesting and reject those. As I said, defining behavior and implementing it was not hard. I even had foreign keys into ranges that were "smart". If the foreign key was a single column instead of two, it would satisfy RI if the single value was between the interval values in the parent table. The problem comes from the split-personality of the "resv_date" column. Sometimes its one column, sometimes its two. This made writing the tools nasty and difficult, and I scratched it and (gasp!) did some validation in client code. I have it in mind to restore the feature, but in a different way. The two columns should be defined separately, not as one, and then the second of the two gets a flag setting, like: column range_beg { primary_key: Y; } column range_end { primary_key: Y; range_from: range_beg; } The "range_from" setting ties one column to the other and should give me all the behavior I had without all of the confusion. It would have three effects: 1) Force range_end >= range_beg 2) Convert the primary key into overlap/nest exclusion 3) Allow a single column foreign key in another table to "know" that it should do a within match instead of an equality match
Attachment
> I have a business rule which gives me rahter big challenges to "melt > in SQL": > > At one "point in time" only one version may be active. I have no specific solution, but I highly recommend Richard Snodgrass's book, Developing Time-Oriented Database Applications in SQL, which discusses this kind of situation, and many other similar ones. It's out of print, but available from his homepage as a PDF: http://www.cs.arizona.edu/~rts/ - John D. Burger MITRE
Kenneth Downs wrote: > Alban Hertroys wrote: > >> Kenneth Downs wrote: >> >>> Alban Hertroys wrote: >>> >>>> When encountering this problem I usually wonder why there isn't a >>>> data type that can store a timestamp and can be used to create a >>>> UNIQUE INDEX over it's values. That'd be wonderful. >>>> Well, maybe one day I'll actually have time to create one... >>>> >>> I tried this at trigger level. The real bear is in the fact that >>> there are two columns, not one. It is trivial to write an exclusion >>> constraint that disallows overlapping (including nested) values. >>> What was hard was determining the meta-data structure, how do you >>> have two columns that are sometimes treated as one and sometimes as two? >> >> >> Are you refering to a 'timespan' data type that can be determined to >> be unique? >> >> I can see some problems there, as both value and range matter; it'd be >> similar to determining the uniqueness of an area in a rectangle >> (though 1 dimensional only, of course). >> >> I've never really dug into this, so I don't know what possibilities >> PostgreSQL offers in this field. Basing this on faith :) >> > Yes. > I use a heavily dictionary-based toolset. I write out database specs in > CSS-like syntax and it diff's and builds the databases and writes all > triggers, indexes and so forth. > > The approach I tried was to have a "range" or "interval" type. You > place a column into a table named "resv_date" or whatever and it would > expand the definition into two columns, you'd get resv_date_beg and > resv_date_end. If you declared the "resv_date" column a primary key > column, it would build trigger code to detect overlaps and nesting and > reject those. > > As I said, defining behavior and implementing it was not hard. I even > had foreign keys into ranges that were "smart". If the foreign key was > a single column instead of two, it would satisfy RI if the single value > was between the interval values in the parent table. > > The problem comes from the split-personality of the "resv_date" column. > Sometimes its one column, sometimes its two. This made writing the > tools nasty and difficult, and I scratched it and (gasp!) did some > validation in client code. You could define a new datatype containing two timestamps. From 8.1 onwards you can use such a composite type as a field, I believe (but I haven't checked it). You can use pgsql functions to define operators for your type - but you'll have to use the default input/output functions for composite types I guess, because those can only be codec in C AFAIK. greetings, Florian Pflug
Florian G. Pflug wrote: > You could define a new datatype containing two timestamps. From 8.1 > onwards you can use such a composite type as a field, I believe (but I > haven't checked it). > > You can use pgsql functions to define operators for your type - but > you'll have to use the default input/output functions for composite > types I guess, because those can only be codec in C AFAIK. That's what I was refering to. Apparently I didn't make myself very clear. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Kenneth Downs wrote: > Alban Hertroys wrote: > The approach I tried was to have a "range" or "interval" type. You > place a column into a table named "resv_date" or whatever and it would > expand the definition into two columns, you'd get resv_date_beg and > resv_date_end. If you declared the "resv_date" column a primary key > column, it would build trigger code to detect overlaps and nesting and > reject those. > > As I said, defining behavior and implementing it was not hard. I even > had foreign keys into ranges that were "smart". If the foreign key was > a single column instead of two, it would satisfy RI if the single value > was between the interval values in the parent table. Been there, done that ;) > The problem comes from the split-personality of the "resv_date" column. > Sometimes its one column, sometimes its two. This made writing the > tools nasty and difficult, and I scratched it and (gasp!) did some > validation in client code. What's the benefit of allowing it to be only one column? > I have it in mind to restore the feature, but in a different way. The > two columns should be defined separately, not as one, and then the > second of the two gets a flag setting, like: > > column range_beg { primary_key: Y; } > column range_end { primary_key: Y; range_from: range_beg; } > > The "range_from" setting ties one column to the other and should give me > all the behavior I had without all of the confusion. It would have > three effects: > > 1) Force range_end >= range_beg > 2) Convert the primary key into overlap/nest exclusion > 3) Allow a single column foreign key in another table to "know" that it > should do a within match instead of an equality match And making that one column of a composite type would be just the thing, I thought somewhere at the start of this thread (Thanks for mentioning "composite types", Florian, couldn't remember what they're called). Cheers, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys <alban@magproductions.nl> writes: > Are you refering to a 'timespan' data type that can be determined to be > unique? > I can see some problems there, as both value and range matter; it'd be > similar to determining the uniqueness of an area in a rectangle (though > 1 dimensional only, of course). Except that what you want is to forbid overlap, not forbid equality. This is not possible with btree-based unique indexes, because btree will not like an opclass whose "equality" member is not transitive. If you look in the archives you'll find previous discussion of this point, including some blue-sky speculation about whether GiST indexes could be made to serve the purpose. That's a long way from being a solution you could use today, though. regards, tom lane
Tom Lane wrote: > Alban Hertroys <alban@magproductions.nl> writes: >>I can see some problems there, as both value and range matter; it'd be >>similar to determining the uniqueness of an area in a rectangle (though >>1 dimensional only, of course). > > Except that what you want is to forbid overlap, not forbid equality. > This is not possible with btree-based unique indexes, because btree > will not like an opclass whose "equality" member is not transitive. With what I have in mind, both overlap and equality would violate the unique constraint. I don't quite see why someone'd want to forbid overlap but to allow equality; isn't not allowing equality the whole point of a unique constraint? Yesterday being a holiday, I had some time to look into this. I'm now in the design phase for a (more general) vector type. A timespan type is just a special case for timestamps, the vector type could expand that to integers, reals, points, dates, etc. Anything scalar, basically, with some distinctions in operators depending on the type being discrete or not (discrete types could have adjacency operators, for example). I haven't looked into what types of index would be suitable, I suppose that could cause difficulties. OTOH, if neither overlap nor equality are allowed on a unique index, I'm under the impression that a btree should just work. Knowing that I rarely have time to spend on things like this, is here an appropriate place to discuss design issues or recruiting people for help? > If you look in the archives you'll find previous discussion of this > point, including some blue-sky speculation about whether GiST indexes > could be made to serve the purpose. That's a long way from being a > solution you could use today, though. Looks like a good starting place, I should have a look. Any pointers as to what to look for? Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys <alban@magproductions.nl> writes: > Tom Lane wrote: >> Except that what you want is to forbid overlap, not forbid equality. >> This is not possible with btree-based unique indexes, because btree >> will not like an opclass whose "equality" member is not transitive. > With what I have in mind, both overlap and equality would violate the > unique constraint. I don't quite see why someone'd want to forbid > overlap but to allow equality; isn't not allowing equality the whole > point of a unique constraint? You're missing the point. Letting "~" represent the operator that tests for interval-overlap, we can have A -------------- B ------------------ C ---------------- so that A ~ B and B ~ C but not A ~ C. This is too much unlike normal equality for a btree to work with "~" as the "equality" operator. regards, tom lane
Tom Lane wrote: > Alban Hertroys <alban@magproductions.nl> writes: >>With what I have in mind, both overlap and equality would violate the >>unique constraint. I don't quite see why someone'd want to forbid >>overlap but to allow equality; isn't not allowing equality the whole >>point of a unique constraint? > > You're missing the point. Letting "~" represent the operator that > tests for interval-overlap, we can have > A -------------- > B ------------------ I'd say "unique constraint violation" right here (provided there's a unique constraint on this column, of course). The order in which these are inserted/updated doesn't seem to matter either. I'm afraid I'm still missing the point... or maybe I'm not wrong??? > C ---------------- > so that A ~ B and B ~ C but not A ~ C. This is too much unlike normal > equality for a btree to work with "~" as the "equality" operator. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys <alban@magproductions.nl> writes: > I'd say "unique constraint violation" right here (provided there's a > unique constraint on this column, of course). The order in which these > are inserted/updated doesn't seem to matter either. I'm afraid I'm still > missing the point... The point is that btree depends on a number of assumptions about the behavior of the operators in an operator class, and one of those assumptions is that "equality" is transitive. regards, tom lane
Tom Lane wrote: > Alban Hertroys <alban@magproductions.nl> writes: > >>I'd say "unique constraint violation" right here (provided there's a >>unique constraint on this column, of course). The order in which these >>are inserted/updated doesn't seem to matter either. I'm afraid I'm still >>missing the point... > > The point is that btree depends on a number of assumptions about the > behavior of the operators in an operator class, and one of those > assumptions is that "equality" is transitive. So, referring back to your example you mean that though B overlaps A, and C overlaps B, C doesn't overlap A and thus they're not transitive. I see. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //