Proposed feature: Selective Foreign Keys - Mailing list pgsql-hackers
From | Tom Dunstan |
---|---|
Subject | Proposed feature: Selective Foreign Keys |
Date | |
Msg-id | 85A0355E-E76C-4B2D-8FB6-7353CA537A3E@tomd.cc Whole thread Raw |
Responses |
Re: Proposed feature: Selective Foreign Keys
Re: Proposed feature: Selective Foreign Keys |
List | pgsql-hackers |
Hi all! The Problem ----------------- One case that traditional SQL doesn't handle very well is when you have a child entity which can be attached to a numberof different parent entities. Examples might be comments, tags or file attachments - we might have 20 different entitiesin the system that we would like our users to be able add comments to, but the existing solutions for mapping thisall have downsides. Existing solution 1: Join tables ahoy If I can have a list of comments on every other object in the system, and I want to have referrential integrity, then theobvious thing to do is create a join table between each entity and the comments table. Pros: - Straight forward, traditional object-with-collection-of-child SQL structure Cons: - If a parent object gets deleted here, we can't use foreign keys to delete e.g. a child comment, so we'll have to eitherexplicitly do it as part of our delete logic or have a cleanup process to catch orphans. Or do a dance with deletetriggers on the join tables deleting the comment. - For n entities requiring comments in the system, we need n join tables.If we want both comments and e.g. tags and likeson all of our entities, we now have 3n join tables for what should be some relatively self-contained on-the-side data- this is could be more tables than the entire rest of the system - It's difficult to create any kind of self-contained component for building applications in this scenario, as it will needto know about every other entity in the system, or be able to poke around inside whatever ORM or data access system thatyou have to work out what join tables it needs when running queries. Existing solution 2: Enter the matrix Instead of having n join tables, let's just mash them all together, with a column per parent object, and a check constraintto force exactly one of those columns to be set. Pros: - Less bloat in the number of tables Cons: - Doesn't solve orphan problem - Addition of a new entity which needs comments and we now have to add another column onto it, potentially rewriting thewhole thing - Ugly Existing solution 3: Embed the matrix Kinda like the dependency matrix table, except that all the columns referencing potential parent objects we put into thecomment table instead. Pros: - Everything contained in column table - No orphans, since cascaded deletes will now delete the actual comment Cons: - Comment table now has references to every single type that it may be attached to - Addition of a new entity and we probably have to rewrite the comment table now Existing solution 4: Abandon ye all referential integrity Have a column indicating parent type and another one for the id. In the case of comments this would be directly on the commenttable itself. In the case of something like tags that we might expect to be shared between entities, it would be ina single join table. Pros: - Pretty self-contained - Data model which has neither lots of empty columns or lots of tables - Can make new entities "commentable" without rewriting anything - Because it's self-contained, can build application components that don't need to know much about the rest of your system.For example this is the approach that the grails taggable and commentable plugins take. Cons: - No referential integrity, since we can't have a single column pointing to different tables with existing foreign key infrastructure - Since there's no real db support for doing things this way, existing ORMs etc don't really know how use a single columnto join against multiple different tables based on a discriminator or 'switch' column. Existing solution 5: Everything's a thing Make your entity hierarchy have a base level object which can have comments attached, and then everything that you need tobe "commentable" has to extend that. You can do that in an ORM, or with table inheritance in the database. Pros: - Single top-level thing to hang your data on Cons: - You've polluted your object hierarchy just to hang some stuff off of the end of it rather than it being driven by behaviours - You're going to be paying a performance penalty - everything that extends that base level object will now join againstit incessantly, and you now have a global id sequence or whatever that you may not want. Basically none of the above handle the situation very well. The cleanest is solution 4, but lack of RI sucks. Feature Proposal: Selective foreign keys. ------------------------------------------------- Allow foreign keys to have where clauses. The above comment example using solution 4 might then look like then following: CREATE TABLE comment as ( id bigserial primary key, content text not null, parent_entity regclass not null, parent_id int8 ); ALTER TABLE comment ADD CONSTRAINT comment_blog_fk FOREIGN KEY (parent_id) REFERENCES blog(id) WHERE (parent_entity = ‘blog'); ALTER TABLE comment ADD CONSTRAINT comment_event_fk FOREIGN KEY (parent_id) REFERENCES event(id) WHERE (parent_entity = ‘event'); At this point, the following things should work: insert into blog(id, title, content) values (10, 'i hate mondays', 'so sad'); insert into event(id, title, location, date) values (20, 'my birthday party', 'local pub', now()); insert into comment(content, parent_entity, parent_id) values ('me too', 'blog', 10); insert into comment(content, parent_entity, parent_id) values ('I love that pub', 'event', 20); But the following would barf: insert into comment(content, parent_entity, parent_id) values ('bad comment', 'blog', 20); ERROR: insert or update on table "comment" violates foreign key constraint "comment_blog_fk" Detail: Key (parent_id)=(20) is not present in table "blog".: Patch ----- I have attached a proof-of-concept patch for this feature. What it currently does: - Allows where clause on FK declaration - Validates and stores the expression into pg_constraint a la check constraints, using the same validity checks - Implements logic on insertion to FK table, and update / delete on PK tables as expected. - Regression tests handling all of the above Things I know are still to be done, if there is consensus that this feature is worth having: - Handle update of columns affecting the selection expression. Currently the RI trigger doesn’t fire if the FK column isn’tmodified. I’m not sure where the logic for this is implemented. - pg_dump support - Update some comments in a few places referring to conbin/consrc columns of pg_constraint as relevant to check constraintsonly - Documentation - There are probably a few scenarios that I’m not testing in the regression tests that I should be. Implementation Comments ---------------------------- I borrowed the exclusion constraint where clause in the grammar, which results in the where expression needing parentheses.With a bit more work this could possibly be removed, but I’m not too upset about it tbh. Currently I’m creating a new executor for each call of RI_FKey_check when there is a where clause, and doing other bits ofwork in there. It’s not 100% clear to me how expensive doing that stuff in there is, and whether it’s worth it or possibleto cache some of it. Stuff added in tablecmds.c was copied from similar stuff elsewhere, but honestly without a huge amount of understanding.Don’t know if I could do without some of it. Comments? Cheers Tom PS Thanks to Noah Misch who answered a question on the list a while ago which got me over a hump, and to the authors of thewiki pages around submitting a patch.
Attachment
pgsql-hackers by date: