Re: Determining logically unique entities across many partially complete rows where at least one column matches - Mailing list pgsql-sql
From | Rob Sargent |
---|---|
Subject | Re: Determining logically unique entities across many partially complete rows where at least one column matches |
Date | |
Msg-id | 4A819103.8040102@gmail.com Whole thread Raw |
In response to | Determining logically unique entities across many partially complete rows where at least one column matches (Jamie Tufnell <diesql@googlemail.com>) |
List | pgsql-sql |
Seems to me that if you can safely identify which snippets correspond to a given entity you want a single id for the entity. An entity-snippet relationship seems a must. I would not lean too heavily on a single table solution unless you're considering arrays for openid,email and phone. (And given the one-to-many-real-people on phone I would be leery of this "identifier".) Jamie Tufnell wrote: > Hi, > > I am faced with a modeling problem and thought I'd see if anyone has run > into something similar and can offer some advice. > > Basically my problem domain is cataloguing "snippets of information" about > "entities" which are loosely identified. > > Entities can be identified up to 3 different methods (email, phone or > openid.) > > Entities can have zero or many emails, phone numbers and openids. The > only restriction is they must have at least one value in one of those three > columns. > > > Some sample data: > > snippet #1 > email: null > phone: +1234567890 > openid: john@myopenid.net > information: This is snippet #1 > > snippet #2 > email: foo@bar.com > phone: null > openid: johnny.name > information: This is snippet #2 > > At this point snippet #1 and #2 could refer to different entities. > > snippet #3 > email: bar@baz.com > phone: +1234567890 > openid: johnny.name > information: This is snippet #3 > > But now all three snippets definitely refer to the same entity, as far as > we're concerned: > > Entity: 1 > OpenIDs: johnny.name, john@myopenid.net > Phones: +1234567890 > Emails: foo@bar.com, bar@baz.com > > So as far as modeling this goes, I'm stuck between: > > 1. Normalizing as usual with some serious triggers to maintain the > relationships. > 2. Just having a snippets table with these fields inline and make these > inferences at query time. > 3. Something in between. > 4. Using a document store like CouchDB. > > The kinds of queries I need to do right now (which will no doubt change): > > * Return all snippets. > * Return all distinct entities. > * Find all id for a distinct entity given a single piece of id. > * Find all snippets for a distinct entity. > > To do it in one table, I am thinking something like this: > > create table snippets ( > id serial not null primary key, > email text, > phone_number text, > openid text, > information text not null, > check (email is not null or > phone_number is not null or openid is not null) > ); > > with queries like: > > * Find all snippets for one distinct entity, searching by openid: > > select * from snippets > where phone_number = > (select phone_number from snippets where openid = 'john@myopenid.net') > or email = > (select email from snippets where openid = 'john@myopenid.net') > or openid in > (select openid from snippets > where phone_number = > (select phone_number from snippets where openid = 'john@myopenid.net') > or email = > (select email from snippets where openid = 'john@myopenid.net')); > > > Or if I was to model as usual I am thinking something like this: > > create table entities ( > id serial not null primary key > ); > > create table entity_has_email ( > entity_id integer not null > references entities (id) on delete cascade on update cascade, > email text not null unique > ); > > create table entity_has_phone_number ( > entity_id integer not null > references entities (id) on delete cascade on update cascade, > phone_number text not null unique > ); > > create table entity_has_openid ( > entity_id integer not null > references entities (id) on delete cascade on update cascade, > openid text not null unique > ); > > create table snippets ( > id serial not null primary key, > entity_id integer not null > references entities (id) on delete cascade on update cascade, > information text not null > ); > > (followed by a mass of on insert/update/delete triggers) > > with queries like: > > * Find all snippets for a distinct entity, by one identifying field: > > select s.* from snippets s > join entity_has_email e on s.entity_id = e.id > join entity_has_phone_number p on s.entity_id = p.id > join entity_has_openid o on s.entity_id = o.id > where o.openid = 'john@myopenid.net'; > > Another option, sort of half way between the two could be: > > create table snippets ( > id serial not null primary key, > entity_id integer not null > references entities (id) on delete cascade on update cascade, > information text not null > ); > > create table entities ( > id serial not null primary key, > email text, > phone_number text, > openid text, > check (email is not null or > phone_number is not null or openid is not null) > ); > > * Find all snippets for a distinct entity, by openid = 'john@myopenid.net' > > select * from snippets > where entity_id in ( > select id from entities > where phone_number = > (select phone_number from entities where openid = 'john@myopenid.net') > or email = > (select email from entities where openid = 'john@myopenid.net') > or openid in > (select openid from entities > where phone_number = > (select phone_number from entities where openid = 'john@myopenid.net' > or email = > (select email from entities where openid = 'john@myopenid.net') )); > > At this point I am leaning towards that last method. > > Has anyone had to model something similar? Did you use one of these > methods or something else? > > Any/all comments appreciated! > > J > >