Thread: Filtering by tags
What's the recommended way of storing "tags" in a database, and then filtering based on the existence, or *non*-existence, of those tags on some entities? Our application stores contacts, where each contact may have any number of tags. We do this with the tables contacts, contacts_tags and tags. We also have segments, which defines "filters" on contacts based on specific tags they must have and/or must *not* have. This is defined by the tables segments and segments_tags. (See bottom of post for table definitions). Finding contacts matching a given segment which has BOTH positive (required tags) and negative (non-existing tags) requirements is easy enough (simplified): SELECT segmentid, email FROM segments_tags st INNER JOIN contacts_tags ct USING (tagname) INNER JOIN contacts USING (email) WHERE st.tagtype = 1 GROUP BY 1, 2 HAVING COUNT(*) = (SELECT COUNT(*) FROM segments_tags WHERE segmentid = st.segmentid AND tagtype = 1) EXCEPT SELECT segmentid, email FROM segments_tags st INNER JOIN contacts_tags ct USING (tagname) INNER JOIN contacts USING (email) WHERE st.tagtype = 0; However, segments which ONLY contain negative requirements (that's "tagtype" = 0) doesn't work, for obvious reasons. Is there a way to make this work with a single query for both cases? Possibly using CTE (which I'm not very familiar with)? Table definitions: Table "public.contacts" Column | Type | Modifiers ---------------+-----------------------------+----------------- email | email | not null name | text | status | character(1) | not null default 'a'::bpchar statuschanged | timestamp without time zone | Indexes: "contacts_pkey" PRIMARY KEY, btree (email) Table "public.contacts_tags" Column | Type | Modifiers ---------+-------+----------- email | email | not null tagname | text | not null Indexes: "contacts_tags_pkey" PRIMARY KEY, btree (email, tagname) "contacts_tags_tagname" btree (tagname) Foreign-key constraints: "contacts_tags_email_fkey" FOREIGN KEY (email) REFERENCES contacts(email) ON UPDATE CASCADE ON DELETE CASCADE "contacts_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE Table "public.tags" Column | Type | Modifiers -----------+-----------------------------+----------------------- tagname | text | not null createdat | timestamp without time zone | not null default now() Indexes: "tags_pkey" PRIMARY KEY, btree (tagname) Table "public.segments" Column | Type | Modifiers -------------+-----------------------------+--------------------- segmentid | integer | not null default nextval('segments_segmentid_seq'::regclass) segmentname| text | not null createdat | timestamp without time zone | not null default now() Indexes: "segments_pkey" PRIMARY KEY, btree (segmentid) Table "public.segments_tags" Column | Type | Modifiers -----------+---------+---------- segmentid | integer | not null tagname | text | not null tagtype | integer | not null Indexes: "segments_tags_pkey" PRIMARY KEY, btree (segmentid, tagname) Foreign-key constraints: "segments_tags_segmentid_fkey" FOREIGN KEY (segmentid) REFERENCES segments(segmentid) ON UPDATE RESTRICT ON DELETE CASCADE "segments_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE Regards, -- a.
No one with any response on this? -- a. Anders Steinlein wrote: > What's the recommended way of storing "tags" in a database, and then > filtering based on the existence, or *non*-existence, of those tags on > some entities? > > Our application stores contacts, where each contact may have any number > of tags. We do this with the tables contacts, contacts_tags and tags. We > also have segments, which defines "filters" on contacts based on > specific tags they must have and/or must *not* have. This is defined by > the tables segments and segments_tags. (See bottom of post for table > definitions). > > Finding contacts matching a given segment which has BOTH positive > (required tags) and negative (non-existing tags) requirements is easy > enough (simplified): > > SELECT segmentid, email > FROM segments_tags st > INNER JOIN contacts_tags ct USING (tagname) > INNER JOIN contacts USING (email) > WHERE st.tagtype = 1 > GROUP BY 1, 2 > HAVING COUNT(*) > = (SELECT COUNT(*) FROM segments_tags > WHERE segmentid = st.segmentid AND tagtype = 1) > EXCEPT > SELECT segmentid, email > FROM segments_tags st > INNER JOIN contacts_tags ct USING (tagname) > INNER JOIN contacts USING (email) > WHERE st.tagtype = 0; > > However, segments which ONLY contain negative requirements (that's > "tagtype" = 0) doesn't work, for obvious reasons. > > Is there a way to make this work with a single query for both cases? > Possibly using CTE (which I'm not very familiar with)? > > Table definitions: > Table "public.contacts" > Column | Type | Modifiers > ---------------+-----------------------------+----------------- > email | email | not null > name | text | > status | character(1) | not null default 'a'::bpchar > statuschanged | timestamp without time zone | > Indexes: > "contacts_pkey" PRIMARY KEY, btree (email) > > Table "public.contacts_tags" > Column | Type | Modifiers > ---------+-------+----------- > email | email | not null > tagname | text | not null > Indexes: > "contacts_tags_pkey" PRIMARY KEY, btree (email, tagname) > "contacts_tags_tagname" btree (tagname) > Foreign-key constraints: > "contacts_tags_email_fkey" FOREIGN KEY (email) REFERENCES > contacts(email) ON UPDATE CASCADE ON DELETE CASCADE > "contacts_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES > tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE > > Table "public.tags" > Column | Type | Modifiers > -----------+-----------------------------+----------------------- > tagname | text | not null > createdat | timestamp without time zone | not null default now() > Indexes: > "tags_pkey" PRIMARY KEY, btree (tagname) > > Table "public.segments" > Column | Type | Modifiers > -------------+-----------------------------+--------------------- > segmentid | integer | not null default > nextval('segments_segmentid_seq'::regclass) > segmentname| text | not null > createdat | timestamp without time zone | not null default now() > Indexes: > "segments_pkey" PRIMARY KEY, btree (segmentid) > > Table "public.segments_tags" > Column | Type | Modifiers > -----------+---------+---------- > segmentid | integer | not null > tagname | text | not null > tagtype | integer | not null > Indexes: > "segments_tags_pkey" PRIMARY KEY, btree (segmentid, tagname) > Foreign-key constraints: > "segments_tags_segmentid_fkey" FOREIGN KEY (segmentid) REFERENCES > segments(segmentid) ON UPDATE RESTRICT ON DELETE CASCADE > "segments_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES > tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE > > > Regards, > -- a. >
Anders Steinlein <anders@steinlein.no> wrote: > No one with any response on this? > [...] Insert a "LEFT JOIN" in the first subquery? Tim (too lazy to test :-))
On Wed, Jun 30, 2010 at 05:54:51PM +0200, Anders Steinlein wrote: > No one with any response on this? Fun problem, how about: SELECT x.email, x.segmentid FROM ( SELECT c.email, t.segmentid, t.tagname, t.tagtype FROM contacts c, segments_tags t) x LEFT JOIN contacts_tags t USING (email,tagname) GROUP BY x.email, x.segmentid HAVING NOT bool_or((x.tagtype = 0) <> (t.tagname IS NULL)); The HAVING statement is a little obscure, but could also be written: HAVING COUNT(CASE WHEN x.tagtype = 0 AND t.tagname IS NULL THEN 1 END) = 0 AND COUNT(CASE WHEN x.tagtype = 1 AND t.tagname IS NOT NULL THEN 1 END) = 0; it works by keeping count of the number of "bad" tags; i.e. if the tag type is zero then expect the tag entry not to be found, and the reverse if the tag type is one. Because of the cross join in the inner select this is going to be *slow*, so you may want to limit things a bit by only working with one contact or segment type at a time. Hope that gives you a few ideas! -- Sam http://samason.me.uk/