Thread: Filtering by tags

Filtering by tags

From

Anders Steinlein

Date:

24 June 2010, 15:02:03

What's the recommended way of storing "tags" in a database, and then
filtering based on the existence, or *non*-existence, of those tags on
some entities?

Our application stores contacts, where each contact may have any number
of tags. We do this with the tables contacts, contacts_tags and tags. We
also have segments, which defines "filters" on contacts based on
specific tags they must have and/or must *not* have. This is defined by
the tables segments and segments_tags. (See bottom of post for table
definitions).

Finding contacts matching a given segment which has BOTH positive
(required tags) and negative (non-existing tags) requirements is easy
enough (simplified):

SELECT segmentid, email
     FROM segments_tags st
         INNER JOIN contacts_tags ct USING (tagname)
             INNER JOIN contacts USING (email)
     WHERE st.tagtype = 1
     GROUP BY 1, 2
     HAVING COUNT(*)
         = (SELECT COUNT(*) FROM segments_tags
             WHERE segmentid = st.segmentid AND tagtype = 1)
EXCEPT
SELECT segmentid, email
     FROM segments_tags st
         INNER JOIN contacts_tags ct USING (tagname)
             INNER JOIN contacts USING (email)
     WHERE st.tagtype = 0;

However, segments which ONLY contain negative requirements (that's
"tagtype" = 0) doesn't work, for obvious reasons.

Is there a way to make this work with a single query for both cases?
Possibly using CTE (which I'm not very familiar with)?

Table definitions:
                           Table "public.contacts"
     Column     |            Type             |          Modifiers

---------------+-----------------------------+-----------------
  email         | email                       | not null
  name          | text                        |
  status        | character(1)                | not null default 'a'::bpchar
  statuschanged | timestamp without time zone |
Indexes:
     "contacts_pkey" PRIMARY KEY, btree (email)

Table "public.contacts_tags"
  Column  | Type  | Modifiers
---------+-------+-----------
  email   | email | not null
  tagname | text  | not null
Indexes:
     "contacts_tags_pkey" PRIMARY KEY, btree (email, tagname)
     "contacts_tags_tagname" btree (tagname)
Foreign-key constraints:
     "contacts_tags_email_fkey" FOREIGN KEY (email) REFERENCES
contacts(email) ON UPDATE CASCADE ON DELETE CASCADE
     "contacts_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES
tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE

                   Table "public.tags"
   Column   |            Type             |       Modifiers
-----------+-----------------------------+-----------------------
  tagname   | text                        | not null
  createdat | timestamp without time zone | not null default now()
Indexes:
     "tags_pkey" PRIMARY KEY, btree (tagname)

                 Table "public.segments"
    Column  |            Type             |   Modifiers
-------------+-----------------------------+---------------------
segmentid  | integer                     | not null default
nextval('segments_segmentid_seq'::regclass)
segmentname| text                        | not null
createdat  | timestamp without time zone | not null default now()
Indexes:
     "segments_pkey" PRIMARY KEY, btree (segmentid)

   Table "public.segments_tags"
   Column   |  Type   | Modifiers
-----------+---------+----------
  segmentid | integer | not null
  tagname   | text    | not null
  tagtype   | integer | not null
Indexes:
     "segments_tags_pkey" PRIMARY KEY, btree (segmentid, tagname)
Foreign-key constraints:
     "segments_tags_segmentid_fkey" FOREIGN KEY (segmentid) REFERENCES
segments(segmentid) ON UPDATE RESTRICT ON DELETE CASCADE
     "segments_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES
tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE


Regards,
-- a.

Re: Filtering by tags

From

Anders Steinlein

Date:

30 June 2010, 12:55:05

No one with any response on this?

-- a.


Anders Steinlein wrote:
> What's the recommended way of storing "tags" in a database, and then
> filtering based on the existence, or *non*-existence, of those tags on
> some entities?
>
> Our application stores contacts, where each contact may have any number
> of tags. We do this with the tables contacts, contacts_tags and tags. We
> also have segments, which defines "filters" on contacts based on
> specific tags they must have and/or must *not* have. This is defined by
> the tables segments and segments_tags. (See bottom of post for table
> definitions).
>
> Finding contacts matching a given segment which has BOTH positive
> (required tags) and negative (non-existing tags) requirements is easy
> enough (simplified):
>
> SELECT segmentid, email
> FROM segments_tags st
> INNER JOIN contacts_tags ct USING (tagname)
> INNER JOIN contacts USING (email)
> WHERE st.tagtype = 1
> GROUP BY 1, 2
> HAVING COUNT(*)
> = (SELECT COUNT(*) FROM segments_tags
> WHERE segmentid = st.segmentid AND tagtype = 1)
> EXCEPT
> SELECT segmentid, email
> FROM segments_tags st
> INNER JOIN contacts_tags ct USING (tagname)
> INNER JOIN contacts USING (email)
> WHERE st.tagtype = 0;
>
> However, segments which ONLY contain negative requirements (that's
> "tagtype" = 0) doesn't work, for obvious reasons.
>
> Is there a way to make this work with a single query for both cases?
> Possibly using CTE (which I'm not very familiar with)?
>
> Table definitions:
> Table "public.contacts"
> Column | Type | Modifiers
> ---------------+-----------------------------+-----------------
> email | email | not null
> name | text |
> status | character(1) | not null default 'a'::bpchar
> statuschanged | timestamp without time zone |
> Indexes:
> "contacts_pkey" PRIMARY KEY, btree (email)
>
> Table "public.contacts_tags"
> Column | Type | Modifiers
> ---------+-------+-----------
> email | email | not null
> tagname | text | not null
> Indexes:
> "contacts_tags_pkey" PRIMARY KEY, btree (email, tagname)
> "contacts_tags_tagname" btree (tagname)
> Foreign-key constraints:
> "contacts_tags_email_fkey" FOREIGN KEY (email) REFERENCES
> contacts(email) ON UPDATE CASCADE ON DELETE CASCADE
> "contacts_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES
> tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE
>
> Table "public.tags"
> Column | Type | Modifiers
> -----------+-----------------------------+-----------------------
> tagname | text | not null
> createdat | timestamp without time zone | not null default now()
> Indexes:
> "tags_pkey" PRIMARY KEY, btree (tagname)
>
> Table "public.segments"
> Column | Type | Modifiers
> -------------+-----------------------------+---------------------
> segmentid | integer | not null default
> nextval('segments_segmentid_seq'::regclass)
> segmentname| text | not null
> createdat | timestamp without time zone | not null default now()
> Indexes:
> "segments_pkey" PRIMARY KEY, btree (segmentid)
>
> Table "public.segments_tags"
> Column | Type | Modifiers
> -----------+---------+----------
> segmentid | integer | not null
> tagname | text | not null
> tagtype | integer | not null
> Indexes:
> "segments_tags_pkey" PRIMARY KEY, btree (segmentid, tagname)
> Foreign-key constraints:
> "segments_tags_segmentid_fkey" FOREIGN KEY (segmentid) REFERENCES
> segments(segmentid) ON UPDATE RESTRICT ON DELETE CASCADE
> "segments_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES
> tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE
>
>
> Regards,
> -- a.
>

Re: Filtering by tags

From

Tim Landscheidt

Date:

30 June 2010, 14:18:30

Anders Steinlein <anders@steinlein.no> wrote:

> No one with any response on this?
> [...]

Insert a "LEFT JOIN" in the first subquery?

Tim
(too lazy to test :-))

Re: Filtering by tags

From

Sam Mason

Date:

30 June 2010, 14:58:40

On Wed, Jun 30, 2010 at 05:54:51PM +0200, Anders Steinlein wrote:
> No one with any response on this?

Fun problem, how about:

  SELECT x.email, x.segmentid
  FROM (
      SELECT c.email, t.segmentid, t.tagname, t.tagtype
      FROM contacts c, segments_tags t) x
    LEFT JOIN contacts_tags t USING (email,tagname)
  GROUP BY x.email, x.segmentid
  HAVING NOT bool_or((x.tagtype = 0) <> (t.tagname IS NULL));

The HAVING statement is a little obscure, but could also be written:

  HAVING COUNT(CASE WHEN x.tagtype = 0 AND t.tagname IS     NULL THEN 1 END) = 0
     AND COUNT(CASE WHEN x.tagtype = 1 AND t.tagname IS NOT NULL THEN 1 END) = 0;

it works by keeping count of the number of "bad" tags; i.e. if the tag
type is zero then expect the tag entry not to be found, and the reverse
if the tag type is one.

Because of the cross join in the inner select this is going to be
*slow*, so you may want to limit things a bit by only working with one
contact or segment type at a time.

Hope that gives you a few ideas!

--
  Sam  http://samason.me.uk/