Home > mailing lists

interval origami - Mailing list pgsql-sql

From	Adam Jensen
Subject	interval origami
Date	November 30, 2018 17:52:31
Msg-id	a4b97a40-6c5a-580f-361d-03de6572ce27@riseup.net Whole thread Raw
Responses	Re: interval origami
List	pgsql-sql

Tree view

Hi,

I am working on a hobby project that might eventually make its way into
the public domain. The goal is to create a Media Annotation, Analysis
and Playback Synthesis System (I call the project MAAPSS). Conceptually,
it is somewhat similar to vcode[1,2] but more "hackable" with broader
general applications.

[1]: http://social.cs.uiuc.edu/projects/vcode.html
[2]: https://youtu.be/Sv_OZ174wpg

In the configuration that I am currently exploring, the database will
contain many entries with intervals that could overlap in every possible
way. The entries could be labeled in many ways other than "interesting"
and "fail". This little example is a simplified attempt to isolate what
I see as the core problem, which is probably mostly based in my current
lack of understanding of SQL.

CREATE TABLE Example (start REAL, stop REAL, tag TEXT);
INSERT INTO Example VALUES
(10, 30, 'interesting'),
(12, 32, 'interesting'),
(15, 20, 'fail'),
(16, 39, 'interesting'),
(17, 21, 'fail'),
(50, 80, 'interesting'),
(60, 65, 'fail'),
(85, 90, 'fail');

The 'start' and 'stop' values stored in the database are numbers of type
 REAL, representing seconds. There is no need for calculations with, or
transformations between, elaborate time representation formats.

Here is a cognitive approach to a solution:

0. Starting with a dataset like this:
10.0|30.0|interesting
12.0|32.0|interesting
15.0|20.0|fail
16.0|39.0|interesting
17.0|21.0|fail
50.0|80.0|interesting
60.0|65.0|fail
85.0|90.0|fail

1. Find all of the "interesting" segments.
10.0|30.0|interesting
12.0|32.0|interesting
16.0|39.0|interesting
50.0|80.0|interesting

2. Find any "interesting" segments with overlaps and merge them.
10.0|39.0|interesting
50.0|80.0|interesting

3. Find all "fail" segments with a period that overlaps the period of any
(step 2) "interesting" segment.
15.0|20.0|fail
17.0|21.0|fail
60.0|65.0|fail

4. Find any overlaps in those (step 3) "fail" segments and merge.
15.0|21.0|fail
60.0|65.0|fail

5. Derive a new list of "interesting" segments from (step 2) that omits
all "fail" segments from (step 4).
10.0|15.0|interesting
21.0|39.0|interesting
50.0|60.0|interesting
65.0|80.0|interesting

A relational database solution is needed. I am not yet very familiar
with relational database design and SQL beyond the very basics. So any
pointers to similar SQL problems with solutions and explanations would
be tremendously useful at this point. Any hints at all, really, will be
much appreciated.

Beyond a basic solution, I suspect something like an R-Tree index might
be useful. I gather this from the SQLite page on R-Trees:

https://sqlite.org/rtree.html

--

SELECT version();
                                                 version

---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

pgsql-sql by date:

From: "Voillequin, Jean-Marc"
Date: 30 November 2018, 16:25:02
Subject: RE: Avoid "could not determine interpretation of row comparisonoperator ="

From: Alvaro Herrera
Date: 30 November 2018, 17:57:19
Subject: Re: interval origami

interval origami - Mailing list pgsql-sql

Previous

Next