interval origami - Mailing list pgsql-sql
From | Adam Jensen |
---|---|
Subject | interval origami |
Date | |
Msg-id | a4b97a40-6c5a-580f-361d-03de6572ce27@riseup.net Whole thread Raw |
Responses |
Re: interval origami
|
List | pgsql-sql |
Hi, I am working on a hobby project that might eventually make its way into the public domain. The goal is to create a Media Annotation, Analysis and Playback Synthesis System (I call the project MAAPSS). Conceptually, it is somewhat similar to vcode[1,2] but more "hackable" with broader general applications. [1]: http://social.cs.uiuc.edu/projects/vcode.html [2]: https://youtu.be/Sv_OZ174wpg In the configuration that I am currently exploring, the database will contain many entries with intervals that could overlap in every possible way. The entries could be labeled in many ways other than "interesting" and "fail". This little example is a simplified attempt to isolate what I see as the core problem, which is probably mostly based in my current lack of understanding of SQL. CREATE TABLE Example (start REAL, stop REAL, tag TEXT); INSERT INTO Example VALUES (10, 30, 'interesting'), (12, 32, 'interesting'), (15, 20, 'fail'), (16, 39, 'interesting'), (17, 21, 'fail'), (50, 80, 'interesting'), (60, 65, 'fail'), (85, 90, 'fail'); The 'start' and 'stop' values stored in the database are numbers of type REAL, representing seconds. There is no need for calculations with, or transformations between, elaborate time representation formats. Here is a cognitive approach to a solution: 0. Starting with a dataset like this: 10.0|30.0|interesting 12.0|32.0|interesting 15.0|20.0|fail 16.0|39.0|interesting 17.0|21.0|fail 50.0|80.0|interesting 60.0|65.0|fail 85.0|90.0|fail 1. Find all of the "interesting" segments. 10.0|30.0|interesting 12.0|32.0|interesting 16.0|39.0|interesting 50.0|80.0|interesting 2. Find any "interesting" segments with overlaps and merge them. 10.0|39.0|interesting 50.0|80.0|interesting 3. Find all "fail" segments with a period that overlaps the period of any (step 2) "interesting" segment. 15.0|20.0|fail 17.0|21.0|fail 60.0|65.0|fail 4. Find any overlaps in those (step 3) "fail" segments and merge. 15.0|21.0|fail 60.0|65.0|fail 5. Derive a new list of "interesting" segments from (step 2) that omits all "fail" segments from (step 4). 10.0|15.0|interesting 21.0|39.0|interesting 50.0|60.0|interesting 65.0|80.0|interesting A relational database solution is needed. I am not yet very familiar with relational database design and SQL beyond the very basics. So any pointers to similar SQL problems with solutions and explanations would be tremendously useful at this point. Any hints at all, really, will be much appreciated. Beyond a basic solution, I suspect something like an R-Tree index might be useful. I gather this from the SQLite page on R-Trees: https://sqlite.org/rtree.html -- SELECT version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit (1 row)