Transitive closure of a directed graph - Mailing list pgsql-hackers
From | Steinar H. Gunderson |
---|---|
Subject | Transitive closure of a directed graph |
Date | |
Msg-id | 20051102173156.GA19366@uio.no Whole thread Raw |
Responses |
Re: Transitive closure of a directed graph
|
List | pgsql-hackers |
Hi, I was asked to post this here for any interested parties -- please Cc me on any comments/followups as I'm not subscribed to -hackers. This is a sort-of writeup on how to find the transitive closure[1] of a directed graph G(V,E), or G+(V,E) if you want to get into notation. (There are multiple ways to do this, though, and this only describes one of them. I've been told it could be possible to do this with WITH RECURSIVE, but I've never used it myself and PostgreSQL doesn't support it, so it's left as an excercise for the reader :-) ) In short, the transitive closure of a graph is defined as follows: The transitive closure G+(V,E) contains an edge A -> B if and only If there is a path from A to B (possibly via other vertices) in the graph G(V,E). In other words, one question the transitive closure can answer quickly (to other queries) is "is there a path from A to B?". (We use this in our door card access system, which supports group inheritance -- a group can be a subgroup of another group and thus inherits all its door access.) A relational database is unable to store and operate on a graph directly, but you can store the vertices and edges separately, as in: CREATE TABLE vertices ( id INTEGER PRIMARY KEY <any other fields you'd want here> ); CREATE TABLE edges ( upper INTEGERREFERENCES vertices(id), lower INTEGER REFERENCES vertices(id) ); We'll be using PL/PgSQL, which unfortunately can't handle temporary tables very well at the moment, so we'll also define a temporary copy we can work on from our function, identical to edges: CREATE TABLE temp ( upper INTEGER, lower INTEGER ); The basic idea of the technique is to grow the closure outwards from the original graph; this makes it somewhat like the Bellman-Ford algorithm implemented in SQL. It is not particularily effective on dense graphs (since it generates potentially V^2 records in the temporary table) or in graphs with very long paths (since it needs one iteration for every step of the longest path), but works well on sparse graphs without too long paths. It's also very simple, and tries to leave as much as possible of the work to PostgreSQL itself. (I implemented a variant using depth-first search and simulated queues once, and it was 10-20 times slower, possibly due to PL/PgSQL overhead -- an implementation in C or Perl might fare a lot better.) So, for every iteration we basically do this: For every edge (A,B) in our table, find all edges (B,C) that we don't already have found. Since there's a path between A and C, insert (A,C) as an edge. Repeat until there are no more edges inserted, and we have our transitive closure. The function goes as follows: CREATE FUNCTION transitive_closure() RETURNS SETOF edges AS ' DECLARE r record; BEGIN -- Start with the initial set of edges as a base INSERT INTO temp SELECT * FROM edges; -- Grow the graph until nothing more happens LOOP INSERT INTO temp SELECT e1.upper,e2.lower FROM temp e1 JOIN temp e2 ON e1.lower=e2.upper WHERE (e1.upper, e2.lower)NOT IN ( SELECT * FROM temp ); EXIT WHEN NOT FOUND; END LOOP; -- Return the entire data set FOR r in SELECT * FROM temp LOOP RETURN NEXT r; ENDLOOP; -- Clean up TRUNCATE temp; RETURN; END; ' LANGUAGE plpgsql; Also note that after this, if there's any edge (A,A) in the graph, there is a loop somewhere, and the original graph is not a DAG. A simple example, just for reference: graph=# select * from vertices; id ---- 1 2 3 4 5 (5 rows) graph=# select * from edges; upper | lower -------+------- 1 | 2 2 | 3 2 | 4 1 | 5 5 | 4 (5 rows) The matching graph in ASCII art looks something like (I haven't drawn the arrows): 1 / \ 2 5 / \ / 3 4 The matching transitive closure then becomes: mdb2=# select * from transitive_closure(); upper | lower -------+------- 1 | 2 2 | 3 2 | 4 1 | 5 5 | 4 1 | 3 1 | 4 1 | 4 (8 rows) Or, again in ASCII (a bit harder this time :-) ) --- 1 / / | \ / 2 | 5 |/ \ | / 3 4 Note that there's a duplicate in there (because both the path 1 -> 2 -> 4 and 1 -> 5 -> 4 are added in the same iteration and thus produce the edge 1 -> 4); you might want to add a DISTINCT (either at the end or in each iteration) if that's a problem. /* Steinar */ [1] http://en.wikipedia.org/wiki/Transitive_closure -- Homepage: http://www.sesse.net/
pgsql-hackers by date: