Home » Uncategorized » Distributed Transitive Closure Algorithms, Part 2 – Seminaive and Smart

Distributed Transitive Closure Algorithms, Part 2 – Seminaive and Smart

Seminaive

The seminaive algorithm is an example of a linear transitive closure algorithm. Initially, the transitive closure contains only the original edges from the graph. Then, at each round, the transitive closure is expanded by combining the pairs found so far with an additional edge from the base graph. In order to compute the transitive closure using the seminaive algorithm, the number of iterations required is equal to one less than the graph’s diameter (From Wikipedia: To find the diameter of a graph, first find the shortest path between each pair of vertices. The greatest length of any of these paths is the diameter of the graph). For an illustration of this, the figure below demonstrates the sequence of derivations produced by the seminaive algorithm in finding a pair of nodes connected by a path of length 4.

Seminaive: Rounds Required to Derive (a, e)

The pseudocode for the seminaive algorithm is presented below:

Seminaive Pseudocode

$R$ denotes the original graph. $T$ is initialized to $R$ , and will contain the transitive closure of $R$ upon termination. $\Delta T$ contains the newly derived tuples from the previous round. At each iteration of the while loop, $\Delta T$ is updated by joining $\Delta T$ with the edges of the original graph, $R$ .

On iteration $n$ , $\Delta T$ contains all pairs of nodes such that a shortest path between them has length $n+1$ . These pairs could not have been discovered in a previous round (by virtue of the shortest path between them having length $n+1$ ). If any pair produced by the join has already been found, then this implies there exists a shorter path between the two nodes which was found on a previous iteration. The set difference operation removes any such tuples. At the end of iteration $n$ , every tuple in $\Delta T$ represents a newly discovered tuple in the transitive closure.

The reasoning behind algorithm’s “seminaive” naming is that it avoids the fully naive join, which would perform the join $T \circ R$ at each iteration. This is enabled by exploiting the knowledge that the only new tuples possible from a join of $T \circ R$ at iteration $n + 1$ come from elements of $T$ that were newly derived at iteration $n$ ; these newly derived tuples are exactly the contents of $\Delta T$ in seminaive.

Smart

Smart is an example of a nonlinear transitive closure algorithm. Instead of combining the current knowledge of the transitive closure with the edges of the base graph, each iteration of the smart algorithm combines (in a clever, or “smart”, way) the current transitive closure relation with itself. This enables the Smart algorithm to terminate in a number of rounds logarithmic in the graph’s diameter.

Smart Pseudocode

Smart iteratively builds two relations. At iteration $i$ , $Q$ holds all pairs of nodes between which the length of the shortest path is exactly equal to $2^i$ . $P$ holds all pairs of nodes between which the shortest path has length less than $2^i$ .

In the first round of the computation, $Q$ is simply a self-join of the original edges; in fact, all the transitive closure algorithms discussed here begin with this first step. The result of this join contains all pairs of nodes connected by a path of length 2, but there may be nodes in this result that are connected by a shorter path (a direct edge) so the set difference operation in line 6 is required to remove any pairs of nodes where there exists a path between them of length less than 2. In the next iteration, $Q$ is again be joined with itself, producing (after the set difference) all pairs of nodes connected by a shortest path of exactly length 4.

In round $i$ , $P$ is formed from the join of $Q$ and $P$ . This combines paths from $Q$ (of length exactly $2^{i-1}$ ) with paths from $P$ (of length strictly less than $2^{i-1}$ ). This will find all shortest paths of any length between $2^{i-1}$ and $2^i - 1$ . To also incorporate previously discovered paths of lesser length, line 5 takes the union of $Q \circ P$ with the tuples of $Q$ and $P$ . Therefore, at the end of round $i$ , $P$ contains all paths of length less than $2^{i}$ .

Not shown in the pseudocode is a duplicate elimination step after line 5, to remove redundant tuples produced by the join and union operations that form $P$ . Duplicate elimination is required on the $Q$ relation for termination detection in the presence of cycles, and is implicit in the set difference operation of line 6. However, duplicate elimination on $P$ is not required to detect termination, but it is necessary for efficient performance on graphs where Smart will produce duplicate derivations.

By Eric Gribkoff in Uncategorized on July 29, 2013.

2 Comments

Saerorm Park says:

July 29, 2013 at 3:16 pm

Figures look familiar! 🙂 Good post, nice work!

Reply
Distributed Transitive Closure Algorithms, Part 4 – Java Implementations « A Single Neuron says:

October 21, 2013 at 3:26 pm

[…] Distributed Transitive Closure Algorithms, Part 2 – Seminaive and Smart […]

Reply

Latest Posts

The Hardness of Probabilistic Queries
Since I started at UW last quarter, I have been spending a lot of time working on hardness results for queries over probabilistic databases. A…
Entity Classification and Relation Extraction: Joint Probabilistic Models
The goal of relation extraction is to automatically build structured, queryable representations of unstructured text. The sheer volume of available data prohibits manual extraction, yet…
Relax, Compensate, and then Recover: Notes on the paper by Arthur Choi and Adnan Darwiche
I’ve been reading a lot of papers since starting at UW and I thought it might be nice to keep some written notes about a…
Cloud Recognition in the Miami Skyline
For our senior design project a UC Davis, Saerorm Park, Liz Shigetoshi, Ashish Subedy and I worked on automatically classifying cloud types in webcam images…

A Single Neuron

Distributed Transitive Closure Algorithms, Part 2 – Seminaive and Smart

Recent Posts

Archives

Categories

Meta

Seminaive

Smart

2 Comments

Leave a reply to Distributed Transitive Closure Algorithms, Part 4 – Java Implementations « A Single Neuron Cancel reply

Latest Posts

The Hardness of Probabilistic Queries

Entity Classification and Relation Extraction: Joint Probabilistic Models

Relax, Compensate, and then Recover: Notes on the paper by Arthur Choi and Adnan Darwiche

Cloud Recognition in the Miami Skyline

A Single Neuron

Distributed Transitive Closure Algorithms, Part 2 – Seminaive and Smart

Recent Posts

Archives

Categories

Meta

Seminaive

Smart

Share this:

Related

2 Comments

Leave a reply to Distributed Transitive Closure Algorithms, Part 4 – Java Implementations « A Single Neuron Cancel reply

Latest Posts

The Hardness of Probabilistic Queries

Entity Classification and Relation Extraction: Joint Probabilistic Models

Relax, Compensate, and then Recover: Notes on the paper by Arthur Choi and Adnan Darwiche

Cloud Recognition in the Miami Skyline