Home » Uncategorized » Distributed Transitive Closure Algorithms, Part 2 – Seminaive and Smart

# Distributed Transitive Closure Algorithms, Part 2 – Seminaive and Smart

## Seminaive

The seminaive algorithm is an example of a linear transitive closure algorithm. Initially, the transitive closure contains only the original edges from the graph. Then, at each round, the transitive closure is expanded by combining the pairs found so far with an additional edge from the base graph. In order to compute the transitive closure using the seminaive algorithm, the number of iterations required is equal to one less than the graph’s diameter (From Wikipedia: To find the diameter of a graph, first find the shortest path between each pair of vertices. The greatest length of any of these paths is the diameter of the graph). For an illustration of this, the figure below demonstrates the sequence of derivations produced by the seminaive algorithm in finding a pair of nodes connected by a path of length 4.

Seminaive: Rounds Required to Derive (a, e)

The pseudocode for the seminaive algorithm is presented below:

Seminaive Pseudocode

$R$ denotes the original graph. $T$ is initialized to $R$, and will contain the transitive closure of $R$ upon termination. $\Delta T$ contains the newly derived tuples from the previous round. At each iteration of the while loop, $\Delta T$ is updated by joining $\Delta T$ with the edges of the original graph, $R$.

On iteration $n$, $\Delta T$ contains all pairs of nodes such that a shortest path between them has length $n+1$. These pairs could not have been discovered in a previous round (by virtue of the shortest path between them having length $n+1$). If any pair produced by the join has already been found, then this implies there exists a shorter path between the two nodes which was found on a previous iteration. The set difference operation removes any such tuples. At the end of iteration $n$, every tuple in $\Delta T$ represents a newly discovered tuple in the transitive closure.

The reasoning behind algorithm’s “seminaive” naming is that it avoids the fully naive join, which would perform the join $T \circ R$ at each iteration. This is enabled by exploiting the knowledge that the only new tuples possible from a join of $T \circ R$ at iteration $n + 1$ come from elements of $T$ that were newly derived at iteration $n$; these newly derived tuples are exactly the contents of $\Delta T$ in seminaive.

## Smart

Smart is an example of a nonlinear transitive closure algorithm. Instead of combining the current knowledge of the transitive closure with the edges of the base graph, each iteration of the smart algorithm combines (in a clever, or “smart”, way) the current transitive closure relation with itself. This enables the Smart algorithm to terminate in a number of rounds logarithmic in the graph’s diameter.

Smart Pseudocode

Smart iteratively builds two relations. At iteration $i$, $Q$ holds all pairs of nodes between which the length of the shortest path is exactly equal to $2^i$. $P$ holds all pairs of nodes between which the shortest path has length less than $2^i$.

In the first round of the computation, $Q$ is simply a self-join of the original edges; in fact, all the transitive closure algorithms discussed here begin with this first step. The result of this join contains all pairs of nodes connected by a path of length 2, but there may be nodes in this result that are connected by a shorter path (a direct edge) so the set difference operation in line 6 is required to remove any pairs of nodes where there exists a path between them of length less than 2. In the next iteration, $Q$ is again be joined with itself, producing (after the set difference) all pairs of nodes connected by a shortest path of exactly length 4.

In round $i$, $P$ is formed from the join of $Q$ and $P$. This combines paths from $Q$ (of length exactly $2^{i-1}$) with paths from $P$ (of length strictly less than $2^{i-1}$). This will find all shortest paths of any length between $2^{i-1}$ and $2^i - 1$. To also incorporate previously discovered paths of lesser length, line 5 takes the union of $Q \circ P$ with the tuples of $Q$ and $P$. Therefore, at the end of round $i$, $P$ contains all paths of length less than $2^{i}$.

Not shown in the pseudocode is a duplicate elimination step after line 5, to remove redundant tuples produced by the join and union operations that form $P$. Duplicate elimination is required on the $Q$ relation for termination detection in the presence of cycles, and is implicit in the set difference operation of line 6. However, duplicate elimination on $P$ is not required to detect termination, but it is necessary for efficient performance on graphs where Smart will produce duplicate derivations.