Topological Sorting

Required Knowledge

You should know what a directed graph is.

Introduction

This post is part of a series on algorithms related to topological sorting. In this particular post I want to give a short introduction to the concept of a topological sort. Additionally I will present Kahn's algorithm, which topologically sorts a given directed graph.

The Basics

A topological sort of a directed acyclic graph $D=(V,A)$ is a linear ordering of the vertices, s.t. for any edge $(u, v) \in A$, the node $u$ needs to appear before $v$ in the ordering.

Not all directed graphs can be topologically sorted.

A directed graph $G$ is acyclic if and only if it can be topologically sorted.

Therefore algorithms for topological sorting can often be used to check if a graph contains cycles.

There are two well-known algorithms to find a topological sort of a graph. One is based on depth-first search (DFS) and another one is based on node degrees in the graph. The algorithm based on node degrees was developed by Kahn2.

Kahn's Algorithm

Only nodes that have no predecessor can be in the first position of the ordering. Thus you need to start in a node that has no incoming edges. After choosing an initial node, we can remove it from the graph and start the search for another node without incoming edges.

This can easily be achieved by maintaining a map of the node degrees.

def kahns_algo(graph):
"""Calculate a topological sort of a given directed graph using
Kahn's algorithm .

graph : (DirectedGraph)
a directed graph

raises : Exception
if the graph contains a cycle

References
----------
 Kahn, Arthur B. "Topological sorting of large networks."
Communications of the ACM 5.11 (1962): 558-562.
"""
# initialize degree map
degrees = {}
zero_nodes = []
top_sort = []

for u in graph.get_nodes():
pred = graph.get_predecessors(u)
degrees[u] = len(pred)
if len(pred) == 0:
zero_nodes.append(u)

while len(zero_nodes) > 0:
u = zero_nodes.pop()
top_sort.append(u)
for v in graph.get_successors(u):
degrees[v] -= 1
if degrees[v] == 0:
zero_nodes.append(v)

if len(top_sort) < len(graph.get_nodes()):
raise Exception("Graph contains a cycle")

Randomized Testing

To increase the confidence in our implementation we can perform the usual unit tests by comparing the output of the algorithm against known examples. Another way is to use randomized testing. If we can generate random problem instances and have a reliable test to check if the solution returned by our algorithm is correct, we can use this routine to test the algorithm on all sorts of instances.

In our case we can directly use the definition of a topological sort to create such a checking routine. Now for any ordering of the nodes we can create an index that maps the node id to its position in the ordering. Then we can go over all of the edges and check if the position of the start node is smaller than the position of the end node. This results in the following algorithm:

def is_topological_sort(graph, sequence):
"""Check if a given sequence is a topological sort of a directed graph.

graph : (DirectedGraph)
a directed graph
sequence : (iterable[node])
a sequence of nodes in the graph, will be checked if it fulfills the
topological sorting condition
"""
if set(sequence) != set(graph.get_nodes()):
return False
index = {v: i for (i, v) in enumerate(sequence)}
return all(index[u] < index[v] for (u, v) in graph.get_edges())

To generate a random instance of a directed acyclic graph on $n$ nodes, we can choose random pairs from $$\{(i, j) \,|\, i, j \in \mathbb{N} : 0 \leq i < j < n\}$$ and add these as edges to a graph on the nodes $$V = \{0, \ldots, n-1\}$$. Each directed acyclic graph has a topological sort, so it is isomorphic to a graph that can described by the construction above. Thus it is possible to generate any directed acyclic graph with this method.

def random_dag(n_nodes, edge_threshold=0.5):
"""Create a random directed acyclic graph with n_nodes nodes.

n_nodes : (int)
number of nodes in the graph
edge_threshold : (float)
value between 0 and 1, take edge (i, j) if random() > edge_threshold
"""
dag = DirectedGraph()

edges = []
for u in range(n_nodes):
for v in range(u+1, n_nodes):
if random() > edge_threshold: