Depth First Search

Posted by Beetle B. on Sun 27 July 2014

The depth first search algorithm is a way to traverse all the nodes in a graph. The algorithm is trivial:

Start at a given node. Mark it as visited. Then go visit every adjacent node that has not been visited and recurse. Eventually all nodes will have been visited.

Maze Analogy

One can traverse a maze using a DFS. Let every intersection be a node, and the paths connecting them be edges. Take a very long ball of string, and a chalk, and start walking. Mark every intersection you encounter, and when you can’t go any further, use the string to backtrack.

Applications

  • Finding a path between 2 vertices.
  • Finding the connected component that has a given node in it.
  • The “magic wand” in image manipulation algorithms. Let a pixel be a node, and an edge is between two (physically) adjacent pixels that have very similar colors.
  • We can treat the code in a program as a digraph, where the nodes are blocks of code, and the edges are jumps (through if conditionals, etc). Code analysis tools utilize this to see if code is reachable, etc.
  • Garbage collection. Each object is a node and each edge is a reference to a node. See what nodes are not reachable from the stack and clean them.

Analysis

DFS marks all vertices in run time proportional to the sum of the degrees. This is because each node is visited only once.

Note that implicitly, this algorithm uses a stack. BFS (which we’ll visit later), utilizes a queue.

Finding Connected Components

A DFS yields the connected component. Just carry a peripheral array where the index is the node, and assign each node’s value in the array to a single number while performing DFS. Then when you go to another connected component, just assign all those nodes with a different value. Two nodes are in the same connected component if their corresponding values in the peripheral array are the same.

Note: DFS provides the connected component of a digraph as well (with no modifications).

Determining if a Graph is Bipartite

We can determine if a graph is bipartite by using depth first search. Let the first node be black. Then color each unvisited node the opposite color. So the next node will be white. Then the next black, etc. Most crucially, if you encounter an already visited node, check that its color is different from that of the current node’s. If it isn’t, then the graph is not bipartite.

If you manage to traverse the whole graph, then the graph is bipartite.

The proof that this algorithm works: Assume two connected nodes have the same color. At some point we will have visited each of these nodes. Assume one of them is visited at some point after the other one. When we check its neighbors, we’ll see the first (already visited) node. The algorithm should quit then.

Finding a Cycle

If, during DFS, you encounter an already visited node, and it’s not the one you just came from, then you have a cycle. So you merely need to keep track of the last visited node.

The run time for finding a cycle is \(O(N)\), as you can have at most \(N-1\) edges without forming a cycle (each new edge must not connect to more than one existing node).

Laying Out a Graph in the Plane Without Crossing Edges

There is a DFS based linear algorithm by Tarjan for this, but it’s highly nontrivial to implement.

Topological Sort

A topological sort is a traversal of a digraph with some constraints. Let each node have some value/weight. It is a linear ordering of the nodes if for every edge \(uv\), \(u\) comes before \(v\) in the ordering.

The most common example of this is the tree one often sees in textbooks which indicate the dependencies of the chapters. If you want to know what chapter requires which other chapters as prerequisites, you do a topological sort on them.

A digraph can be topologically sorted if and only if it is a directed acyclic graph (DAG).

To topologically sort a DAG, apply the depth first search algorithm. Whenever you reach a node where you can’t proceed any further, place that node on a stack. Once you’ve traversed all the nodes (not just in a connected component), pop the stack. The order the nodes leave the stack is the desired ordering.

The complexity is linear.

Note that the ordering is not unique.

Implementations

Python

NetworkX has a number of functions.