We wish to find the distance between two nodes \(a\) and \(b\). Let’s generalize this further and find the distance from \(a\) to all nodes. Our particular problem is a simple special case.
Algorithm 1: BFS
The basic idea: Start from node \(a\), and for all its neighbors, note that their distance is 1. Then for each neighbor, go through its neighbors, and if we have not seen this node before, note that its distance from \(a\) must be 2. Keep recursing until there are no more nodes left.
Effectively, we will have traversed the whole connected component of the graph that contains \(a\).
How do we know if we’ve seen a node before? We need to set its distance to some sentinel value which we’ll refer to as \(\infty\). When we visit a node, we’ll set its value appropriately. This way nodes that are not within the connected component will show up (correctly) as \(\infty\).
How do we keep track of the neighbors and ensure we traverse all of them? By using a queue. As we note down a neighbor to a node, we enqueue the neighbor. As we leave a node, we dequeue it. When the queue is empty, we’ve traversed the connected component.
This method of traversal is known as breadth first traversal.
(Strictly speaking, there’s no recursion, per se - it’s just plain iteration).
BFS works for digraphs as well.
Pseudocode
The pseudocode for the algorithm is given below:
Initialize queue Q Initialize array d[N] and set each value to infinity Q.enqueue(a) distance = 0 d[a] = 0 while Q is not empty: n = Q.dequeue distance = distance + 1 for each neighbor n_i of n: if d[n_i] is infinity: d[n_i] = distance Q.enqueue(n_i)
Complexity
This algorithm takes \(O(N+E_{cc})\) where \(E_{cc}\) is the number of edges in the connected component of \(a\). The worst case occurs when the graph is complete, which would result in \(O(N^{2})\).
Implementations
Python
Use the nx.shortest_path_length function in NetworkX
Algorithm 2: Adjacency Matrices
Let \(A\) be the adjacency matrix for the graph. If \(\left(A\right)^{k}_{ab}>0\) for some \(k\), then a path of length \(k\) exists.
To see why, consider \(k=2\):
This is nonzero if and only if the node at \(k\) is connected to both \(a\) and \(b\).
For \(k=3\):
Note that this is not an efficient way of calculating whether a node exists! But it does provide some theoretical prowess.