Let’s use generating functions to count the average number of leaves in a binary tree with \(N\) internal nodes.
Definitions
- Let \(P\) be the set of all binary trees.
- Let \(p\in P\) be a given binary tree
- Let \(|p|\) be the number of internal nodes in a binary tree.
- Let \(P_{N}=\) the number of binary trees with \(N\) internal nodes. We showed this earlier to be \(\frac{1}{N+1}\dbinom{2N}{N}\).
- \(P(z)=\frac{1}{2}\left(1-\sqrt{1-4z}\right)\)
- Let \(cost(p)\) be the number of leaves in \(p\).
The cumulative cost function is:
\begin{equation*}
C(z)=\sum_{p\in P}cost(p)z^{|p|}
\end{equation*}
\begin{equation*}
C(z)=z+\sum_{p_{L}}\sum_{p_{R}}\left[cost(p_{L})+cost(p_{R})\right]z^{|p_{L}|+|p_{R}|+1}
\end{equation*}
This is the cost of:
- A tree with a single node and 2 leaves (\(z\)).
- A tree with the root (which explains the 1 in the exponent), and the left and right subtrees.
If you expand the terms out, you’ll get:
\begin{equation*}
C(z) = z\left[1+2C(z)P(z)\right]
\end{equation*}
\begin{equation*}
C(z)=\frac{z}{\sqrt{1-4z}}
\end{equation*}
Expanding it out using the Binomial expansion and simplifying, we get:
\begin{equation*}
C_{N}=\dbinom{2N-2}{N-1}
\end{equation*}
And so the average number of leaves is:
\begin{equation*}
\frac{N^{2}(N+1)}{2N(2N-1)}
\end{equation*}
As you take the limit in large \(N\), you get \(N/4\).