Let’s use generating functions to count the average number of 1’s in a binary string of length \(N\).
Definitions
- Let \(P\) be the set of all given binary strings.
- Let \(p\in P\) be a given binary string.
- Let \(|p|\) be the length of a binary string.
- Let \(P_{N}=\) the number of binary strings of length N = \(2^{N}\).
- Let \(cost(p)\) be the number of 1’s in \(p\).
- Let \(P_{Nk}\) be the number of binary strings with \(|p|=N\) and \(cost(p)=k\). We know that \(P_{Nk}=\dbinom{N}{k}\).
- \(P(z)=\sum_{N\ge0}P_{N}z^{N}=\sum_{N\ge0}2^{N}z^{N}=\frac{1}{1-2z}\)
The cumulative cost function is:
Now we use an induction trick: Partition the binary string of length \(N\) into two cases: Strings that begin with 0 and strings that begin with 1.
Let’s restrict ourselves to strings of length \(N\):
where \(|p|_{0}\) is the set of all \(p\) of length \(N\) with a leading 0. \(|p|_{1}\) is defined similarly.
The thing about \(|p|_{1}\) is that its cost is \(1+cost(p')\) where \(p'\) is the string of length \(N-1\) after removing the leading bit. With \(|p|_{0}\), its cost is simply \(cost(p')\).
Also, we realize that if we look at all the elements in \(|p|_{0}\) after stripping the leading bit, we get the same elements as \(|p|_{1}\) after doing the same. So we can replace the sum with \(\sum_{|p'|=N-1}\).
We end up with
Solving, we get:
This gives us the average:
Alternative Approach
We could have written down the cost directly:
If we had not proved this sum already, you could simply use the derivation in the current page as a means to get this identity.