The Average Number of 1’s in a Binary String of Length <span class="math">\(N\)</span>

Let’s use generating functions to count the average number of 1’s in a binary string of length \(N\).

Definitions

Let \(P\) be the set of all given binary strings.
Let \(p\in P\) be a given binary string.
Let \(|p|\) be the length of a binary string.
Let \(P_{N}=\) the number of binary strings of length N = \(2^{N}\).
Let \(cost(p)\) be the number of 1’s in \(p\).
Let \(P_{Nk}\) be the number of binary strings with \(|p|=N\) and \(cost(p)=k\). We know that \(P_{Nk}=\dbinom{N}{k}\).
\(P(z)=\sum_{N\ge0}P_{N}z^{N}=\sum_{N\ge0}2^{N}z^{N}=\frac{1}{1-2z}\)

The cumulative cost function is:

\begin{equation*} C(z)=\sum_{p\in P}cost(p)z^{|p|} \end{equation*}

Now we use an induction trick: Partition the binary string of length \(N\) into two cases: Strings that begin with 0 and strings that begin with 1.

Let’s restrict ourselves to strings of length \(N\):

\begin{equation*} C(z)=\sum_{|p|=N}cost(p)z^{|p|}=\sum_{|p|_{0}}cost(p)z^{|p|_{0}}+\sum_{|p|_{1}}cost(p)z^{|p|_{1}} \end{equation*}

where \(|p|_{0}\) is the set of all \(p\) of length \(N\) with a leading 0. \(|p|_{1}\) is defined similarly.

The thing about \(|p|_{1}\) is that its cost is \(1+cost(p')\) where \(p'\) is the string of length \(N-1\) after removing the leading bit. With \(|p|_{0}\), its cost is simply \(cost(p')\).

Also, we realize that if we look at all the elements in \(|p|_{0}\) after stripping the leading bit, we get the same elements as \(|p|_{1}\) after doing the same. So we can replace the sum with \(\sum_{|p'|=N-1}\).

We end up with

\begin{equation*} C(z)=\sum_{p}cost(p)z^{|p|}=\sum_{p'}(1 + 2cost(p'))z^{|p'|+1} \end{equation*}

\begin{equation*} C(z)=zP(z)+2zC(z) \end{equation*}

Solving, we get:

\begin{equation*} C(z)=\frac{z}{\left(1-2z\right)^{2}}=\frac{1}{2}z\frac{dP}{dz} \end{equation*}

\begin{equation*} C(z)=\sum_{N\ge0}N2^{N-1}z^{N} \end{equation*}

This gives us the average:

\begin{equation*} C(z)=\frac{N2^{N-1}}{2^{N}}=\frac{N}{2} \end{equation*}

Alternative Approach

We could have written down the cost directly:

\begin{equation*} C_{N}=\sum_{k\ge0}kP_{Nk}=\sum_{k\ge0}k\dbinom{N}{k}=N2^{N-1} \end{equation*}

If we had not proved this sum already, you could simply use the derivation in the current page as a means to get this identity.