Exact Addition

Sterbenz’s Lemma:

If your floating point system has denormals, and if $x,y$ are non-negative, finite floating point numbers such that $y/2\le x\le2y$ , then $x-y$ is a floating point number.

Proof:

We can assume $y\le x\le2y$ . Let $M_{x},e_{x}$ be the significand and exponent of $x$ , and define similar variables for $y$ . Now we can easily show that $e_{y}\le e_{x}$ . Now let $\delta=e_{x}-e_{y}$ . Then:

$\begin{equation*} x-y=\left(M_{x}\beta^{\delta}-M_{y}\right)\beta^{e_{y}-p+1}=M\beta^{e_{y}-p+1} \end{equation*}$

Now we know that $M\ge0$ because $x\ge y$ .

We know that $M$ is an integer.

From $x\le2y$ we know that $x-y\le y$ . Thus $M\beta^{e_{y}-p+1}\le M_{y}\beta^{e_{y}-p+1}$ . This gives $M\le M_{y}\le\beta^{p}-1$ . The proof is complete.

We need denormals because $x-y$ could be a denormal.

Another theorem: If $x$ and $y$ are floating point numbers, and if $\RN(x+y)$ is subnormal, then $\RN(x+y)=x+y$ .

Proof: All floating point numbers are integral multiples of $\alpha$ . Thus so is $x+y$ . If it is a subnormal number, then it is less than $\beta^{e_{\min}}$ . Thus it is exactly representable. All multiples of $\alpha$ that are subnormal are exactly representable!