Processing math: 92%

Exact Addition

Posted by Beetle B. on Wed 20 March 2019

Sterbenz’s Lemma:

If your floating point system has denormals, and if x,y are non-negative, finite floating point numbers such that y/2x2y, then xy is a floating point number.

Proof:

We can assume yx2y. Let Mx,ex be the significand and exponent of x, and define similar variables for y. Now we can easily show that eyex. Now let δ=exey. Then:

xy=(MxβδMy)βeyp+1=Mβeyp+1

Now we know that M0 because xy.

We know that M is an integer.

From x2y we know that xyy. Thus Mβeyp+1Myβeyp+1. This gives MMyβp1. The proof is complete.

We need denormals because xy could be a denormal.

Another theorem: If x and y are floating point numbers, and if \RN(x+y) is subnormal, then \RN(x+y)=x+y.

Proof: All floating point numbers are integral multiples of α. Thus so is x+y. If it is a subnormal number, then it is less than βemin. Thus it is exactly representable. All multiples of \alpha that are subnormal are exactly representable!