Floating Point Exceptions

Posted by Beetle B. on Tue 22 January 2019

In IEEE-754, an operation can signal an exception in addition to returning its result. Under default exception handling, signaling an exception raises a status flag, and the standard requires these flags to be sticky: once raised, a flag stays raised until the program explicitly lowers it, so a subsequent operation does not clear it. The idea is that we can perform many operations and check the flags once at the end.
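A minimal sketch of sticky flags. It uses Python's standard `decimal` module as a stand-in, because that module implements IEEE-754-style decimal arithmetic with per-context status flags, while Python exposes no flags for its binary floats. The 5-digit precision is an arbitrary choice for illustration.

```python
from decimal import Context, Decimal, Inexact

# traps=[] turns every trap off, so operations only raise flags
# instead of throwing Python exceptions.
ctx = Context(prec=5, traps=[])

ctx.clear_flags()                          # lower every flag before the batch
a = ctx.divide(Decimal(1), Decimal(3))     # 0.33333: inexact, raises the flag
b = ctx.add(Decimal(2), Decimal(3))        # 5: exact, but does not lower it
c = ctx.multiply(Decimal(4), Decimal(5))   # 20: exact as well

# One check at the end reveals that some operation in the batch was inexact.
print("inexact somewhere:", bool(ctx.flags[Inexact]))
```

The flag only tells us that *some* operation was inexact, not which one; that is the trade-off for checking once instead of after every step.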

IEEE-754 defines five exceptions:

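Invalid: This is signaled when the operation has no mathematically defined result for its inputs, e.g. \(0/0\), \(\infty - \infty\) or \(\sqrt{-1}\). By default, the result is a quiet NaN.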

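DivideByZero: This is signaled when an operation on finite inputs produces an exactly infinite result, such as \(1/0\). Note that this includes \(\log(0)\). An infinite result that comes from rounding a very large finite value is an overflow, not a divide-by-zero.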

Overflow: This occurs when the result, rounded as though the exponent range were unbounded, would have an exponent greater than \(e_{\max}\).

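Underflow: This is signaled when a nonzero result is tiny, i.e. smaller in magnitude than the smallest positive normal number. Under default handling, the flag is raised only if the tiny result is also inexact. This was discussed in an earlier post.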

Inexact: This is signaled when the exact result cannot be represented as a floating point number.
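The five exceptions can be triggered one at a time with Python's `decimal` module, which uses the same five flag names (`InvalidOperation`, `DivisionByZero`, `Overflow`, `Underflow`, `Inexact`). This is a sketch under assumptions: the tiny format (3 digits, \(e_{\max}=99\), \(e_{\min}=1-e_{\max}=-98\)) and the helper `signals` are hypothetical, and the arithmetic is decimal rather than binary. The overflow line illustrates the "rounded result" part of the definition: \(9.995\times10^{99}\) has exponent 99, but rounding it to 3 digits gives \(1.00\times10^{100}\).

```python
from decimal import (Context, Decimal, InvalidOperation, DivisionByZero,
                     Overflow, Underflow, Inexact)

# Hypothetical tiny format: 3 digits, e_max = 99, e_min = 1 - e_max = -98.
# traps=[] disables all traps, so each operation returns a result and
# merely raises flags.
ctx = Context(prec=3, Emax=99, Emin=-98, traps=[])

# The five IEEE-754 exception types, as named by the decimal module.
FIVE = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)

def signals(op, *args):
    """Run one context operation on fresh flags; return (result, raised names)."""
    ctx.clear_flags()
    result = op(*args)
    raised = {s.__name__ for s in FIVE if ctx.flags[s]}
    return result, raised

print(signals(ctx.sqrt, Decimal(-1)))                     # Invalid: result is NaN
print(signals(ctx.divide, Decimal(1), Decimal(0)))        # DivideByZero: Infinity
print(signals(ctx.plus, Decimal("9.995E99")))             # Overflow after rounding
print(signals(ctx.divide, Decimal("1E-98"), Decimal(3)))  # Underflow (subnormal)
print(signals(ctx.divide, Decimal(1), Decimal(3)))        # Inexact only: 0.333
```

Note that overflow and underflow here also raise `Inexact`, since the delivered result differs from the exact one.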

Now if our floating point arithmetic follows the convention of IEEE-754 where \(e_{\min}=1-e_{\max}\), then:

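  • If \(x\) is normal, \(1/x\) never overflows. In radix \(\beta\), a normal \(x\) satisfies \(|x| \ge \beta^{e_{\min}}\), so \(|1/x| \le \beta^{-e_{\min}} = \beta^{e_{\max}-1}\), which is below the largest finite number.
  • If \(x\) is a finite floating point number, then \(1/x\) can underflow: since \(|x| < \beta^{e_{\max}+1}\), \(|1/x|\) can be as small as about \(\beta^{-e_{\max}-1} = \beta^{e_{\min}-2}\), which is below the smallest normal number \(\beta^{e_{\min}}\). Typically, the rounded result is not 0 if we have subnormal numbers.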
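Both bullets can be checked numerically in binary64, assuming Python floats are IEEE-754 doubles (true on mainstream CPython platforms, but an assumption). There \(\beta = 2\), \(e_{\max} = 1023\) and \(e_{\min} = -1022 = 1 - e_{\max}\). Python does not expose the underflow flag for floats, so this sketch only inspects the values.

```python
import sys

tiny = sys.float_info.min   # smallest positive normal, 2**-1022
huge = sys.float_info.max   # largest finite, just below 2**1024

# Smallest normal x: 1/x = 2**1022, comfortably below the largest finite value.
print(1 / tiny)

# Largest finite x: 1/x rounds to 2**-1024, below 2**-1022, so it is
# subnormal (it underflows) but thanks to subnormals it is not rounded to 0.
r = 1 / huge
print(r, 0 < r < tiny)

# The "normal" condition matters: for the smallest subnormal x = 2**-1074,
# 1/x = 2**1074 exceeds the largest finite value and overflows.
print(1 / 5e-324)  # prints inf
```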

Generally, it is faster to handle exceptions than to prevent them. Suppose an algorithm neither overflows nor underflows for the vast majority of inputs. You can run that algorithm first, check the flags afterwards, and only if an exception was signaled switch to a different (slower) algorithm that is guaranteed not to overflow or underflow.
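The fast-path-then-fallback pattern can be sketched with the `decimal` module's sticky flags. The example problem (computing \(\sqrt{x^2+y^2}\)), the function names and the small format (10 digits, \(e_{\max}=99\), \(e_{\min}=1-e_{\max}=-98\)) are hypothetical choices for illustration. The naive formula squares its inputs, so it overflows or underflows for large or tiny arguments; the fallback divides by the larger magnitude first, which makes the larger square exactly 1.

```python
from decimal import Context, Decimal, Overflow, Underflow

# Hypothetical small decimal format: 10 digits, e_max = 99, e_min = 1 - e_max.
# traps=[] turns all traps off, so exceptional results only raise flags.
CTX = Context(prec=10, Emax=99, Emin=-98, traps=[])

def hypot_fast(x, y, ctx):
    """Naive sqrt(x*x + y*y): cheap, but the squares can over/underflow."""
    return ctx.sqrt(ctx.add(ctx.multiply(x, x), ctx.multiply(y, y)))

def hypot_scaled(x, y, ctx):
    """Slower: scale by s = max(|x|, |y|) so the larger square is exactly 1.

    The squares can no longer overflow; if the smaller one underflows, it is
    negligible next to 1 anyway.
    """
    s = max(ctx.abs(x), ctx.abs(y))
    if s == 0:
        return Decimal(0)
    u = ctx.divide(x, s)
    v = ctx.divide(y, s)
    return ctx.multiply(s, ctx.sqrt(ctx.add(ctx.multiply(u, u), ctx.multiply(v, v))))

def hypot(x, y, ctx=CTX):
    """Try the fast algorithm; fall back only if it signaled over/underflow."""
    ctx.clear_flags()
    r = hypot_fast(x, y, ctx)
    # The flags are sticky, so one check covers every operation above.
    if ctx.flags[Overflow] or ctx.flags[Underflow]:
        r = hypot_scaled(x, y, ctx)
    return r

print(hypot(Decimal(3), Decimal(4)))              # fast path is enough
print(hypot(Decimal("3E60"), Decimal("4E60")))    # squares overflow: fallback
print(hypot(Decimal("3E-60"), Decimal("4E-60")))  # squares underflow: fallback
```

Common inputs pay only for the cheap formula plus one flag check; the extra divisions are spent only on the rare inputs that need them.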