Languages and Compilers
Here are some concerns: Say you want to compute and they are all of 32 bit precision, but the machine supports 64 bit....
Posted by
Beetle B.
on
Mon 16 September 2019
Compensated Polynomial Evaluation
The book provides an algorithm. I didn’t bother writing the details.
Posted by
Beetle B.
on
Fri 07 June 2019
Compensated Dot Products
When the condition number is not high, one can do the naive algorithm for the dot product. Otherwise, one should do the...
Posted by
Beetle B.
on
Fri 07 June 2019
Computing Sums More Accurately
Reordering the Operands, and a Bit More. General ideas: Sort all your summands in ascending order of magnitude. Even more complex, sort...
Posted by
Beetle B.
on
Thu 23 May 2019
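As a rough illustration of the sorting idea in the "Computing Sums More Accurately" entry above, the sketch below compares plain left-to-right summation with summation after sorting by increasing magnitude. The function names and the sample data are made up for this example, and `math.fsum` serves only as a reference value.

```python
import math


def naive_sum(values):
    """Recursive (left-to-right) summation in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


def sorted_sum(values):
    """Recursive summation after sorting by increasing magnitude."""
    return naive_sum(sorted(values, key=abs))


# Hypothetical data: one huge term followed by ten small ones.
values = [1e16] + [1.0] * 10

print(naive_sum(values))
print(sorted_sum(values))
print(math.fsum(values))
```

With this input the naive loop loses every small term, because each `1e16 + 1.0` rounds back to `1e16`, while the sorted order adds the small terms together first and matches the reference. Sorting helps here but is not a general guarantee of accuracy.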
Computing Validated Running Error Bounds
The problem with the previous error bounds is that they are in terms of quantities like , which are not known in advance,...
Posted by
Beetle B.
on
Thu 16 May 2019
Properties For Deriving Validated Running Error Bounds
Theorem for FMA: Let \(x, y, z\) be nonnegative floating point numbers. Assuming underflow does not occur, then \(xy+z\le...
Posted by
Beetle B.
on
Wed 15 May 2019
Some Refined Error Estimates
Let the rounding mode be RN. Assume no overflow occurs. Then if you do recursive summation, the following inequality related to the...
Posted by
Beetle B.
on
Fri 10 May 2019
Notation For Error Analysis and Classical Error Estimates
Unless specified otherwise, everything in this chapter assumes no underflow. We often will have many factors of ,...
Posted by
Beetle B.
on
Fri 10 May 2019
Evaluation of the Error of an FMA
The error of an FMA calculation is not always a floating point number. However, we can use two floating point numbers to exactly...
Posted by
Beetle B.
on
Fri 19 April 2019
Multiplication by an Arbitrary Precision Constant with an FMA
Suppose you need to multiply by a constant that is not exactly representable. Think and the like. We’d like to multiply and...
Posted by
Beetle B.
on
Fri 19 April 2019
Conversions Between Integers and Floating Point Numbers
This is a short section with some details and magic numbers. I did not bother.
Posted by
Beetle B.
on
Fri 19 April 2019
Radix Conversion Algorithms
The algorithms are in the book. I did not reproduce them here. I did not read the rest of the section. There are a lot more details there.
Posted by
Beetle B.
on
Fri 19 April 2019
Conditions on the Formats
This section deals with changing bases. The most obvious application is to go back and forth between decimal and binary (to make it easy...
Posted by
Beetle B.
on
Fri 19 April 2019
Newton-Raphson Based Square Root With FMA
The Basic Methods One way is to use Newton’s iteration on . This method for calculating square root goes back thousands...
Posted by
Beetle B.
on
Fri 19 April 2019
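For the "Newton-Raphson Based Square Root With FMA" entry above, here is a minimal sketch of the plain Newton iteration for a square root. It assumes the iteration is applied to x^2 - a (the teaser's formula is missing, so this is a guess), uses no FMA, and makes no claim of correct rounding. The function name, starting guess, and iteration count are choices made for this example only.

```python
import math


def newton_sqrt(a, iterations=6):
    """Approximate sqrt(a) for a in [1, 4) with x <- (x + a / x) / 2."""
    x = 0.5 * (1.0 + a)  # crude starting guess (assumption of this sketch)
    for _ in range(iterations):
        x = 0.5 * (x + a / x)
    return x


approx = newton_sqrt(2.0)
print(approx, math.sqrt(2.0))
```

The iteration converges quadratically, so a handful of steps is enough on this range. Getting the last bit right is what the FMA-based variants in the post are about.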
Possible Double Rounding in Division Algorithms
This section deals with floating point values, not necessarily between 1 and 2. Assume they are non-negative, though. For this,...
Posted by
Beetle B.
on
Thu 18 April 2019
Using The Newton Iteration For Correctly Rounded Division With FMA
We need to calculate where are binary floating point numbers, and is RN, RD, RU or RZ. We have a useful proof:...
Posted by
Beetle B.
on
Wed 17 April 2019
Variants of the Newton Raphson Iteration
Assume for this section. Some of it may not work for decimal. We want to approximate . Assume . In...
Posted by
Beetle B.
on
Tue 26 March 2019
Another Splitting Technique: Splitting Around a Power of 2
In this section, assume . Now given a floating point , we want to form two floating point numbers and...
Posted by
Beetle B.
on
Tue 26 March 2019
Computation of Residuals of Division and Square Root With an FMA
For this article, define a representable pair for a floating point number to be any pair such that...
Posted by
Beetle B.
on
Mon 25 March 2019
Accurate Computation of the Product of Two Numbers
The 2MultFMA Algorithm: This has been covered elsewhere. It works well when you use FMA. If No FMA Is Available: If there is no FMA...
Posted by
Beetle B.
on
Thu 21 March 2019
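For the no-FMA case mentioned in the "Accurate Computation of the Product of Two Numbers" entry above, one classical option is Veltkamp splitting followed by Dekker's product. The sketch below assumes binary64 floats, round to nearest, and no overflow or underflow. It may not be the exact variant the post describes, and the function names are chosen for this example.

```python
from fractions import Fraction


def veltkamp_split(x, s=27):
    """Split x into hi + lo, each fitting in about half the significand."""
    c = float(2**s + 1)
    gamma = c * x
    delta = x - gamma
    hi = gamma + delta
    lo = x - hi
    return hi, lo


def dekker_product(x, y):
    """Return (r1, r2) with r1 = RN(x * y) and r1 + r2 == x * y exactly."""
    xh, xl = veltkamp_split(x)
    yh, yl = veltkamp_split(y)
    r1 = x * y
    t1 = -r1 + xh * yh
    t2 = t1 + xh * yl
    t3 = t2 + xl * yh
    r2 = t3 + xl * yl
    return r1, r2


r1, r2 = dekker_product(0.1, 0.3)
# Exact rational arithmetic confirms that nothing was lost.
print(Fraction(r1) + Fraction(r2) == Fraction(0.1) * Fraction(0.3))
```

The `Fraction` check is only there to validate the result; it is not part of the algorithm.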
Accurate Computation of the Sum of Two Numbers
Let be two floating point numbers. Let be . Regardless of which number it picks in a tie, it can be shown that...
Posted by
Beetle B.
on
Thu 21 March 2019
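To make the idea behind the "Accurate Computation of the Sum of Two Numbers" entry above concrete, here is a sketch of the well-known 2Sum algorithm, which returns the rounded sum together with its rounding error. It assumes round to nearest and no overflow. The variable names are chosen for this example, and the post may present a different variant such as Fast2Sum.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, t) with s = RN(a + b) and s + t == a + b exactly."""
    s = a + b
    a_prime = s - b
    b_prime = s - a_prime
    delta_a = a - a_prime
    delta_b = b - b_prime
    t = delta_a + delta_b
    return s, t


s, t = two_sum(1e16, 1.0)
print(s, t)
# Exact check with rationals: the pair carries the full sum.
print(Fraction(s) + Fraction(t) == Fraction(1e16) + Fraction(1.0))
```

Unlike Fast2Sum, this version does not need to know which operand is larger in magnitude, at the cost of a few extra operations.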
Exact Multiplications and Divisions
When you multiply a floating point number by a power of the radix, the result is exact provided there is no overflow or underflow. Another...
Posted by
Beetle B.
on
Wed 20 March 2019
Exact Addition
Sterbenz’s Lemma: If your floating point system has denormals, and if are non-negative, finite floating point numbers such that...
Posted by
Beetle B.
on
Wed 20 March 2019
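The teaser for "Exact Addition" above is cut off before the condition of Sterbenz's Lemma. The sketch below assumes the usual statement: if y/2 <= x <= 2y, then x - y is computed exactly. The helper name and sample values are made up for this example.

```python
from fractions import Fraction


def subtraction_is_exact(x, y):
    """Check with exact rationals whether the float x - y has no rounding error."""
    return Fraction(x - y) == Fraction(x) - Fraction(y)


# 0.7 / 2 <= 1.1 <= 2 * 0.7, so the lemma applies.
print(subtraction_is_exact(1.1, 0.7))

# Far outside the range, so no guarantee (and indeed the result is inexact).
print(subtraction_is_exact(1.0, 1e-20))
```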
Computing The Precision
To get the precision of the floating point system you are on:
    i = 0
    A = 1.0
    B = 2  # The radix.
    while (A + 1.0) - A == 1.0:
        A = B * A
        i += 1
...
Posted by
Beetle B.
on
Wed 20 March 2019
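The loop from the "Computing The Precision" entry above can be wrapped in a function and checked against what Python reports. This is a sketch that assumes the radix is already known; the function name is made up, and `sys.float_info.mant_dig` is used only for comparison.

```python
import sys


def compute_precision(radix=2.0):
    """Count how many radix digits fit before adding 1.0 stops being exact."""
    i = 0
    a = 1.0
    while (a + 1.0) - a == 1.0:
        a = radix * a
        i += 1
    return i


print(compute_precision(), sys.float_info.mant_dig)
```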
Computing The Radix
Suppose we want to compute the radix of a floating point system. The code below will do it for you - it works assuming the...
Posted by
Beetle B.
on
Wed 20 March 2019
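The code for the "Computing The Radix" entry above is not shown in the teaser. A classic way to do this, usually attributed to Malcolm, is sketched below. It may differ from the post's version, the function name is chosen for this example, and it assumes the language evaluates each operation in the working format without extra precision.

```python
import sys


def compute_radix():
    """Find the radix by probing where integer spacing exceeds 1."""
    a = 1.0
    # Grow a until a + 1.0 is no longer exactly representable.
    while (a + 1.0) - a == 1.0:
        a = 2.0 * a
    # The smallest b that survives being added to a is the radix.
    b = 1.0
    while (a + b) - a != b:
        b += 1.0
    return b


print(compute_radix(), sys.float_info.radix)
```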
IEEE Support in Programming Languages
Do not assume that the operations in a programming language will map to the ones in the standard. The standard was originally written...
Posted by
Beetle B.
on
Mon 04 March 2019
Rest of chapter
I skipped the rest of the chapter (including hardware details).
Posted by
Beetle B.
on
Thu 07 February 2019
Special Values
NaN: Signaling NaNs do not appear as the result of arithmetic operations. When they appear as an operand, they signal an...
Posted by
Beetle B.
on
Thu 07 February 2019
Default Exception Handling
Invalid: The default result of such an operation is a quiet NaN. The operations that lead to Invalid are: Most operations on a...
Posted by
Beetle B.
on
Wed 06 February 2019
Conversions To/From String Representations
This section addresses how one can convert a character sequence into a decimal/binary floating point number. Decimal Character Sequence...
Posted by
Beetle B.
on
Wed 06 February 2019
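As a small illustration for the "Conversions To/From String Representations" entry above, the sketch below checks whether a binary64 value survives a trip through a decimal string with a given number of significant digits. The helper name is made up for this example. It relies on the widely cited fact that 17 significant decimal digits are enough for binary64.

```python
def round_trips(x, digits):
    """Format x with the given significant digits and parse it back."""
    text = format(x, f".{digits}g")
    return float(text) == x


x = 0.1 + 0.2
print(round_trips(x, 15))  # too few digits for this value
print(round_trips(x, 17))  # enough digits for any binary64 value
print(float(repr(x)) == x)
```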
Comparisons
The standard requires that you can compare any two floating point numbers, as long as they share the same radix. The unordered condition...
Posted by
Beetle B.
on
Wed 06 February 2019
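The unordered condition mentioned in the "Comparisons" entry above is easy to see with a NaN. This sketch shows how Python floats behave; the dictionary keys are just labels for this example.

```python
import math

nan = math.nan

# Every ordered comparison involving a NaN is false; only "not equal" is true.
results = {
    "nan < 1.0": nan < 1.0,
    "nan > 1.0": nan > 1.0,
    "nan == nan": nan == nan,
    "nan != nan": nan != nan,
}

for label, value in results.items():
    print(label, value)
```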
Attributes and Rounding
Rounding Direction Attributes: IEEE 754-2008 requires that the following be correctly rounded: Arithmetic operations: Addition...
Posted by
Beetle B.
on
Wed 30 January 2019
Operations Specified By The Standards
Arithmetic Operations and Square Root Handling Signed 0 If are nonzero, and or exactly, then it is ...
Posted by
Beetle B.
on
Wed 30 January 2019
Formats
The standard defines several interchange formats to allow for transferring floating point data between machines. They could be as bit...
Posted by
Beetle B.
on
Mon 28 January 2019
Note on the Choice of Radix
It has been shown that radix 2 gives better worst case and average accuracy than all other bases. if...
Posted by
Beetle B.
on
Tue 22 January 2019
Lost and Preserved Properties of Arithmetic
Floating point addition and multiplication are still commutative. Associativity is compromised, though. An example: Let...
Posted by
Beetle B.
on
Tue 22 January 2019
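The example in the "Lost and Preserved Properties of Arithmetic" entry above is cut off, so here is a stand-in using familiar decimal literals in binary64. The specific values are chosen for this sketch and are not taken from the post.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)

print(left, right)
print(left == right)   # associativity fails for these values
print(a + b == b + a)  # commutativity still holds
```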
Floating Point Exceptions
In IEEE-754, the implementer can signal an exception along with the result of the operation. Usually (or perhaps mandated?), the signal...
Posted by
Beetle B.
on
Tue 22 January 2019
Fused Multiply Add
Let \(\circ\) be the rounding function, and let \(a, b, c\) be floating point numbers. Then \(\text{FMA}(a, b, c)\) is \(\circ(ab + c)\). if...
Posted by
Beetle B.
on
Tue 22 January 2019
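To show why the single rounding in the "Fused Multiply Add" entry above matters, the sketch below emulates an FMA with exact rational arithmetic and compares it with a separate multiply and add. The emulation is an assumption of this example, not a hardware FMA. It relies on converting a `Fraction` to `float` rounding to nearest, and the sign of a zero result may differ from a real FMA.

```python
from fractions import Fraction


def fma_emulated(x, y, z):
    """Compute x * y + z exactly, then round once to the nearest float."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


x = 1.0 + 2.0**-27
y = 1.0 - 2.0**-27
z = -1.0

two_roundings = x * y + z           # the product is rounded to 1.0 first
one_rounding = fma_emulated(x, y, z)

print(two_roundings, one_rounding)
```

Here the exact product is 1 - 2^-54, which rounds to 1.0, so the two-step version returns 0.0 while the fused version keeps the tiny residual.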
ULP Errors vs Relative Errors
Converting From ULP Errors to Relative Errors Let be in the normal range, and . Then: \begin{equation*}...
Posted by
Beetle B.
on
Tue 22 January 2019
The ULP Function
There are multiple definitions of unit in the last place. I think most agree when the argument is not near a boundary point. Here is the...
Posted by
Beetle B.
on
Wed 16 January 2019
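For the "The ULP Function" entry above, Python's standard library offers one particular definition through `math.ulp` (available since Python 3.9). The sketch below only probes that function. It is not necessarily the definition the post settles on, and the definitions can differ near powers of the radix.

```python
import math
import sys

# Spacing of floats just above 1.0.
print(math.ulp(1.0), sys.float_info.epsilon)

# The gap to the next float above 1.0 matches the ulp there.
print(math.nextafter(1.0, 2.0) - 1.0 == math.ulp(1.0))

# At 2**53 consecutive floats are 2 apart.
print(math.ulp(2.0**53))
```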
Relative Error Due To Rounding
Ranges The normal range is the set of real numbers: and the subnormal range are where...
Posted by
Beetle B.
on
Tue 15 January 2019
Rounding Functions
IEEE 754-2008 specifies five rounding functions: Round toward \(-\infty\) (RD): It is the largest floating point number less than or...
Posted by
Beetle B.
on
Mon 14 January 2019
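Python floats do not expose the rounding direction, so as a stand-in for the "Rounding Functions" entry above this sketch uses the `decimal` module to round a decimal value to one fractional digit under five modes. The mapping from the IEEE names to `decimal` constants is my own reading, and the helper name is made up for this example.

```python
from decimal import (
    Decimal,
    ROUND_CEILING,    # toward +infinity (RU)
    ROUND_DOWN,       # toward zero (RZ)
    ROUND_FLOOR,      # toward -infinity (RD)
    ROUND_HALF_EVEN,  # to nearest, ties to even
    ROUND_HALF_UP,    # to nearest, ties away from zero
)


def round_to_tenths(text, mode):
    """Round the decimal string to one fractional digit using the given mode."""
    return Decimal(text).quantize(Decimal("0.1"), rounding=mode)


modes = [ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP]
for mode in modes:
    print(mode, round_to_tenths("-1.25", mode), round_to_tenths("1.25", mode))
```

The value 1.25 sits exactly halfway between 1.2 and 1.3, which makes the two round-to-nearest variants visibly different.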
The Other “Numbers”
0 (some systems have signed 0’s as well). NaN for any invalid operation (in some systems NaNs are signed, in others they are not). In the IEEE...
Posted by
Beetle B.
on
Fri 11 January 2019
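The special values listed in the "The Other “Numbers”" entry above can be poked at directly in Python, whose floats follow IEEE 754 binary64 on common platforms. The variable names are chosen for this sketch.

```python
import math

pos_zero = 0.0
neg_zero = -0.0

# The two zeros compare equal, but the sign is still observable.
zeros_equal = pos_zero == neg_zero
neg_zero_sign = math.copysign(1.0, neg_zero)

# An invalid operation on infinities yields a NaN.
invalid_result = math.inf - math.inf

print(zeros_equal, neg_zero_sign, math.isnan(invalid_result))
```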
Underflow
Underflow before rounding occurs when the absolute value of the exact value is strictly less than (i.e. the...
Posted by
Beetle B.
on
Fri 11 January 2019
Normalizing
We would like a unique way to represent . One approach is to pick the one which gives the smallest exponent possible (while still...
Posted by
Beetle B.
on
Fri 11 January 2019
Definitions
A radix floating point number is of the form , where is called the significand and is...
Posted by
Beetle B.
on
Fri 11 January 2019