Operations Specified By The Standards

Posted by Beetle B. on Wed 30 January 2019

Arithmetic Operations and Square Root

Handling Signed 0

If \(x,y\) are nonzero, and \(x+y=0\) or \(x-y=0\) exactly, then the result is \(+0\) in the RN, RZ, and RU modes, and \(-0\) in RD.

However, if we have \(x+x\) or \(x-(-x)\), and \(x=\pm0\), then the result has the same sign as \(x\).

So be careful. If you have \(x+0\), the result is not necessarily \(x\). If \(x=-0\), then the result is \(+0\) in RN.
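These cases are easy to check. The sketch below assumes Python floats are IEEE binary64 values evaluated in round-to-nearest (true on common platforms, but an assumption here). Python floats offer no way to switch the rounding direction, so the decimal module stands in to show the RD case. The helper name sign_of is made up for illustration.

```python
import math
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_HALF_EVEN


def sign_of(x: float) -> str:
    """Return '+' or '-', distinguishing -0.0 from +0.0 (hypothetical helper)."""
    return "-" if math.copysign(1.0, x) < 0 else "+"


neg_zero = -0.0

# Exact cancellation of nonzero operands: +0 under round-to-nearest.
cancel_sign = sign_of(1.5 - 1.5)

# x + x with x = -0 keeps the sign of x.
double_sign = sign_of(neg_zero + neg_zero)

# x + (+0) with x = -0 does NOT return x: the sum is +0.
mixed_sign = sign_of(neg_zero + 0.0)

# decimal lets us pick the rounding direction, so it stands in for RN vs RD.
rn = Context(rounding=ROUND_HALF_EVEN)
rd = Context(rounding=ROUND_FLOOR)
rn_zero = rn.subtract(Decimal("1.5"), Decimal("1.5"))
rd_zero = rd.subtract(Decimal("1.5"), Decimal("1.5"))

print(cancel_sign, double_sign, mixed_sign)
print(rn_zero.is_signed(), rd_zero.is_signed())
```

On a typical build this should print "+ - +" followed by "False True": only the round-down context produces a negative zero from the exact cancellation.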

Square Root

If the input is greater than or equal to 0, the result always has a positive sign, with the exception of \(\sqrt{-0}=-0\).

Remainders

Let \(x\) be a finite floating point number. Let \(y\) be a nonzero, finite floating point number. Then the remainder \(r=x\mathrm{REM}y\) is defined as:

  1. \(r=x-yn\) where \(n\) is the integer closest to \(x/y\).
  2. If \(x/y\) is halfway between two integers, let \(n\) be the even one.
  3. If \(r=0\), its sign is that of \(x\).

Note: Remainders are always exactly representable under these conditions.

Also note that nowhere does it say \(x,y\) need to be integers!
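As a quick check of the three rules, here is a small sketch using Python's math.remainder (available since Python 3.7), which is documented as an IEEE 754-style remainder. The variable names are only for illustration.

```python
import math

# 5/2 = 2.5 is halfway between 2 and 3, so n is the even integer 2.
r_tie_down = math.remainder(5.0, 2.0)    # 5 - 2*2

# 7/2 = 3.5 is halfway between 3 and 4, so n is the even integer 4.
r_tie_up = math.remainder(7.0, 2.0)      # 7 - 2*4

# Non-integer operands work too: 5.5/2 = 2.75, so n = 3.
r_fraction = math.remainder(5.5, 2.0)    # 5.5 - 2*3

# A zero remainder takes the sign of x.
r_zero = math.remainder(-4.0, 2.0)

print(r_tie_down, r_tie_up, r_fraction, r_zero)
```

Note that the remainder can be negative even when both operands are positive, which is where it differs from the usual modulo operation.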

Finally, \(x\mathrm{REM}\infty=x\).

Preferred Exponent

How do we decide what the exponent should be for the result of an operation? This is not obvious for decimal formats. The rules are:

  1. If the result is inexact, use the cohort member with the smallest exponent.
  2. If the result is exact and the cohort includes a member with the preferred exponent (defined below), use that one. Otherwise, pick the member whose exponent is closest to the preferred one.
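Both rules can be observed with Python's decimal module, which keeps track of exponents in the same spirit. This is only a sketch: the 16-digit precision is an assumption chosen to mimic decimal64, the helper exponent is hypothetical, and the comments use the preferred exponents for addition, multiplication, and division (smaller exponent, sum of exponents, difference of exponents).

```python
from decimal import Decimal, Context

# 16 significant digits, chosen here to mimic decimal64 (an assumption for the demo).
ctx = Context(prec=16)


def exponent(d: Decimal) -> int:
    """Exponent of the cohort member that was chosen (hypothetical helper)."""
    return d.as_tuple().exponent


x, y = Decimal("1.20"), Decimal("2.3")   # exponents -2 and -1

total = ctx.add(x, y)          # exact; preferred exponent min(-2, -1) = -2
product = ctx.multiply(x, y)   # exact; preferred exponent (-2) + (-1) = -3

# Exact, but no representation of 0.25 has the preferred exponent 0 - 0 = 0,
# so the closest available exponent is used.
quarter = ctx.divide(Decimal("1"), Decimal("4"))

# Inexact, so the smallest exponent available at 16 digits is used.
third = ctx.divide(Decimal("1"), Decimal("3"))

for name, value in [("sum", total), ("product", product),
                    ("1/4", quarter), ("1/3", third)]:
    print(name, value, exponent(value))
```

The trailing zeros in the sum and the product are not noise: they are what selecting a particular cohort member looks like when the value is printed.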

Below are the preferred exponents, where \(Q(x)\) denotes the exponent of \(x\):

  • \(x\pm y\): The smaller of the two exponents
  • \(xy\): Add the two exponents
  • \(x/y\): Subtract the two exponents
  • \(\mathrm{FMA}(x,y,z)\): \(\min(Q(x)+Q(y),Q(z))\)
  • \(\sqrt{x}\): \(\lfloor Q(x)/2\rfloor\)

scaleB and logB

\(\mathrm{scaleB}(x,n)\) is defined as \(x\beta^{n}\), correctly rounded, where \(x\) is any floating point number and \(n\) is an integer.

When \(x\) is finite and nonzero, \(\mathrm{logB}(x)\) is \(\lfloor\log_{\beta}|x|\rfloor\). We also have \(\mathrm{logB}(\mathrm{NaN})=\mathrm{NaN}\), \(\mathrm{logB}(\pm0)=-\infty\), and \(\mathrm{logB}(\pm\infty)=+\infty\).
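For \(\beta=10\), Python's decimal module offers scaleb and logb methods that appear to match these definitions, so they make a convenient playground. The sketch below assumes that correspondence, and it disables traps so that the special cases return values instead of raising exceptions.

```python
from decimal import Decimal, Context

# traps=[] makes exceptional cases return special values instead of raising.
ctx = Context(prec=16, traps=[])

scaled = ctx.scaleb(Decimal("1.5"), 3)       # 1.5 * 10**3
log_big = ctx.logb(Decimal("123.45"))        # floor(log10(123.45)) = 2
log_small = ctx.logb(Decimal("0.00123"))     # floor(log10(0.00123)) = -3
log_zero = ctx.logb(Decimal("-0"))           # -Infinity, regardless of the zero's sign
log_inf = ctx.logb(Decimal("-Infinity"))     # +Infinity, regardless of the sign
log_nan = ctx.logb(Decimal("NaN"))           # NaN

print(scaled, log_big, log_small, log_zero, log_inf, log_nan)
```

With the default context, logb of zero would raise DivisionByZero rather than return \(-\infty\), which is why the sketch builds its own context.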