
Computation of Residuals of Division and Square Root With an FMA

Posted by Beetle B. on Mon 25 March 2019

For this article, define a representable pair for a floating point number \(x\) to be any pair of integers \((M,e)\) such that \(x=M\beta^{e-p+1}\), \(|M|\le\beta^{p}-1\) and \(e_{\min}\le e\). Note that \(M\) has a different meaning here: it is an integer, and a number may have more than one representable pair.

Now let \(x,y\) be floating point numbers. Let \(q=o(x/y)\), where \(o\) is round to nearest or a directed rounding function. If \(q\) is neither an infinity nor NaN, then \(x-qy\) is a floating point number if and only if there exist two representable pairs \((M_{y},e_{y}),(M_{q},e_{q})\) that represent \(y\) and \(q\) such that:

  • \(e_{y}+e_{q}\ge e_{\min}+p-1\)
  • \(q\ne\alpha\) or \(\alpha/2\le|x/y|\), where \(\alpha=\beta^{e_{\min}-p+1}\) is the smallest positive (subnormal) floating point number.
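
The two conditions can be checked numerically. The following is a minimal sketch, assuming binary64 (\(\beta=2\), \(p=53\), \(e_{\min}=-1022\)), round to nearest, and nonzero inputs. The helper names `largest_exponent`, `is_double` and `division_residual_check` are hypothetical. The sketch compares the prediction of the theorem with an exact rational computation of \(x-qy\), and it reads the condition \(q\ne\alpha\) as \(|q|\ne\alpha\), which is an assumption.

```python
import math
from fractions import Fraction

P = 53                                  # binary64 precision (assumed format)
E_MIN = -1022                           # binary64 minimum exponent
ALPHA = math.ldexp(1.0, E_MIN - P + 1)  # smallest positive subnormal, 2**-1074


def largest_exponent(v: float) -> int:
    """Largest e with v = M * 2**(e - P + 1), M an integer. Assumes v != 0."""
    num, den = v.as_integer_ratio()     # v == num / den, den is a power of two
    k = -(den.bit_length() - 1)         # v == num * 2**k
    while num % 2 == 0:                 # make M odd, i.e. as small as possible
        num //= 2
        k += 1
    return k + P - 1


def is_double(r: Fraction) -> bool:
    """True if the rational r is exactly a binary64 value."""
    return Fraction(float(r)) == r


def division_residual_check(x: float, y: float):
    """Return (theorem's prediction, exact check) for representability of x - q*y."""
    q = x / y                           # correctly rounded quotient
    exponents_ok = largest_exponent(y) + largest_exponent(q) >= E_MIN + P - 1
    alpha_ok = abs(q) != ALPHA or abs(Fraction(x) / Fraction(y)) >= Fraction(ALPHA) / 2
    exact = Fraction(x) - Fraction(q) * Fraction(y)
    return exponents_ok and alpha_ok, is_double(exact)


print(division_residual_check(1.0, 3.0))                    # (True, True)
print(division_residual_check(math.ldexp(1.0, -1030),
                              math.ldexp(3.0, -1030)))      # (False, False)
```

In the second call the exact residual is \(2^{-1084}\), which lies below \(\alpha\), and the best available exponents give \(e_{y}+e_{q}=-980<e_{\min}+p-1=-970\). When the residual is representable, a single FMA returns it exactly; on Python 3.13 or later this would be `math.fma(-q, y, x)`.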

Let \(x\) be a floating point number. Let \(\sigma\) be \(\sqrt{x}\) rounded to the nearest floating point value. If \(\sigma\) is neither an infinity nor NaN, then \(x-\sigma^{2}\) is representable if and only if there exists a representable pair \((M_{\sigma},e_{\sigma})\) that represents \(\sigma\) such that \(2e_{\sigma}\ge e_{\min}+p-1\).
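
The square root condition can be illustrated the same way. This is again a sketch under the assumption of binary64 and round to nearest, with a positive input, and the helper names (`largest_exponent`, `is_double`, `sqrt_residual_check`) are hypothetical. It relies on `math.sqrt` being correctly rounded, which holds on IEEE 754 platforms.

```python
import math
from fractions import Fraction

P = 53          # binary64 precision (assumed format)
E_MIN = -1022   # binary64 minimum exponent


def largest_exponent(v: float) -> int:
    """Largest e with v = M * 2**(e - P + 1), M an integer. Assumes v != 0."""
    num, den = v.as_integer_ratio()     # v == num / den, den is a power of two
    k = -(den.bit_length() - 1)         # v == num * 2**k
    while num % 2 == 0:                 # make M odd, i.e. as small as possible
        num //= 2
        k += 1
    return k + P - 1


def is_double(r: Fraction) -> bool:
    """True if the rational r is exactly a binary64 value."""
    return Fraction(float(r)) == r


def sqrt_residual_check(x: float):
    """Return (theorem's prediction, exact check) for representability of x - s*s."""
    s = math.sqrt(x)                    # sqrt(x) rounded to nearest
    predicted = 2 * largest_exponent(s) >= E_MIN + P - 1
    exact = Fraction(x) - Fraction(s) ** 2
    return predicted, is_double(exact)


print(sqrt_residual_check(2.0))                     # (True, True)
print(sqrt_residual_check(math.ldexp(1.0, -999)))   # (False, False)
```

For \(x=2^{-999}\) the rounded root has \(e_{\sigma}=-500\), so \(2e_{\sigma}=-1000<-970\), and the exact residual is an odd multiple of \(2^{-1104}\), which no binary64 value can hold.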

Proofs are not provided.