The Basic Methods

One way is to use Newton's iteration on \(f(x)=x^{2}-a\), which gives \(x_{n+1}=(x_{n}+a/x_{n})/2\). This method for calculating square roots goes back thousands of years. It converges for any starting point \(x_{0}>0\).
It has two problems:
- Guaranteeing a correctly rounded result is nontrivial
- It requires a division at each iteration.
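As a minimal sketch in C (the function name and stopping test are my own illustration, with no attempt at correct rounding):

```c
#include <stdio.h>

/* Newton's iteration on f(x) = x^2 - a: x <- (x + a/x)/2.
   Assumes a > 0. Starting at x0 >= sqrt(a), the iterates decrease
   monotonically toward sqrt(a), so we stop when they stop decreasing. */
static double sqrt_newton(double a) {
    double x = a > 1.0 ? a : 1.0;   /* guarantees x >= sqrt(a) */
    for (;;) {
        double next = 0.5 * (x + a / x);  /* one division per iteration */
        if (next >= x) return x;          /* no more progress */
        x = next;
    }
}

int main(void) {
    printf("%.17g\n", sqrt_newton(2.0)); /* ~1.4142135623730951 */
    return 0;
}
```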
Another method is to use \(f(x)=1/x^{2}-a\) to calculate \(1/\sqrt{a}\), and then multiply by \(a\). Again, guaranteeing correct rounding is nontrivial, but this method involves no division. (Well, it does have a division by 2, but that is trivial.)
The full iteration expression is \(x_{n+1}=x_{n}(3-ax_{n}^{2})/2\)
The convergence region is \((0,\sqrt{3/a})\).
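A division-free sketch along the same lines (the crude starting guess of \(1\) is an assumption that only lands inside the convergence region for \(a<3\); the fixed iteration count is likewise illustrative):

```c
#include <stdio.h>

/* Newton on f(x) = 1/x^2 - a: x <- x*(3 - a*x*x)/2 converges to
   1/sqrt(a) whenever x0 lies in (0, sqrt(3/a)). No division needed. */
static double sqrt_via_rsqrt(double a) {
    double x = 1.0;                  /* inside (0, sqrt(3/a)) only if a < 3 */
    for (int i = 0; i < 50; i++) {
        x = 0.5 * x * (3.0 - a * x * x);
    }
    return a * x;                    /* sqrt(a) = a * (1/sqrt(a)) */
}

int main(void) {
    printf("%.17g\n", sqrt_via_rsqrt(2.0)); /* ~1.4142135623730951 */
    return 0;
}
```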
Usually we get quadratic convergence if we start near \(1/\sqrt{a}\); in practice the starting point comes from a table lookup or a polynomial of low degree.

Using The Newton Method for Correctly Rounded Square Roots

The iteration above can be rewritten as follows.
Define the following: \(\epsilon_n=1-ax_{n}^{2}\). Then the iteration becomes \(x_{n+1}=x_{n}+\frac{1}{2}\epsilon_{n}x_{n}\).
But we want to utilize the FMA. So define the following:
- \(r_{n}=\frac{1}{2}\epsilon_{n}\)
- \(g_{n}=ax_{n}\)
- \(h_{n}=\frac{1}{2}x_{n}\)
Now note that:
- \(r_{n}=\frac{1}{2}-g_{n}h_{n}\)
- \(g_{n+1}=g_{n}+g_{n}r_{n}\)
- \(h_{n+1}=h_{n}+h_{n}r_{n}\)
This is the original iteration above, with \(h_{n}\rightarrow1/(2\sqrt{a})\). But also note that \(g_{n}\) converges to \(\sqrt{a}\). Each of the three updates is a single FMA instruction. Note that the last two bullets are independent, so they can be computed in parallel.
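Here is a sketch of that FMA form in C, using fma() from <math.h> (the starting guess x0 and the fixed iteration count are illustrative assumptions; a real implementation would seed x0 from a table or low-degree polynomial):

```c
#include <math.h>   /* link with -lm */
#include <stdio.h>

/* FMA form of the iteration:
     r = 1/2 - g*h   (g -> sqrt(a), h -> 1/(2*sqrt(a)))
     g = g + g*r
     h = h + h*r
   The g and h updates are independent FMAs and can run in parallel. */
static double sqrt_fma(double a, double x0) {
    double g = a * x0;              /* g0 = a*x0 */
    double h = 0.5 * x0;            /* h0 = x0/2 */
    for (int i = 0; i < 6; i++) {
        double r = fma(-g, h, 0.5); /* r = 1/2 - g*h */
        g = fma(g, r, g);           /* g = g + g*r */
        h = fma(h, r, h);           /* h = h + h*r */
    }
    return g;
}

int main(void) {
    /* x0 must approximate 1/sqrt(a); 0.7 is close to 1/sqrt(2). */
    printf("%.17g\n", sqrt_fma(2.0, 0.7));
    return 0;
}
```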
Theorem: For any \(\beta\), the square root of a floating point number cannot be the exact midpoint between two consecutive floating point numbers.
Proof: Suppose \(r\) is the square root of a floating point number \(x\) and lies exactly midway between two consecutive floating point numbers. First note that \(r\) is in the normal range. Why? If \(x\ge1\), then \(r\ge1\). If \(x<1\) is normal, then \(x\ge\beta^{e_{\min}}\), so \(r\ge\beta^{e_{\min}/2}>\beta^{e_{\min}}\). And if \(x\) is a denormal, then \(x\ge\beta^{e_{\min}-p+1}\), so \(r\ge\beta^{(e_{\min}-p+1)/2}\). Since \(e_{\min}\) is negative (and \(e_{\min}\le1-p\) in any practical format), halving the exponent puts \(r\) clearly in the normal range.
Now in general, \(r=R\beta^{e-p+1}+\frac{1}{2}\beta^{e-p+1}\) for some integers \(R\) and \(e\). Multiplying by \(2\beta^{p-1-e}\) and then squaring, we get \(4r^{2}\beta^{2(p-1-e)}=(2R+1)^{2}\).
Now \(r^{2}=x=M_{x}\beta^{e_{x}-p+1}\). We can show that \(|e_{x}-2e|\le1\) (i.e., \(e\) is approximately half of \(e_{x}\), with perhaps an offset of 1). So \(r^{2}\beta^{2(p-1-e)}=M_{x}\beta^{e_{x}-p+1}\beta^{2(p-1-e)}=M_{x}\beta^{e_{x}+p-1-2e}\).
Utilizing \(|e_{x}-2e|\le1\), the exponent \(e_{x}+p-1-2e\ge p-2\ge0\) (assuming \(p\ge2\)), so the left-hand side \(4M_{x}\beta^{e_{x}+p-1-2e}\) is an integer, and in fact a multiple of 4, hence even. But the right-hand side \((2R+1)^{2}\) is clearly odd. We have a contradiction.
Note, there was no requirement for \(\beta\) to be prime.
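To make this concrete, here is a small worked instance (my own numbers, not from the source). Take \(\beta=2\), \(p=3\), and the midpoint \(r=1.125\) between the consecutive floats \(1\) and \(1.25\), so \(e=0\) and \(R=4\). Then
\begin{equation*} 4r^{2}\beta^{2(p-1-e)}=4\cdot1.265625\cdot16=81=(2R+1)^{2}, \end{equation*}
which is odd, whereas if \(r^{2}\) were a float \(M_{x}\beta^{e_{x}-p+1}\) (here \(e_{x}=0\)), the left-hand side would be \(4M_{x}\beta^{e_{x}+p-1-2e}=16M_{x}\), which is even. So \(r^{2}=1.265625\) is not representable, exactly as the theorem demands.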
Theorem: For binary systems, the reciprocal square root of a floating point number cannot be the exact midpoint between two consecutive floating point numbers.
The proof isn’t provided, but the claim is that it is similar to the previous theorem. Do note that we need binary for this.
So let’s say we compute what we think is the square root of \(a\). How do we know it is correctly rounded? Use the Tuckerman test.
Theorem: If \(a,r\) are floating point numbers, then \(r\) is \(\sqrt{a}\) rounded to nearest if and only if:
\begin{equation*} r(r-\ulp(r^{-}))<a\le r(r+\ulp(r)) \end{equation*}
A useful mnemonic is that both sides of the inequality are almost \(r^{2}\).
Note that each end of the inequality can be calculated with an FMA.
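A sketch of the test in C for doubles (the helper names ulp_of and is_correctly_rounded_sqrt are mine; the sign-of-FMA trick assumes positive normal inputs with no over/underflow in the residuals):

```c
#include <math.h>      /* link with -lm */
#include <stdbool.h>
#include <stdio.h>

/* ulp(x) for positive finite x: the gap to the next float up. */
static double ulp_of(double x) {
    return nextafter(x, INFINITY) - x;
}

/* Tuckerman test: r == RN(sqrt(a)) iff
       r*(r - ulp(r^-)) < a <= r*(r + ulp(r)),
   where r^- is the predecessor of r. Each side is one FMA; the single
   rounding of the FMA preserves the sign of the exact residual
   (barring underflow), so the comparisons are decided correctly. */
static bool is_correctly_rounded_sqrt(double a, double r) {
    double r_pred = nextafter(r, 0.0);            /* r^- */
    double lo = fma(r, r - ulp_of(r_pred), -a);   /* r*(r - ulp(r^-)) - a */
    double hi = fma(r, r + ulp_of(r), -a);        /* r*(r + ulp(r)) - a */
    return lo < 0.0 && hi >= 0.0;
}

int main(void) {
    double r = sqrt(2.0);  /* IEEE sqrt is correctly rounded */
    printf("%d\n", is_correctly_rounded_sqrt(2.0, r));                 /* 1 */
    printf("%d\n", is_correctly_rounded_sqrt(2.0, nextafter(r, 2.0))); /* 0 */
    return 0;
}
```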
Proof: First of all, we need not worry about tie-breaking rules, because by the first theorem above the square root cannot be exactly halfway between two consecutive floating point numbers.
Now I want to reduce to \(1\le r<\beta\) without losing generality. To show I can do this, note that if \(k\) is an integer, then I can replace \(r\) with \(R=\beta^{k}r\) and \(a\) with \(A=\beta^{2k}a\). Specifically, I can multiply the whole inequality by \(\beta^{2k}\), using the fact that \(\ulp(\beta^{k}t)=\beta^{k}\ulp(t)\). As long as there is no overflow or underflow, we get:
\begin{equation*} R(R-\ulp(R^{-}))<A\le R(R+\ulp(R)) \end{equation*}
Now \(R=RN(\sqrt{A})\) translates easily to \(r=RN(\sqrt{a})\), because \(\sqrt{A}=\beta^{k}\sqrt{a}\) and scaling by a power of the base is exact, so it commutes with rounding.
So we can now consider two cases: \(r=1\), and \(r>1\).
If \(r=1\), then the inequality becomes
\begin{equation*} 1-\beta^{-p}<a\le1+\beta^{1-p} \end{equation*}
since \(\ulp(1^{-})=\beta^{-p}\) and \(\ulp(1)=\beta^{1-p}\).
This allows for only two floating point values of \(a\): \(1\) and \(1+\beta^{1-p}\). So, given the inequality, we need to show that \(r=1\) is the round-to-nearest square root of each of these two values. For \(a=1\), this is obvious. For \(a=1+\beta^{1-p}\)…
Well, let \(1=RN(\sqrt{a})\). Then \(\sqrt{a}\) lies within half an ulp of \(1\) on either side:
\begin{equation*} 1-\tfrac{1}{2}\beta^{-p}\le\sqrt{a}\le1+\tfrac{1}{2}\beta^{1-p} \end{equation*}
Square the above, and the only possible floating point values of \(a\) turn out to be the same two as before. So if you assume \(r=1\) is correctly rounded, the original inequality holds. Conversely, if you assume the original inequality holds with \(r=1\), then \(a\) is one of the two values, and we showed that both of those have \(1\) as their correctly rounded square root.
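Spelling out the squaring step (my arithmetic):
\begin{equation*} 1-\beta^{-p}+\tfrac{1}{4}\beta^{-2p}\le a\le1+\beta^{1-p}+\tfrac{1}{4}\beta^{2-2p} \end{equation*}
The only floating point numbers in this interval are \(1\) and \(1+\beta^{1-p}\): the predecessor \(1-\beta^{-p}\) falls just below the lower bound, and \(1+2\beta^{1-p}\) lies above the upper bound.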
Now if \(1<r<\beta\), we follow the same steps, noting that \(r\) is a multiple of \(\beta^{1-p}\); take it from there to limit the possible set of floating point values of \(a\).