Sequential Evaluation of Polynomials If you don’t have any parallelism available, Horner’s scheme is a good option. And if you have the FMA instruction available, use it!
If your polynomial is of a high degree, Knuth showed a way (it is in the book) to reduce the number of operations needed than the naive Horner implementation. Evaluating Polynomials When Some Parallelism is Available
Generalization of Horner’s Scheme
This method involves splitting up the polynomial into its odd and even powers, and then add them up.
Specifically, let \(y=x^{2}\) and get \(p_{e}\) and \(p_{o}\). When you evaluate both, then \(p=p_{e}+xp_{o}\). You can generalize this to \(y=x^{k}\).
Estrin’s Method
You can use Estrin’s method. This method takes more operations than Horner’s method, but it is faster due to parallelism.
Evaluating Polynomials on Modern Processors
In general, deciding what level of partitioning for evaluation of polynomials in parallel is nontrivial. There’s no general principle that will tell you which partitioning to use for a given order polynomial.
Evaluating in parallel may also impact the accuracy of your result. Computing Bounds on the Evaluation Error
Much of this is covered in the Handbook book. I will not repeat it here. Note that the error bounds there assume FMA is not used (i.e. separate error for addition and multiplication).
He gives a Maple program for computing the error bounds, as well as how to use Gappa to get such a bound. Note that this error bound is not the ones given by analytical formulae. It is an algorithm that usually gives a much tighter bound. Use it only if you need a tighter bound.
Evaluation Error With Methods Different From Horner’s Scheme
In general, just use Gappa to estimate the error bound. Don’t try to write your own Maple program unless you will use the same strategy for many functions.
When High Accuracy Is Needed
We often need to compute at higher precision than our target precision (e.g. intermediate calculation).
Also, often \(a_{0}\), or \(a_{0}+a_{1}x\) will be much larger than the rest of the polynomial (perhaps due to range reduction). In these cases, evaluate the two separately, and store those values separately. Polynomial Evaluation By Specific Hardware I skipped this section.