Polynomial interpolation and extrapolation

This web page explains the ideas behind polynomial interpolation, for the problem of "guessing" a value f(x) for a function f when only finitely many other values f(x_0), f(x_1), ..., f(x_{n-1}) are given.

1. Introduction to polynomial interpolation

A commonly seen problem goes as follows: given x_i and y_i for i = 0, ..., n-1, where y_i = f(x_i) for an unknown function f, guess an expression for f and use it to estimate a value f(x) for some other value x.

For example, the data x_i, y_i might be obtained experimentally, and are presumed to represent sample values of a continuous function f. Of course, without other information or assumptions about f the question is impossible to answer. So usually one assumes f is continuous or "smooth" in some sense, or one uses a physical theory that suggests a form for the function f to fit to the data.

This problem occurs frequently in numerical work because, to save computer time, it is common to assume that the function one is working with can be usefully approximated in a simple way. Such approximations can make a computation quicker or more accurate.

It is important to remember that without some sort of "smoothness" assumption on f no progress can be made; such assumptions may be completely wrong, and lead to misleading answers.

By far the most common assumption made on a function f is that it can reasonably be approximated by a polynomial. Of course, in many cases this assumption is certainly wrong. The function e^x cannot be approximated by a polynomial on the whole real line: it grows too fast for that. However, one might try to approximate part of the function by a polynomial. Thus e^x might be usefully approximated by one polynomial for x in the range from 10.0 to 10.1, and by a different polynomial in a different range.

2. Lagrange's formula for polynomial interpolation

As a general rule a polynomial of degree at most n-1 has the form p(X) = p_0 + p_1 X + ... + p_{n-1} X^{n-1}, so it has n "unknown" coefficients p_0, ..., p_{n-1}. To solve for these coefficients we normally need n equations, so we expect, or hope, to find that there is a polynomial of degree at most n-1, specified exactly by the data, that passes through n points (x_0, y_0) to (x_{n-1}, y_{n-1}). In fact this is true, but it only works if the x_i values are all different.
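To see where the n equations come from, write the conditions p(x_i) = y_i out in matrix form (a standard reformulation, sketched here):

\[
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1}
\end{pmatrix}
\begin{pmatrix} p_0 \\ p_1 \\ \vdots \\ p_{n-1} \end{pmatrix}
=
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{pmatrix}
\]

The coefficient matrix here is the well-known Vandermonde matrix, which is invertible precisely when the x_i are all distinct; this is another way of seeing why distinct x_i values are needed.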

Theorem.

Given n points (x_0, y_0) to (x_{n-1}, y_{n-1}), with x_i ≠ x_j for all i ≠ j, there is a unique polynomial p(X) of degree at most n-1 such that p(x_i) = y_i for all i.

Proof.

Uniqueness follows from a standard theorem on polynomials: if p(X) has degree at most n-1 and is not the zero polynomial, then the equation p(X) = 0 has at most n-1 roots. Using this, suppose p(X) and q(X) are both polynomials of degree at most n-1 with p(x_i) = q(x_i) = y_i for all i. Then p(X) - q(X) is a polynomial of degree at most n-1 with n roots x_0, ..., x_{n-1}, so it is the zero polynomial, i.e. p(X) = q(X) for all X. Existence follows from the Lagrange formula given below.

A well-known formula due to Lagrange gives such a polynomial. It is

\[
p(x) = \frac{(x-x_1)(x-x_2)\cdots(x-x_{n-1})}{(x_0-x_1)(x_0-x_2)\cdots(x_0-x_{n-1})}\,y_0
+ \frac{(x-x_0)(x-x_2)\cdots(x-x_{n-1})}{(x_1-x_0)(x_1-x_2)\cdots(x_1-x_{n-1})}\,y_1
+ \cdots
+ \frac{(x-x_0)(x-x_1)\cdots(x-x_{n-2})}{(x_{n-1}-x_0)(x_{n-1}-x_1)\cdots(x_{n-1}-x_{n-2})}\,y_{n-1}
\]

You should be able to see how it works. The rational expression in x and x_0, ..., x_{n-1} that multiplies y_i is equal to 1 at x = x_i and 0 at each x = x_j with j ≠ i. Multiplying by y_i therefore gives something equal to y_i at x_i and 0 at all the other x_j. Adding all these together we get a polynomial of degree at most n-1 that has the required properties. By the uniqueness theorem just given, this is the only polynomial that fits the data and has the required degree.
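As a small worked example (the three points here are invented for illustration), taking the points (0, 1), (1, 2), (2, 5) the formula gives

\[
p(x) = \frac{(x-1)(x-2)}{(0-1)(0-2)}\cdot 1 + \frac{x(x-2)}{1\cdot(1-2)}\cdot 2 + \frac{x(x-1)}{2\cdot(2-1)}\cdot 5 = x^2 + 1,
\]

and indeed p(0) = 1, p(1) = 2, p(2) = 5.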

The formula above is not usually used to compute the coefficients of the Lagrange interpolating polynomial, but it is in a form that allows p(x) to be calculated directly from the data (x_i, y_i) and x.

Exercise.

Write a C++ function with the name and type double lagrange(double x, vector<double>& xvals, vector<double>& yvals) that returns the value p(x), where p is the interpolating polynomial of degree at most n, and the vectors xvals, yvals are both of size n+1, with xvals[i] containing x_i and yvals[i] containing y_i for each i ≤ n.

Possible Solution.

// simple function for Lagrange interpolation
#include <iostream>
#include <vector>

using std::vector;

double lagrange(double x, vector<double>& xvals, vector<double>& yvals) {
  if (xvals.size()!=yvals.size()) {
    std::cerr << "Error: function lagrange called when xvals and yvals have different sizes!" << std::endl;
    return 0;
  }
  int n = xvals.size();
  double sum = 0.0;
  for (int i=0; i<n; i++) {
    // now calculate the ith term of the Lagrange formula
    double term = yvals[i];
    for (int j=0; j<n; j++) {
      if (j==i) continue; // skip this particular case
      term = term * (x - xvals[j]) / ( xvals[i] - xvals[j] );
    }
    sum = sum + term;
  }
  return sum;
}

Things to think about: How can you avoid repeating the same calculation over and over again? How can you keep the number of multiplications and divisions to a minimum? What sources of round off errors are there? What is the run time of your function in terms of n , the degree of the polynomial?

None of these issues are easy. A good interpolation program is rather difficult to design.
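One partial answer to the first two questions (a sketch only; the helper names lagrange_weights and lagrange_eval are invented here): the denominators in the Lagrange formula depend only on the data points, not on x, so when p(x) is needed at many different x values they can be computed once and reused.

```cpp
#include <vector>

using std::vector;

// Precompute w_i = y_i / prod_{j != i} (x_i - x_j).  This costs O(n^2)
// multiplications and divisions, but only once per data set.
vector<double> lagrange_weights(const vector<double>& xvals,
                                const vector<double>& yvals) {
  int n = xvals.size();
  vector<double> w(n);
  for (int i = 0; i < n; i++) {
    double denom = 1.0;
    for (int j = 0; j < n; j++)
      if (j != i) denom *= xvals[i] - xvals[j];
    w[i] = yvals[i] / denom;
  }
  return w;
}

// Each later evaluation is then O(n), using
//   p(x) = prod_k (x - x_k) * sum_i w_i / (x - x_i).
// (This form fails when x equals one of the x_i; a full implementation
// would check for that case and return the stored y_i directly.)
double lagrange_eval(double x, const vector<double>& xvals,
                     const vector<double>& w) {
  int n = xvals.size();
  double prod = 1.0, sum = 0.0;
  for (int i = 0; i < n; i++) {
    prod *= x - xvals[i];
    sum += w[i] / (x - xvals[i]);
  }
  return prod * sum;
}
```

With the weights precomputed, each evaluation costs O(n) multiplications and divisions instead of O(n^2); the O(n^2) precomputation is paid only once per data set.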

A related question is, given input data double x, vector<double>& xvals, vector<double>& yvals as before, to find the pair (x_i, y_i) that is closest to the x value given, but with x_i < x. There are two cases. If the data vector xvals is not sorted, the best technique may be to try every possible x_i in turn; this is O(n). If xvals is sorted there is a better method.

Exercise.

Write a C++ function with the name and type int nearestindex(double x, vector<double>& xvals, vector<double>& yvals) that returns the index i of the nearest (x_i, y_i) pair with x_i < x. (Return -1 if there is no such i.) You may assume that the vector xvals is sorted in increasing order. What is the run time of your function in terms of n, the length of the vectors?
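Possible Solution.

A sketch, in the same spirit as the earlier solution: since xvals is sorted, binary search finds the answer in O(log n) steps rather than the O(n) of a linear scan.

```cpp
#include <vector>

using std::vector;

// Returns the largest index i with xvals[i] < x, or -1 if none exists.
// Assumes xvals is sorted in increasing order.  Each step halves the
// search range, so the run time is O(log n).  (yvals is unused here,
// but is kept to match the requested signature.)
int nearestindex(double x, vector<double>& xvals, vector<double>& yvals) {
  int lo = 0, hi = (int)xvals.size() - 1, ans = -1;
  while (lo <= hi) {
    int mid = (lo + hi) / 2;
    if (xvals[mid] < x) {
      ans = mid;       // candidate; look for a larger one to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;    // xvals[mid] >= x, so the answer is to the left
    }
  }
  return ans;
}
```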

3. Computing the interpolating polynomial

The Lagrange formula is fine, and often very useful. But there are better ways to compute the interpolating polynomial. Neville's algorithm is one possibility. Here we compute P_{i,i+1,...,j}(x) (the value at x of the interpolating polynomial that passes through the points (x_i, y_i) up to and including (x_j, y_j)) in terms of P_{i,i+1,...,j-1}(x) and P_{i+1,...,j}(x). This is a recursion whose base case is easy. It is just

\[
P_i(x) = y_i
\]

The recursion step is

\[
P_{i,i+1,\dots,j}(x) = \frac{(x-x_j)\,P_{i,i+1,\dots,j-1}(x) + (x_i-x)\,P_{i+1,\dots,j}(x)}{x_i - x_j}
\]

This formula can be verified rapidly, recalling that P_{i,i+1,...,j-1} and P_{i+1,...,j} agree at x_{i+1} to x_{j-1}, where both take the values y_{i+1}, ..., y_{j-1}. Hence P_{i,i+1,...,j}(x_k) = y_k for all k = i+1, ..., j-1, and also (by a separate, easier argument) for k = i and k = j. So the recurrence above gives a polynomial with the correct values at x_i, x_j and all the other x_k, and by uniqueness it is the correct polynomial.
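In detail, for i < k < j both polynomials on the right-hand side take the value y_k at x = x_k, so

\[
P_{i,i+1,\dots,j}(x_k) = \frac{(x_k - x_j)\,y_k + (x_i - x_k)\,y_k}{x_i - x_j}
= \frac{(x_i - x_j)\,y_k}{x_i - x_j} = y_k,
\]

while at x = x_i the second term in the numerator vanishes and the first reduces to (x_i - x_j) y_i, giving y_i after the division (and symmetrically at x = x_j).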

Here is an implementation of Neville's algorithm.

// neville.cpp

// improved Lagrange interpolation: Neville's algorithm
#include <iostream>
#include <vector>

using std::vector;

/* internal form of Neville's algorithm.  The same as
 * neville below except that only the points (xvals[i], yvals[i]) with
 * start <= i <= end are considered. */
double neville_internal(double x, int start, int end, vector<double>& xvals, vector<double>& yvals) {
  if (start==end) return yvals[start];
  double divisor = xvals[start] - xvals[end];
  double left = ( x - xvals[end] ) * neville_internal(x,start,end-1,xvals,yvals);
  double right = ( xvals[start] - x ) * neville_internal(x,start+1,end,xvals,yvals);
  return (left + right)/divisor;
}

/* 
 * returns the value at x of the polynomial passing through each
 * (xvals[i], yvals[i]), computed using Neville's algorithm.
 */
double neville(double x, vector<double>& xvals, vector<double>& yvals) {
  if (xvals.size()!=yvals.size()) {
    std::cerr << "Error: function neville called when xvals and yvals have different sizes!" << std::endl;
    return 0;
  }
  int n = xvals.size();
  return neville_internal(x,0,n-1,xvals,yvals);
}

This algorithm looks extremely elegant. But it has a problem: the recursion hides the fact that various terms are calculated several times, and the recursive form of Neville's algorithm is in fact very slow for a large number of data points (the two recursive calls at each level lead to an exponential run time). The solution is, as is often the case, to use an auxiliary array and "unfold" the recursion into a loop.

// nevillecount.cpp

// nevillecount.cpp
//
// An implementation of Neville's algorithm for Lagrange interpolation, counting number of operations

#include <iostream>
#include <vector>

using std::vector;

/* 
 * A form of Neville's algorithm.  This code eliminates the
 * recursion in the original specification and is much faster.
 */
double neville(double x, vector<double>& xvals, vector<double>& yvals) {
  if (xvals.size()!=yvals.size()) {
    std::cerr << "Error: function neville called when xvals and yvals have different sizes!" << std::endl;
    return 0;
  }
  int s = xvals.size();
  vector<double> pvals(s);
  int count = 0;

  // at stage k=0,1,...,s-1, the vector entry pvals[j] 
  // contains the Neville value P_{j,j+1,...,j+k}(x)
  // This is for all 0<=j<s-k

  // base case:
  int k = 0;
  for (int i=0; i<s; i++) pvals[i] = yvals[i];

  // induction step:
  while (k<s-1) {
    k++;
    for (int i=0; i<s-k; i++) { 
      int j = i+k;
      double divisor = xvals[i] - xvals[j];
      double left = ( x - xvals[j] ) * pvals[i];
      double right = ( xvals[i] - x ) * pvals[i+1];
      pvals[i] = (left + right)/divisor;
      count += 3;
    }
  }
  std::cout << count << " * and / operations performed" << std::endl;
  return pvals[0];
}

It turns out that the loop version needs fewer multiplications and divisions: 3s(s-1)/2 of them for s data points, about 3/4 as many as the roughly 2s(s-1) required by the straightforward programming of the Lagrange formula. This comes at the cost of introducing a new vector for the intermediate values.

4. Concluding remarks