Quadrature and numerical integration

This web page explains some ideas for quadrature, also called numerical integration. That is, to estimate the value of an integral $\int_{a}^{b} f (x) d x$ when values $a, b$ and the function $f$ are known by algorithms.

This is a huge topic with many many different methods. As a general idea, sample values of the function $f (x)$ are calculated at various points in the interval and the integral is estimated from these. The more points are taken the better the estimate should be (we hope). So-called "higher order methods" aim to need fewer sample values to get highly accurate estimates. There is a lot of hype about such methods and which is "best", but the answer is that higher order methods only work better when the function is sufficiently "nice" to start with. For many poorly behaved functions the basic methods are actually best and this is what this page focuses on.

1. Integration by the trapezoidal method

We saw elsewhere an approach to differentiation, as well as use of the <functional> package. In contrast to differentiation, numerical integration is a comparatively straightforward and "easy" process to carry out on a computer, and there are literally hundreds of good methods to choose from.

Perhaps the simplest numerical approach to integration, i.e. to computing $\int_{a}^{b} f (x) d x$ , is to divide the interval $[a, b]$ into $N$ sub-intervals of length $h = (b - a) / N$ , $a_{0} = a, a_{1} = a + h, \dots, a_{N - 1} = b - h, a_{N} = b$ and approximate each integral $\int_{a_{i}}^{a_{i + 1}} f (x) d x$ by $h (f (a_{i}) + f (a_{i + 1})) / 2$ . This gives the formula,

The trapezoidal rule.

Given $a < b$ and $N \in ℕ$ with $h = (b - a) / N$ , we have,

\int_{a}^{b} f (x) d x \approx \frac{h f (a_{0})}{2} + \sum_{i = 1}^{N - 1} h f (a_{i}) + \frac{h f (a_{N})}{2} = \frac{b - a}{N} (\frac{1}{2} f (a_{0}) + \sum_{i = 1}^{N - 1} f (a_{i}) + \frac{1}{2} f (a_{N}))

This formula is in fact rather useful, and despite its simplicity not to be "sniffed at". Its utility stems from the fact that very few assumptions are made about the function $f$ . The function is assumed to be integrable and the values $f (a_{i})$ are assumed to be "typical" values in the region, but the function is not assumed to be "smooth" or approximable by polynomials etc.

Exercise.

You are required to write a program for the trapezoidal rule. Your function is called trapezoidal and the first few lines of code will be as follows.

#include <iostream>
#include <functional>

using namespace std;

double trapezoidal(const function<double(double)>& f, double a, double b, int n_subint) {
  ... 
}

Here n_subint is the number of sub-intervals, $N$ . You can supply your own functions $f$ to test the function. Try also the function $f (x) = \frac{π}{2} \sin (π x)$ and the function $g (x) = \frac{4}{π} \sqrt{1 - x^{2}}$ between 0 and 1. (The theoretical answer in both cases is $1$ .) You will want to #include <cmath> too, see the reference page on doubles.

2. Refinement of the method

If we are to use this method to calculate unknown integrals there are a few things we need to know about it. The main one of course is the accuracy of the answer. We are not doing our job properly if we simply give some value or other for an integral to 16 significant figures unless we say which of those figures we think are correct, and why.

The second problem is related. To use the "trapezoidal" function given earlier we need to give it some value n_subint for the number of sub-intervals to take, and it is not obvious what to take here. Of course, we hope that by choosing n_subint large we will get enough accuracy but how can we be sure? We also hope that we can get away with choosing n_subint small to make the program run as fast as possible. That's a problem: we can't make n_subint both large and small at the same time. So what n_subint do we take?

Recall the previous page on errors: errors can be given as absolute errors or as relative errors. There are a number of sources of errors, of which the main problems here will be truncation errors (not choosing n_subint large enough) and round-off errors (problems that occur in double in intermediate calculations).

Some terminology: suppose $a, b, f$ are chosen and the interval $[a, b]$ is to be split into $N$ sub-intervals of length $h = (b - a) / N$ . Then the method is first order if the error is (approximately) proportional to $N^{-1}$ , second order if the error is (approximately) proportional to $N^{-2}$ , and so on, with the method having order $r$ if the error is (approximately) proportional to $N^{- r}$ ,

Note that the order of the method might depend on $a, b, f$ as well as the algorithm. The order is usually, but need not be, an integer. Also, we cannot expect any such proportionality to work for all such $N$ however large. (You will see this later on.)

The "obvious" approach is to try larger and larger values for n_subint until we think we have enough accuracy. There is a problem with the trapezoidal function, however, which is that repeating it for different n_subint is wasteful because it repeats calculations that have already been done. Fortunately there is a nice trick.

/*
 * Calculates half of the trapezoidal function for a number of
 * sub-intervals n_subint which is 1 or even.  In particular,
 * 
 * If n_subint==1, calculates (b-a)*(f(a)/2 + f(b)/2)
 * 
 * If n_subint>1, calculates h times the sum of f(a_i) for i=1,3,...,n_subint-1
 * where h = (b-a)/n_subint and a_i = a+h*i
 * 
 */
double half_trapezoidal(const function<double(double)>& f, double a, double b, int n_subint) {
  ... 
}

Exercise.

Implement the half_trapezoidal function as indicated. Write a "main" to test it and see that it gives exactly the same as your previous function for n_subint == 1 and (approximately) half the value as your previous function in other cases. (If the answers are not exactly the same, to what do you account the differences?)

Given half_trapezoidal, we can then generate values of the integral similar to the trapezoidal function for values n_subint equal to 1,2,4,8,16, etc., by the following method.

Let n_subint = 1 and calculate $S$ as half_trapezoidal(f,a,b,1)
As many times as required, double n_subint and replace $S$ with $S / 2$ plus the value half_trapezoidal(f,a,b,n_subint).

This gives a sequence of approximations $S_{0}, S_{1}, S_{2}, \dots$ to the integral. We may not know the limit but we can estimate or guess the error in a new approximation $S_{n}$ as being $|S_{n} - S_{n - 1}|$ and the relative error as $\frac{|S_{n} - S_{n - 1}|}{|S_{n - 1}|}$ .

Exercise.

Impement this as a nice usable function, compute_trapezoidal, that finds the value of the integral to a specified relative error, with the following specification.

/*
 * Calculates the integral using the trapezoidal method.  It
 * repeatedly calls half_trapezoidal(f, a, b, n) for n
 * starting at 1 and doubling each time up to at most "max_n" times, 
 * and combines this with the old estimate for the integral to 
 * give a new estimate.  If or when these two estimates differ 
 * by less than the absolute value of the old estimate multiplied by
 * "rel_err" then the new estimate is returned directly.  Otherwise the
 * new estimate is returned after max_n calls to half_trapezoidal.
 * 
 */
double compute_trapezoidal(const function<double(double)>& f, double a, double b, double rel_err=1E-06, int max_n=30) {
  ...
}

Note that 30 "doublings" starting at $1$ will give $2^{30}$ which is at the limit of int and about as big a number of subintervals as could ever be required.

Exercise.

To analyse the course of values in the previous function, implement the following that puts these intermediate values into a vector. (You'll need #include <vector> of course, and you add entries to the vector with push_back(...). Yes, it is intentional to give this function the same name: C++ can distinguish between them using the type of the arguments, a feature known as overloading.)

/*
 * Calculates the integral using the trapezoidal method.  It
 * first clears the vector "estimates" and then fills it
 * with estimates for the integral using the trapezoidal
 * method with 1,2,4,8,..., up to 2^{max_n} sub-intervals
 * placing these in estimates[0],estimates[1],...,estimates[max_n].
 * It does this by repeatedly calling half_trapezoidal(f, a, b, n) 
 * for various values of n and combining this with the
 * previous estimate.
 * 
 */
void compute_trapezoidal(const function<double(double)>& f, double a, double b, vector<double>&estimates, int max_n=30) {
  estimates.clear(); // start by clearing this vector
  ...
}

3. Analysis of the method

Because we have been always doubling the number of sub-intervals to estimate the order $r$ of the method all we have to do is solve $2^{- r} = \frac{e_{1}}{e_{0}}$ where $e_{1}$ is the error of the current integral estimate and $e_{0}$ is the error of the previous integral estimate with half the number of sub-intervals.

Exercise.

Derive this formula for yourself.

The following code uses the second form of the compute_trapezoidal method (putting the data into an array) to print out a report for the various values, errors and estimated order.

Exercise.

Make sure you understand how it works.

void investigate_order(vector<double>&estimates) {
  unsigned n_est = estimates.size();
  double best_est = estimates[n_est - 1];
  double last_err = abs( estimates[0] - best_est );
  int n_subint = 1;
  for (int i=1; i<n_est-1; i++) {
    double err = abs( estimates[i] - best_est );
    double errratio = err / last_err;
    double orderest = -log(errratio) / log(2) ;
    n_subint = 2 * n_subint;
    cout << " n_subint==" << n_subint;
    cout << " estimate==" << estimates[i];
    cout << " error==" << err;
    cout << " ratio==" << errratio;
    cout << " estimated_order==" << orderest;
    cout << endl;    
    last_err = err;
  }
}

int main() {
  cout.precision(10);
  vector<double> estimates;
  cout << "computing trapezoidal method" << endl;
  compute_trapezoidal(f,0,1,estimates);
  investigate_order(estimates);
  return 0;
}

You should be able to use this now to get some idea of the estimated order for different numbers of subintervals. There are various Canvas questions concerning these. Try it out on the two functions given and also on your own.

4. Simpson's method

A certain Mr Simpson has a related method for integration that you may have heard of. In fact Simpson's method can be explained and calculated very quickly from what we have done.

Simpson's method simply says to take the value $(4 S_{n} - S_{n - 1}) / 3$ where $S_{n}$ is the value calculated by the trapezoidal method for $2^{n}$ subintervals and $S_{n - 1}$ is the same for $2^{n - 1}$ subintervals. We can modify the estimates vector easily by this formula to give Simpson's method. Here is the function, and a modified "main".

/* modifies the estimates vector to give Simpson's method */
void compute_simpson(vector<double>&estimates) {
  unsigned n_est = estimates.size();
  for (int i=n_est-1; i>0; i--) {
    estimates[i] = (4.0*estimates[i] - estimates[i-1]) / 3.0;
  }  
}

int main() {
  cout.precision(10);
  vector<double> estimates;
  cout << "computing trapezoidal method" << endl;
  compute_trapezoidal(f,0,1,estimates);
  investigate_order(estimates);
  cout << "computing simpson method" << endl;
  compute_simpson(estimates);
  investigate_order(estimates);
  return 0;
}

Of course, Simpson's method is certainly not the end of the story here. Just as Simpson's method combines two of the trapezoidal estimates, more sophisticated methods combine more than two of the trapezoidal estimates. Something called "Romberg Integration" combines them all, and can give very highly accurate values from a small number of sub-intervals. You may learn about it elsewhere. Other methods (such as Gaussian Quadrature) work by allowing you to choose sub-intervals of different sizes. But always remember that, although these clever methods tend to work well when the function being integrated is particularly smooth or well-behaved, they may not work as well when this is not the case, and then the basic trapezoidal method (in the form compute_trapezoidal here, with double rel_err and int max_n inputs) is usually the best one.