Abstraction: estimates and approximation

The subject of mathematical estimates links closely with the idea of limits, and the language of estimates raises highly interesting mathematical questions of all sorts, including algorithms and primality.

Gowers uses this topic as a good example of the power of very good (and in many cases very difficult) mathematics. He also uses it very effectively to show even the most casual observer that there is still a huge amount yet to be discovered in mathematics. Indeed, getting better estimates for a function like the density of the primes is a lifetime's work, and many people have spent their whole lives on problems like this. There is still much that is not known.

What you shouldn't go away with, however, is the impression that all difficult mathematics is about estimating functions that have no exact representation. Indeed, most mathematics is quite different, and many mathematicians never have to estimate functions in this way in their research. But the problem with illustrating these other kinds of work in an introductory book or module like this is the amount of background required before one can understand what the work is about and why it might be useful. (See Gowers' preface on Hilbert spaces for an inkling of this problem.) So such topics cannot easily be discussed in a book (or a module) like this. That is not to say they don't exist, and indeed the majority of difficult mathematics is (sadly) still beyond us.

Thus the topic of estimates is simply an example chosen to show how difficult mathematical problems can quickly get when you look at them in detail, and how it is possible that there are a great many unsolved problems and much research still to be done.

1. Estimates

In dealing with sequences like $(a_n)$ it is usually helpful to be able to ignore finitely many cases. Thus, for example, the sequence

$$a_0 = 2143,\quad a_1 = 535,\quad a_2 = 5,\quad a_3 = 345,\quad a_4 = 3,\quad a_5 = 456,\quad a_6 = 36,\quad a_7 = 49,\quad a_8 = 64,\ \dots$$

is eventually equal to the sequence $b_n = n^2$. When working with sequences, the word "eventually" means "ignoring a finite number of terms". That finite number may be huge, a billion or a trillion or $10^{10^{10^{10^{10}}}}$, but it doesn't matter as far as the meaning of the word goes. A lot of mathematics describes "all but a finite number of objects". Since that finite number might be something as large as $10^{10^{10^{10^{10}}}}$ you might wonder what use this kind of mathematics is. (And you might have a point.)
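To make "eventually" concrete, here is a small Python sketch of my own (assuming the listed sequence simply continues as $n^2$ from then on) that finds the index beyond which the example sequence agrees with $b_n = n^2$:

```python
# The example sequence from above; assume it continues as n**2 beyond the listed terms.
a = [2143, 535, 5, 345, 3, 456, 36, 49, 64]

def agreement_point(seq):
    """Return the first index N such that seq[n] == n**2 for every listed n >= N."""
    N = 0
    for n, value in enumerate(seq):
        if value != n ** 2:
            N = n + 1
    return N

print(agreement_point(a))  # 6 -- "eventually equal" means equal from some point onwards
```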

Exercise: explain why, if $(a_n)$ and $(b_n)$ are eventually equal up to an additive constant then they are actually equal up to an additive constant, i.e. the word "eventually" can be dropped in this very special case. (See Gowers page 114 for the meaning of this phrase.)

Although estimating sequences (finding similar sequences equal to the original one up to an additive or multiplicative constant) is nice when it can be done, there is still a lot that is interesting to say about when it can't be done, because one sequence grows much faster than another. In fact this idea of "one sequence grows much faster than another", or eventual dominance, is another interesting and related topic, one that might prove useful to think about. We start with a precise definition:

Definition.

We say that a sequence $(a_n)$ eventually dominates a sequence $(b_n)$ if $a_n > b_n$ for all but finitely many $n$.

This can be used for functions too: a function $f(n)$ eventually dominates $g(n)$ if $f(n) > g(n)$ for all but finitely many $n$.

You can get a good deal of intuition about functions in mathematics by working out which ones eventually dominate which others.

For example, $n^3$ eventually dominates any quadratic polynomial $p(n)$. Similarly, $n^4$ eventually dominates any cubic polynomial. So the degree of a polynomial tells you how fast it grows eventually. (To work out exactly when the dominance takes over you do need to know the constants. So $n^3$ eventually dominates $1000000 n^2 + 1000000000$, but you won't see this in the first few values, or even in the first thousand values.)
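As a quick illustration of how far out "eventually" can be, here is a throwaway Python sketch (my own, not from Gowers) that scans for the first $n$ at which $n^3$ overtakes $1000000 n^2 + 1000000000$:

```python
def p(n):
    return 1000000 * n ** 2 + 1000000000   # a quadratic with deliberately huge coefficients

n = 1
while n ** 3 <= p(n):    # scan until the cubic finally overtakes the quadratic
    n += 1
print(n)                 # 1000001 -- "eventually" can start a very long way out
```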

Exponential functions dominate all polynomials. So $2^n$ eventually dominates every polynomial, irrespective of degree or coefficients. This is one way of seeing just how fast exponentials grow, and why they are so difficult to handle in many practical situations.

From the section on "all you need to know about logarithms, etc.", you know that $2^n$ is roughly equal (i.e. up to a multiplicative constant) to any number with $n$ binary digits. That gives you a really good way of picturing exponential growth.

Exercise.

On page 117 Gowers says, "I have not included $e^n$ because if I had then I would have been obliged to change the title of this book." Explain!

2. The prime number theorem

This beautiful section shows some serious mathematics at its very best. There are a huge number of important functions that are very difficult to understand, the density of the primes being just one example. Instead of trying to find a complicated formula for such a function, a much better idea is to estimate it approximately. How good your approximation needs to be depends on what you are trying to do, and how good you are (or how difficult your problem is).

(As an aside, I personally do not think that excluding 1 from the set of primes is just a useful convention. To me, the primes are the positive integers you cannot get from smaller positive integers by multiplication. Can you get 1? Well, there are no smaller positive integers. What happens when you multiply no numbers together? Let's back-track slightly. What happens when you add no numbers together? Clearly the answer is zero. Zero is the additive identity, the starting point for addition. So what happens when you multiply no numbers together? Obviously the answer is 1, the multiplicative identity. Any other answer would be silly. So 1 can already be obtained from multiplication and is therefore not a prime.)

Incidentally, there is a nice piece of mathematical trivia on the prime number theorem that is worth sharing. The two main functions in this theorem are

$$\pi(x) = \text{the number of primes less than or equal to } x, \qquad \operatorname{li}(x) = \int_0^x \frac{dt}{\ln t},$$

and the theorem says that these functions are asymptotically equal, i.e. $\pi(x)/\operatorname{li}(x) \to 1$ as $x \to \infty$.
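If you want to see the theorem in action numerically, here is a rough Python sketch of my own. It counts primes with a simple sieve and approximates the integral with the midpoint rule from $2$ to $x$, adding the constant $\operatorname{li}(2) \approx 1.045$ for the part of the integral below $2$ (which also sidesteps the singularity at $t = 1$):

```python
import math

def prime_counts(limit):
    """Sieve of Eratosthenes; returns a list counts with counts[x] == pi(x)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    counts, running = [], 0
    for flag in sieve:
        running += flag
        counts.append(running)
    return counts

def li(x, steps=100000):
    """Approximate li(x): midpoint rule on [2, x] plus li(2) ~ 1.045 for the rest."""
    h = (x - 2) / steps
    return h * sum(1.0 / math.log(2 + (i + 0.5) * h) for i in range(steps)) + 1.045

counts = prime_counts(10 ** 6)
for x in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    # pi(x), li(x), and their ratio, which creeps up towards 1
    print(x, counts[x], round(li(x)), round(counts[x] / li(x), 4))
```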

Calculations show that $\pi(x) < \operatorname{li}(x)$ for every value of $x$ at which these functions have actually been computed. It was a sensible question to ask whether this is always the case. Littlewood showed in 1914 that it is not: there is a value of $x$ for which $\pi(x) > \operatorname{li}(x)$. Amazingly, his proof gave no idea how big such an $x$ might be. The best Skewes could do, in 1933 (assuming the Riemann hypothesis), was to show that there is such an $x$ less than

$$10^{10^{10^{34}}}.$$

This monstrous number was the largest explicit number that had occurred in any mathematical proof, at least at that point in time. It is completely colossal and now known as Skewes' number.

Incidentally, both records have been beaten since: it is now known that there exists a number $x$ with $\pi(x) > \operatorname{li}(x)$ below $e^{727.951346801}$. This number is still huge, but considerably smaller. And an even more colossal number than Skewes' number, called Graham's number, has been used in a topic called Ramsey theory. You can look up Skewes' number and Graham's number on Wikipedia (which is where I obtained this information in the first place).

3. Estimates in algorithms and complexity

Computer programming is a good place to find problems where estimates are essential. One such problem is factorising integers. A great many factorisation algorithms are known. In these, the input is a number of maybe several hundred digits. The length of time (in computer steps) the algorithm takes varies with the number of digits in the input. More precisely, for a given algorithm $A$ we can define $t_A(n)$ to be the greatest number of steps taken by $A$, where the maximum is over all possible inputs with at most $n$ digits.

Even better, we can compare a number of different algorithms and select the one for which the function $t_A(n)$ is "best". Does there exist an algorithm for which $t_A(n)$ does not have exponential growth? No one knows. But, because the answer is believed (by some people) to be "no", many internet codes have been devised with the rough idea that to decrypt a message it is essentially necessary to factorise a difficult-to-factorise number.
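To get a feel for how a function like $t_A(n)$ behaves, here is a sketch of my own for the most naive algorithm, trial division. Taking the true maximum over all inputs with at most $n$ digits is hopeless, so as a stand-in it uses the largest $n$-digit prime, which is the worst case for trial division; the step counts grow roughly like $10^{n/2}$, i.e. exponentially in the number of digits:

```python
def trial_division_steps(m):
    """Count the trial divisions the naive factorisation algorithm performs on m."""
    steps, d = 0, 2
    while d * d <= m:
        steps += 1
        while m % d == 0:
            m //= d
        d += 1
    return steps

def is_prime(m):
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return m >= 2

def worst_input(n_digits):
    """The largest prime with n_digits digits: the hardest case for trial division."""
    m = 10 ** n_digits - 1
    while not is_prime(m):
        m -= 1
    return m

for n in range(2, 13):   # the last couple of rows take a second or two
    print(n, trial_division_steps(worst_input(n)))
    # the step counts multiply by roughly sqrt(10) per extra digit: exponential growth
```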

Sorting algorithms can be analysed in a similar sort of way. Gowers gives a nice introduction to sorting algorithms. It is known that the best that can be hoped for, in terms of the number of comparisons made by an algorithm that works correctly every time, is $O(n \log n)$. (A function $f(n)$ is $O(n \log n)$ if it is eventually dominated by $A n \log n$ for some constant $A$. So this is like "eventual dominance" and "up to a multiplicative constant" rolled into one definition.) This $O(n \log n)$ behaviour can be achieved by an algorithm which always works within this amount of time. ("Heapsort" is one such algorithm. The more popular "quicksort" is not particularly well-behaved in this respect, as it sometimes runs very slowly.) The easiest and most familiar algorithms, such as "bubble sort" and "insertion sort", are $O(n^2)$. Bubble sort is the "obvious method" Gowers mentions at the bottom of page 124. In practical terms the difference is a big deal, because $\log n$ grows incredibly slowly and so, for most practical purposes (i.e. for values of $n$ on a human scale), is effectively a constant.
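To see the gap between $O(n^2)$ and $O(n \log n)$ in actual comparison counts, here is a small experiment of my own in Python. It counts comparisons by wrapping the values in a counting class, and uses Python's heapq module as a stand-in for heapsort:

```python
import heapq
import random

class Counted:
    """Wraps a number so that every comparison made on it is tallied."""
    total = 0
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        Counted.total += 1
        return self.value < other.value

def bubble_sort(items):
    items = list(items)
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i + 1] < items[i]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

def heap_sort(items):
    heap = list(items)
    heapq.heapify(heap)                      # build the heap, then pop in sorted order
    return [heapq.heappop(heap) for _ in range(len(heap))]

for n in (100, 500, 2500):
    data = [Counted(random.random()) for _ in range(n)]
    Counted.total = 0
    bubble_sort(data)
    bubble = Counted.total
    Counted.total = 0
    heap_sort(data)
    print(n, bubble, Counted.total)          # quadratic versus roughly n*log(n) growth
```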

The reason quicksort is so popular for sorting is that it usually runs in time $C n \log n$ with a slightly smaller constant $C$ than the equivalent constant for heapsort. Most implementations of quicksort have special tweaks preventing the bad cases from coming up too often. (Personally, I prefer heapsort because it is guaranteed always to work quickly :)

4. Some very fast growing functions

For students who like this sort of thing, and to give an idea of how un-worldly modern mathematics can get, let's look at some really fast growing functions.

A nice game is to give two people pieces of paper the size of a postage stamp and tell them to write the biggest number they can on the paper. The biggest number wins. If you follow the next section you could get very good at this game!

We could start with the exponential function, because this is pretty fast growing and very difficult to deal with in many cases. We can consider, for example,

$$f_0(x) = 2^x.$$

If we want to make it grow faster we can repeat the function. Because $(2^2)^x = 2^{2x}$ and this doesn't grow an awful lot faster, it doesn't make sense to look at this but instead to look at $2^{(2^x)}$, which can't really be written any other way. (Thus the convention on brackets is that $2^{2^x}$ always means $2^{(2^x)}$.) For example, $(2^2)^4 = 4^4 = 256$ whereas $2^{(2^4)} = 2^{16} = 65536$.

Doing this a few times gives some really big numbers, such as

$$2^2 = 4,\qquad 2^{2^2} = 2^4 = 16,\qquad 2^{2^{2^2}} = 65536,\qquad 2^{2^{2^{2^2}}} = 2^{65536},\qquad 2^{2^{2^{2^{2^2}}}} = 2^{2^{65536}}.$$

The number $2^{65536}$ needs $65537$ digits to write in binary (a $1$ followed by $65536$ zeros), or around $20000$ digits in base $10$.
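Python's arbitrary-precision integers let you check these figures directly; a quick sketch of my own:

```python
import math

tower = [2]
for _ in range(4):
    tower.append(2 ** tower[-1])             # 2, 4, 16, 65536, 2**65536

n = tower[-1]                                 # this is 2**65536
print(n.bit_length())                         # 65537: one more binary digit than the exponent
print(math.floor(65536 * math.log10(2)) + 1)  # 19729 decimal digits, i.e. around 20000
```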

So we can define an iterated exponential function,

$$f_1(x) = 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}$$

with $x$ 2's.

This grows much faster than any normal exponential. Note you can define this function rather simply, as $f_1(x, x)$, where $f_1(x, 1) = x$ and $f_1(x, y+1) = f_0(f_1(x, y))$.

On a postage stamp, using $f_1$, you could easily write a number much, much bigger than the number of particles in the universe.

But then there is nothing to stop you iterating $f_1$. Define $f_2(x, 1) = x$ and $f_2(x, y+1) = f_1(f_2(x, y))$, and set $f_2(x) = f_2(x, x)$.

And so on. Using these functions, the numbers you can write on a postage stamp are much more than astronomical.

You might think that was all, but it isn't. There's an easy twist to make something even faster growing. Just say,

$$f_\omega(x) = f_x(x).$$

Can you see the point? For every possible $n$, the function $f_\omega$ grows faster than $f_n$ provided its input $x$ is bigger than $n$; i.e. $f_\omega$ eventually dominates every $f_n$.
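Here is a Python sketch of my own of the whole hierarchy, following the recursion above. Only the very smallest inputs can actually be evaluated; anything bigger overflows the universe, never mind the computer:

```python
def f0(x):
    return 2 ** x

def iterate(f):
    """Turn f into its iterated version g, with g(x,1) = x and g(x,y+1) = f(g(x,y));
    the one-argument version returned here is g(x) = g(x, x)."""
    def g(x):
        value = x
        for _ in range(x - 1):
            value = f(value)
        return value
    return g

levels = [f0]
def f_level(n):
    """f_n in the hierarchy: f_0 is the exponential, each level iterates the one below."""
    while len(levels) <= n:
        levels.append(iterate(levels[-1]))
    return levels[n]

def f_omega(x):
    """The omega-trick: diagonalise across the whole family."""
    return f_level(x)(x)

print(f_level(1)(3))   # 256 = 2**(2**3), already big
print(f_omega(2))      # 4 -- about the only value of f_omega small enough to print
```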

Whenever we have a fast growing function $f$ we can iterate it, and whenever we have a family of functions $f_n$ we can employ the $\omega$-trick. There is a whole subject in mathematics (related to logic, because this is as much about the way we write or define these functions as anything else) concerning such families of fast growing functions and what they can do for us. The numbers involved are all counting numbers, but apart from 0, 1, 2 and a very small number of others, they are all far too big to write down in the universe we live in.

5. A plan for an essay on estimates

Title: What does it mean for one function to eventually dominate another?

Conclusions: this is a very different kind of maths, with a different set of applications, which often gets so hard that there are almost always unsolved problems left over at the end.