Binary notation

This is a technical and essential mathematical page, explaining in detail how integers and rationals can be written in binary. Not all computer programming languages store numbers in the way described here, but most do. You will have to learn about the advantages and disadvantages.

1. Integers in binary

Computers typically store numbers in binary, also called base 2. This is because it is easy to design electronic circuits that are "on" (1) or "off" (0). A sequence of binary digits defines a natural number, just as a sequence of decimal digits does.

The idea is to assign values to the positions of the digits, the value $1$ or $2^{0}$ to the units position, the value $2$ or $2^{1}$ to the twos position (analogous to "tens"), the value $4$ or $2^{2}$ to the fours position (analogous to "hundreds"), and so on.

Example.

The binary number $10011$ is $1 \times 2^{4} + 0 \times 2^{3} + 0 \times 2^{2} + 1 \times 2^{1} + 1 \times 2^{0}$ which is written in decimal as 19.

Nonnegative integers are usually stored like this on a computer.

There are two main conventions. The first is that the computer language only allows a fixed number of digits. C++ works like this. In fact the C++ standard doesn't say how many digits are available, but you can reasonably expect at least 31 binary digits. (The phrase "binary digit" is usually contracted to "bit".) Using 31 binary digits the biggest number that can be represented is $2^{0} + 2^{1} + \dots + 2^{30} = 2147483647$ . (The other convention, used in some languages, allows a variable amount of memory that grows as needed to be available. Such numbers are bounded by the amount of memory available to the computer, but this representation is noticeably much slower because of the overheads of allocating and using extra memory. It is available in C++ in an add-on package if needed.)

One way of representing negative numbers is to represent them as a sign digit (usually 0 for "+" and 1 for "-") together with a nonnegative number represented by a sequence of binary digits. This has a number of disadvantages, one being that you have two notations $+0$ and $-0$ for what is presumably the same number.

C++, like most computer systems, uses a very clever system call two's complement that is similar to this but rather faster and easier to calculate with when you understand it well.

Supposing there are 32 binary digits allocated for integer variables (as is common on many implementations of C++). Then the most significant digit does not represent $2^{31}$ as it would for nonnegative integers, but instead represents $- 2^{31}$ .

It is easier to see what is going on when we replace the number of digits, 32, by something smaller. Suppose we only allow 4 binary digits. Then the first eight numbers are

0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111

for 0 to 7 inclusive. The next number, $1000$ represents ${-2}^{3}$ so is $-8$ . The one after that is $1001$ or ${-2}^{3} + 2^{0} = -7$ , and so on. So,

1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111

are, in order, $-8,- 7,- 6,- 5,- 4,- 3,- 2,- 1$ .

This is how it's done in C++. It explains why (with 32 binary digits) C++ allows integers from $- 2^{31} = -2147483648$ to $2^{31} -1 = 2147483647$ . It also explains some curiosities such as why adding 1 to $2147483647$ gives $-2147483648$ . The main advantage of 2's complement is that it is faster and the same algorithms work for positive and negative numbers, a fact that you can take on trust or experiment with on your own.

2. Non-integral numbers

Before we start this section please discard the familiar and unfortunate terminology common in schools that "a decimal number is a number that is not an integer". This is clearly unhelpful. A "decimal number" (if the phrase means anything) is a number written in decimal, i.e. in base 10. Such numbers can certainly be integers. As you will see, a number that is not an integer can also be written in binary. But then of course it is not a decimal!

The "decimal point" is used to indicate the position of the units column where the digit has value multiplied by $10^{0}$ or $1$ . There is no reason why we can't have a "binary point" in binary to do the same thing. For example, a binary number like $101.101$ represents $2^{2} + 2^{0} + 2^{-1} + 2^{-3}$ which might be otherwise written as $5 \frac{5}{8}$ .

Binary numbers for familiar fractions can be calculated by doing long division. For example, to find the binary for $1 / 10$ one divides $1$ by $1010$ (the binary for 10) in binary. This sounds difficult but is in general much easier than it would be in decimal because the full multiplication table in binary is easy. It is just

0 \times 0 = 1 \times 0 = 0 \times 1 = 0

1 \times 1 = 1

Doing the long division gives the binary for $1 / 10$ as

0.0001100110011001100 \dots

Notice that $1 / 10$ is a recurring binary number. This means that for the usual system of binary representation, the computer will not be able to represent $1 / 10$ exactly. This frequently causes problems.

Exercise.

Show that the rational numbers that can be represented exactly in binary with a finite sequence of digits are precisely those of the form $\frac{x}{2^{y}}$ for integers $x, y$ .

Exercise.

Show that (like for base 10) the rational numbers are the numbers that can be written in binary with a repeating section of some fixed length.

We will come to how numbers like this are actually stored on the computer in C++ later.