# The Penguin Programmer

## Number Systems

Counting is so natural to us that we rarely concern ourselves with the way we do it; we have ten digits, 0 through 9, and we move from digit to digit until we reach 9, at which point we reset the unit (far right) digit back to 0 and add a 1 to the front, the tens digit. Repeating this process we end up with the digits 0 through 9 in the units, tens, hundreds, etc. places. We can therefore express the same number as a multiple of one plus a multiple of ten, plus a multiple of a hundred, and so on. That is,

where `d0, d1, d2, ...`

are the digits of the number in question, and can be either 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. But 1, 10, 100 etc. are just powers of 10 (any number to the power of 0 is defined to be 1), so we can write this as

Of course the appearance of 10 here is purely because we chose to count using ten digits, and this choice is really entirely arbitrary! We could just as well count with five digits, or sixteen, and we’d still be able to express any number we liked (this is called the *basis representation theorem*, and requires mathematical proof). We call this number of digits the *base* of the number system, and say we usually count in base 10, or decimal.

### Octal

Let’s take eight digits then instead of ten, so we have a base 8, or *octal*, number system. Counting up from 0 therefore goes 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, … In this case 12 is not the usual twelve in decimal, but would instead be

so the number 12 interpreted in octal represents the same number as then number 10 does in decimal. As you can imagine, when working with multiple number systems we can’t get away with just stating the digits of a number, we need to say which base we are using. In any calculation or equation like the above, assume that we’re working in the familiar decimal, otherwise we’ll prefix the number with a code to specify which base we’re in, starting with `0`

for octal. If there’s no prefix, assume we’re dealing with decimal numbers. So the above shows that `012 = 10`

.

In C and C++ you can also specify the base of a number in exactly the same way! See if you can predict the output of this snippet of code

As you can see we can freely mix bases in a single expression, and C++ won’t even bat an eye. It can certainly be confusing to the programmer if you do that though!

You might be wondering why we’re doing this though, it certainly seems needlessly complicated! One benefit of octal is that (usually anyway) computers store numbers in groups of bytes, which are eight bits in size, a *bit* being the smallest piece of information a computer can store. If you arranged these bits in a line and wanted the 51st bit, which byte would you need and how many bits into that byte should you go? The answer isn’t obvious, but if you write `51 = 063`

then you can see immediately that you want the third bit in the sixth byte!

### Binary

A significantly more important base is base 2, or binary, which uses just the digits 0 and 1. In fact, bit is short for *binary digit*, and in fact computers represent all numbers in a binary representation. Before we go into why binary is useful, let’s get used to using it. Using the same counting scheme as before (after all, the base doesn’t change the way in which we count, just the way we represent the same quantity) we would obtain 0, 1, 10, 11, 100, 101, 110, 111, 1000, … We will denote binary numbers with the prefix `0b`

. What is the number `0b1100101`

as a decimal number?

On reflection I probably could have chosen a less confusing number, but nonetheless we see that `0b1100101 = 101`

. We’ll see more of binary in the next section

### Hexadecimal

The final base that we will look at is base 16, or hexadecimal. Immediately you might notice a problem; if we only have ten possible digits, how can we use sixteen? For that we just move onto letters, so counting in hexadecimal goes 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, … The common prefix in this case is `0x`

, so `0x12 = 18`

. Luckily this works in C++! What’s more, the letters are case insensitive so `0x3A = 0x3a`

. Personally I prefer to use lowercase letters, but so long as you’re consistent it doesn’t matter.

Mathematical operations become a little harder for us to do in hexadecimal, because there’s more digits to try and keep track of, but they allow larger numbers to be written with fewer digits, which is a small bonus. Additionally they’re rather handy when working with computers for similar reasons to octal; there are eight bits in a byte, which corresponds to a total of `2^8 = 256`

different values that an eight bit binary number can take. It also happens to be equal to `16^2`

, which is the number of a values a two digit hexadecimal number can take! We can therefore use hexadecimal to succinctly describe the data stored in a single byte, and what’s more we can express the data in the next byte by just putting an additional two digits on the end (or the front).

This is most commonly seen when expressing colours, where each pair of digits expresses a different colour component, be it red, green, or blue (or hue, saturation, value etc.) For example, the number `0x0090ff`

has `0x00 = 0`

red, `0x90 = 144`

green, and `0xff = 255`

blue, which looks like this

The colour `0x0090ff`

### Converting from Decimal to Another Base

We’ve seen above how to convert a different base to decimal, but how to we go the other way? Going in this direction isn’t as straightforward, but there is an easy algorithm to do it. Imagine we have a number `x`

, written in decimal, that we want to convert to a base `b`

, having digits `d0, d1, d2, ...`

with `d0`

on the far right. We will find each digit in turn, starting with `d0`

. Recall from above the representation of `x`

in the base `b`

We can see that every term except the `d0`

term is a multiple of `b`

. Because of this, if we divide `x`

by `b`

and take the remainder (that is compute `x % b`

) then the contributions to the remainder from all the terms other than `d0`

will be 0, and hence the remainder will be equal to the first digit `d0`

. We now get rid of the contribution to the number due to `d0`

by subtracting `d0`

from `x`

and then dividing the result—which is divisble by `b`

—by `b`

. Alternatively we can utilise the properties of integer division in C++ (which discards remainders) and simply divide by `b`

. We now have

which is a similar situation to before, and so we repeat the process obtaining `x1, x2, x3, ...`

and terminate when `xi = 0`

for some `i`

. This way we extract all the digits of the base `n`

representation of `x`

. It might be helpful to see this algorithm written in code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

#include <iostream>
#include <string>
// This requires C++11 to run due to the to_string() function.
int main()
{
// Original number
unsigned int x = 141;
// Converted number
std::string result = "";
// Base
unsigned int b = 8;
while(x > 0)
{
// Compute and prepend digit to the left side of the number
result = std::to_string(x % b) + result;
// Calculate the next x
x /= b;
}
std::cout << result << std::endl;
return 0;
}

A very similar method can be used to extract the digits of a decimal number; if `b`

is 10 then `x % b`

will simply return the digit, which we could then add to an array instead of a string.