# The Penguin Programmer

## Arrays

First, let’s cover what an abstract data structure actually is; a group of single pieces of data linked together in a certain way. The simplest data structure would be just a single variable, such as an `int`

or a `double`

, but we already know a lot about them! The simplest non-trivial structure made from multiple pieces of data is the array. We’ve already come across those before, but they are an example of a simple data structure. If we had an array of the first 10 primes say, and we wanted to get the value of the fifth, then we can do this:

1
2
3

int primes[10] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };
cout << "The 5th prime is " << primes[4] << endl;

Easy right, but what’s actually happening here? Abstractly we can treat an array as a discrete line of data of fixed length. No extra data can be added, but we can manipulate each piece of data by referencing its distance from the first piece of data in the array, essentially the origin. So `primes[4]`

is another way of saying “The prime that is four primes away from the origin”. Or in other words, the fifth. This kind of abstract model lends itself to a natural representation that the computer can use; give the first piece of data 1 byte of RAM (assume the data is one byte for now), then give the next piece of the data the next byte, the next piece the byte after that, and so on until all of the data has been allocated a place of RAM. We then just need to remember where the first piece of data is, as we can just use the exact same method as before to find the piece of data we want; the fifth piece of data is four pieces of data from the start. If our data was more than a byte long, say four bytes (the usual size of an `int`

), then the fifth piece of data would be `4*4=16`

bytes away from the beginning of the first piece of data. We can do this very easily in C++ using pointer arithmetic. Incrementing a pointer moves it a number of bytes into memory equal to the size of the data type it points to, so for an `int`

we might get

1
2
3
4
5
6

int* baseptr = new int[4];
*baseptr = 2;
*(baseptr+1) = 3;
*(baseptr+2) = 5;
*(baseptr+3) = 7;

Or if you don’t like the idea of using array access notation to define an array, here’s the C equivalent

1
2
3
4
5
6
7

#include <cstdlib>
int *baseptr = malloc(4*sizeof(int)); // In real code, you'd use calloc
*baseptr = 2;
*(baseptr+1) = 3;
*(baseptr+2) = 5;
*(baseptr+3) = 7;

Of course this is extrememly messy, and the less time spent performing memory management the better, so C++ of course has the array access operator </code>[]</code> we use instead. Whilst arrays are simple and efficient, they do have their limitations, namely that data cannot be easily added or removed in the middle or beginning of the array after it has been created. We can add items to the end in a similar way to above, using `new`

(or `realloc`

in C), but insertions inside of the array are not easy. They are possible; for an insertion you can allocate an additional piece of memory on the end of the array, and then move every data piece after the insertion location up one before dropping the new value in place, and for a deletion you can do move everything down and then deallocate the memory at the end, but these are awkward and inefficient (`O(x)`

). There is however an STL container, the `vector`

, to handle all of this memory managment for us. Structures like an array, a `vector`

permits the insertion and deletion of data anywhere within themselves, with `O(x)`

efficiency. So actually the same as if we were doing it manually, but at least we don’t have to write the code ourselves! Data can be inserted onto (or removed from) the end of a vector with `O(1)`

efficieny however. Here’s an example usage of a vector

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

#include <vector>
#include <iostream>
int main(void)
{
std::vector<int> primes; // Data type is placed in angle brackets
primes.push_back(2); // Add a piece of data to the end of the vector
primes.push_back(3);
primes.push_back(5);
primes.push_back(7);
primes.pop_back(); // Remove a piece of data from the end
for(int i = 0; i < primes.size(); ++i)
{
std::cout << primes[i] << std::endl; // Data accessed in the same way as an array
}
return 0;
}

`vector`

s are very useful for creating dynamic arrays, where the size of the array is not known at compile time, say if reading text from a file or storing image data. Saves the hassle of using new, and what’s brilliant about them is they’re classes, so they have destructors; their memory is automatically deallocated when out of the `vector`

s scope. No more memory leaks from forgetting to deallocate memory!

Is there a way to improve the insertion efficiency though, making it constant time (`O(1)`

)? Certainly not without a different way of structuring data, but the answer is certainly a yes!