Before we get on to how different data structures work, we need to briefly cover something called Big-Oh notation. Big-Oh notation is a neat way of describing how efficient an algorithm is, by considering how it performs on large sets of data. Whilst this is mostly a tutorial on data structures, not algorithms, we'll still need it to cover the efficiency of the structures. This will not be mathematically rigorous; the aim is just to provide an understanding.

Consider for example a function that performs an action over a set of data; call it f(x), where x is the amount of data the function acts on. Let's pretend that f(x) goes through a list of x numbers twice, and each time it checks whether each number is even. If it is, it divides it by two; if it is odd, it adds one. It then outputs the first number in the sequence. Something like:
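A minimal Python sketch of such a function might look like this (the name `f` and the details are my own; any equivalent listing would do):

```python
def f(numbers):
    # Go through the list twice.
    for _ in range(2):
        for i in range(len(numbers)):
            if numbers[i] % 2 == 0:   # even: divide it by two
                numbers[i] //= 2
            else:                     # odd: add one
                numbers[i] += 1
    print(numbers[0])                 # output the first number
    return numbers[0]


f([3, 4, 5])  # prints 2
```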

A pointless function, I'm sure you'll agree, but it will suit our purposes perfectly. We can see that f(x) needs to perform 2(2x) + 1 operations in total: two for each of the x numbers (checking if it's even, then dividing or adding one), twice that for going through the list twice, then one extra for outputting the value. Simplifying, we get 4x + 1 operations. We've assumed here that every operation takes the same amount of time, which is generally a safe bet for simple operations (additions, divisions and so on), and we'll ignore the fact that outputting takes an aeon in comparison. Let's now consider what happens as x gets larger. The 4x term will easily swamp the constant + 1 at the end, so that won't have much of an effect for large data sets, and we can ignore it. The 4 won't matter much either: it is just a constant multiplier, making the function four times slower regardless of whether x is 16 or 1000, but it doesn't change how the running time grows as x grows. We can therefore ignore it too, giving an approximate number of operations equal to x. We say that f(x) = O(x), or "f of x is Big-Oh of x".

Now let’s consider a different function, one which calculates a list of sums in a very naive way
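A reconstruction of such a naive listing in Python (the variable names are mine, chosen to match the description below):

```python
# Generate 100 square numbers so the algorithm has something to work with.
squares = [i * i for i in range(100)]

results = [0] * len(squares)
for n in range(len(squares)):
    total = 0                  # assignment 1
    for i in range(n):
        total += squares[i]    # n additions: recompute the sum from scratch
    results[n] = total         # assignment 2: the sum of the first n squares
```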

First we generate 100 square numbers so the algorithm has something to work with, then we get to the algorithm itself. The function adds up the first n square numbers and puts the sum in the nth position of the results array. If you're disgusted by this function, that's a good sign, but let's analyse it like we did the first.

To calculate the sum of the first n squares the algorithm performs n additions and 2 assignments. Using the same logic as before, the n is the important bit, not the 2, so we can ignore it. It does these sums 100 times, or more generally x times, but because n changes each time, the total number of operations will be 0 + 1 + 2 + 3 + ... + x. This sum is called an arithmetic series; apologies if you've seen it before. If not, consider each term of this sum as a row of n balls:

Clearly this is forming a triangle of height x and width x, so it will have approximately x*x / 2 balls in total. Again, the 2 is just another constant multiplier, so we can ignore it, and hence the total number of operations to calculate the sums of the first x square numbers (or any list of numbers, for that matter) is approximately x^2. We therefore say that f(x) = O(x^2), or "f of x is Big-Oh of x squared". Hopefully you can see that this algorithm will be slower for large data sets than the first!
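We can sanity-check the triangle approximation numerically; this snippet isn't part of the analysis, just a quick verification:

```python
x = 1000
series = sum(range(x + 1))   # 0 + 1 + 2 + ... + x, the exact operation count
approx = x * x // 2          # the triangle approximation
print(series, approx)        # 500500 500000, about 0.1% apart
```

The gap between the two shrinks (as a percentage) as x grows, which is exactly why Big-Oh notation is happy to discard it.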

If you were disgusted at the above algorithm then that's good, because it was pretty terrible. After calculating the sum of the first 4 squares, say, we just need to add the fifth square number to get the sum of the first 5. The above algorithm can therefore be rewritten to be O(x) instead! This is why we use this notation: to compare the efficiency of algorithms precisely, instead of just saying 'this one is obviously better' without any data to back the statement up.
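That improved version can be sketched by keeping a running total instead of recomputing each sum from scratch (again, the names here are my own):

```python
squares = [i * i for i in range(100)]

results = [0] * len(squares)
running = 0
for n in range(len(squares)):
    results[n] = running       # the sum of the first n squares
    running += squares[n]      # one addition per element, so O(x) overall
```

It fills the same results array as the naive version, but with x additions rather than roughly x^2 of them.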