Graduate Program KB

Introduction to Algorithms Part 1

What is an algorithm

An algorithm is a tool for solving a computational problem. It takes a set of inputs and produces a set of outputs in a finite amount of time. A particular set of inputs is called an instance of the computational problem.

An example computational problem is:

Given a list of numbers and a number x, find the index of x in the list

And an algorithm to solve it:

Check every number starting from the beginning of the list to see if it matches x. If it does, return the index

A particular implementation of an algorithm is a procedure:

function linearSearch(array, valueToFind) {
  for (var currentIndex = 0; currentIndex < array.length; currentIndex++) {
    if (array[currentIndex] === valueToFind) return currentIndex;
  }
  return null;
}
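For example, calling this procedure (redefined here so the snippet stands alone) on a hypothetical list:

```javascript
// Linear search: scan from the start, return the first matching index.
function linearSearch(array, valueToFind) {
  for (var currentIndex = 0; currentIndex < array.length; currentIndex++) {
    if (array[currentIndex] === valueToFind) return currentIndex;
  }
  return null;
}

console.log(linearSearch([4, 8, 15, 16], 15)); // 2
console.log(linearSearch([4, 8, 15, 16], 23)); // null (not found)
```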

The halting problem shows that there is no general method to calculate the running time of an arbitrary algorithm; we can't even decide whether it ends at all. We can, however, look at some common cases.

Measuring performance

There are two ways of measuring the performance of an algorithm, empirically and analytically.

Empirical measurement

Empirical measurement means running an implementation of an algorithm and timing it. While empirical measurement is the ground truth for how well an algorithm runs, it has drawbacks. The time measured depends not only on the algorithm itself but also on the hardware, compiler, standard library, and so on, so comparing benchmarks between different machines introduces variables into the observations that may not be what you are trying to measure. It can also be difficult to run the algorithm on realistic data or in the context of a larger application. Measurements are also highly affected by the kind of data provided to the algorithm, for example how often the worst case is encountered.

Analytical measurement

An analytical understanding of running time complements empirical measurement by letting us reason about changes to the algorithm and to the data used for benchmarking. However, it typically provides estimates for how the algorithm performs on large inputs, and is less accurate on small inputs unless more complex models are used.

This talk is about analytical measurement.

Worst case assumption

We will also be discussing the running time of algorithms in the worst case. Best-case analysis is typically useless: the best case is only hit under specific circumstances, and not often. Average-case analysis is useful, but more complicated, and relies on knowing specific information about the distribution of the data. The worst case gives us a guarantee for the maximum running time of the algorithm.
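The distinction is easy to see with linear search. Here is a version instrumented with a comparison counter (a hypothetical variant for illustration): the best case finds the value at index 0 after one comparison, while the worst case (value absent) scans the whole array.

```javascript
// linearSearch with a comparison counter, to expose best vs worst case.
function linearSearchCounted(array, valueToFind) {
  let comparisons = 0;
  for (let currentIndex = 0; currentIndex < array.length; currentIndex++) {
    comparisons++;
    if (array[currentIndex] === valueToFind) {
      return { index: currentIndex, comparisons };
    }
  }
  return { index: null, comparisons };
}

const data = [7, 3, 9, 1, 5];
console.log(linearSearchCounted(data, 7)); // best case: found at index 0, 1 comparison
console.log(linearSearchCounted(data, 8)); // worst case: not present, 5 comparisons
```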

Models

In order to measure analytically we need a model of computation and a theory that tells us how long a piece of code takes. There are many different computational models; some examples:

  • Finite state machines
  • Turing machine
  • Random access machine
  • Hierarchical memory
  • Lambda calculus
  • Petri nets
  • Actor model

I will use a simplified version of a random access machine for this talk. The machine has

  1. A processor
  2. Some memory

The processor provides instructions such as addition, multiplication, division, memory load, memory store, function call, and branching. The memory is random access, which means that any address can be accessed in a constant amount of time - we don't have to wait for a tape or disk head.

Counting performance

In this model we assume that all of the provided elementary operations run in the same constant amount of time, which we give a value of 1. We can now sum the amount of time an algorithm will take.

Sequence

For a simple sequence all the operations run once, in order, so we can just sum them:

var a = 1 + 1
var b = a * 3 + 4
  • Here the first line has an assignment and an addition - 2 time units.
  • The second line has an assignment, multiplication, and an addition - 3 time units.
  • In total this code block takes 5 time units.

Conditional

For a conditional we take the running time of the conditional expression plus the slowest branch - the worst case assumption.

if (a % 2 === 1 ) {
  return a - 1;
}
else {
  return a;
}
  • The conditional expression has a modulo and equality for 2 time units.
  • The first branch has a subtraction and a return - 2 units.
  • The second branch has a return - 1 unit.
  • In total we have 2 + max(2, 1) = 4 units.

Function calls

A function call takes a unit for the call, a unit for each argument (assignment), and the time of the body.

function makeEven(a) {
  if (a % 2 === 1 ) {
    return a - 1;
  }
  else {
    return a;
  }
}

Here the function has one argument, so we have 2 units plus the body. In the previous section we decided that the body takes 4 units - a total of 6.

Loops

For a loop the initialization runs once, while the condition, body, and afterthought run for every iteration of the loop.

const data = [1,2,3,4,5]
for (var currentIndex = 0; currentIndex < 5; currentIndex++) {
  if (data[currentIndex] === 3)
      return currentIndex;
}

In this code

  • The initialization is an assignment - 1 unit
  • The body is an equality and a return - 2 units
  • The condition is an inequality - 1 unit
  • The afterthought is an increment - 1 unit

The array contains 5 items, so the loop runs 5 times, giving 1 + 5 * (2 + 1 + 1) = 21 units.

function linearSearch(array, valueToFind) {
  for (var currentIndex = 0; currentIndex < array.length; currentIndex++) {
    if (array[currentIndex] === valueToFind)
        return currentIndex;
  }
  return null;
}

This is the same code, except in this instance we don't know how long the array is. If we let n represent the array length we can say it takes 1 + 4n units.
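We can sanity-check the 1 + 4n figure with a small sketch that tallies the model's time units for the worst case (value absent). Following the counting above, each iteration costs 4 units (condition 1, body 2, afterthought 1), and, as in the text, the final failing condition test is not counted separately.

```javascript
// Tally model "time units" for the worst case of linearSearch on an array.
function worstCaseUnits(array) {
  let units = 1; // initialization: var currentIndex = 0
  for (let currentIndex = 0; currentIndex < array.length; currentIndex++) {
    units += 1 + 2 + 1; // condition + body + afterthought, per iteration
  }
  return units;
}

console.log(worstCaseUnits([1, 2, 3, 4, 5])); // 21, matching the fixed-size example
console.log(worstCaseUnits(new Array(100)));  // 401 = 1 + 4 * 100
```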

We now have a way of reasoning about how long our code will take to execute. Next we will examine the assumptions we've made in this model, and then we will look at ranking algorithms based on how they perform on large amounts of data.

Assumptions

We made several assumptions in this model, and we need to decide if we can still use it appropriately with real hardware.

Instruction latency

The most obvious assumption we made is that all operations take the same amount of time. In reality some operations, such as division, can take 5 - 10 times longer than addition, especially on older hardware.

Non-uniform memory access (NUMA)

We also assumed that all memory accesses take the same amount of time. On modern systems memory access is non-uniform: the processor caches recently accessed memory, and accessing memory that isn't cached can take 100 times longer.

Branching

Branching can also vary in the amount of time it takes: the processor remembers which branches were taken recently and optimises for the same branch to be taken again. An if statement that inconsistently takes different branches can therefore run slower.

Justifying the model

For all of these assumptions we can take a time unit to be the length of the longest operation. While this does not let us determine whether these factors would make a particular implementation run faster or slower, it still lets us compare algorithms. More complex models can be used later for more detail, which could yield a 100 - 1000x speedup. While that could save a lot of time, it may not be as significant as the choice of algorithm, as we will see in this section.

This graph shows some hypothetical functions we can get as a result of counting the running time of our algorithms. You can see that some functions, like n and log(n), grow much more slowly with the size of the data than others, such as n^4 or 2^n. You can also see that while n^4 + 10 starts off taking more time, it quickly looks exactly like n^4 as n increases, since the n^4 term dominates the 10. This graph is misleading though: 2^n looks better than the n^4 curves here, but in fact 2^n is much worse for larger inputs, and this is easier to see on a different graph.

This is a log-log graph. Both axes have been transformed so that the resolution on the graph is inversely proportional to the axis value. You can see this easily: the square from 0 - 10 is the same size as the square from 10^5 - 10^6, even though the latter spans almost a million integers. This graph better represents how we think about running time: if something is going to take an hour, do you care if it takes another second? For a function that takes 10 seconds another second is 10% of its runtime, while for a function that takes an hour it's around 0.03%.

In this graph you can clearly see that 2^n becomes worse than n^4 at n = 16.
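We can check the crossover point directly: 2^16 and 16^4 are both 65536, and beyond n = 16 the exponential pulls away quickly.

```javascript
console.log(2 ** 16, 16 ** 4); // 65536 65536: the curves meet at n = 16
console.log(2 ** 20, 20 ** 4); // 1048576 160000: 2^n is already well ahead
```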


We can also illustrate the difference the choice of algorithm makes compared to the speed of the hardware (a constant factor). Let's say we have 10^8 numbers.

If we have a fast computer that can perform 10^10 operations per second running an n^2 sorting algorithm, it will take:
10^8 * 10^8 / 10^10 = 10^6 seconds - around 11.5 days.
For a computer 1000 times slower (10^7 operations per second) with an n log(n) algorithm we have:
10^8 * log(10^8) / 10^7 = 80 seconds (taking log base 10)


While running the algorithm on a faster machine helps - a machine twice as fast as the faster one running the n^2 algorithm would take around 6 days instead of 11.5 - the choice of algorithm is more significant: the machine with the better algorithm effectively saves the whole 11.5 days. This is why we assume that all operations take the same amount of time and ignore the specific timings of real hardware. Generally you can pick a better algorithm first, then optimise it for the particular hardware if needed.
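The arithmetic above can be reproduced directly (the 80-second figure corresponds to taking the logarithm base 10):

```javascript
// Comparing a fast machine with a slow algorithm against a slow machine
// with a fast algorithm, for n = 10^8 numbers.
const n = 1e8;

const fastMachineQuadratic = (n * n) / 1e10;          // n^2 at 10^10 ops/sec
const slowMachineNLogN = (n * Math.log10(n)) / 1e7;   // n log n at 10^7 ops/sec

console.log(fastMachineQuadratic, 'seconds'); // 1000000 seconds, about 11.5 days
console.log(slowMachineNLogN, 'seconds');     // 80 seconds
```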

Asymptotic analysis

The previous graphs show intuitively that some algorithms are better than others, but can we make this more concrete? If we divide a smaller number by a larger number the result is less than one; as we keep dividing by ever larger numbers we get closer and closer to zero.

This is a graph of n^4 divided by 2^n. When the result is 1 the two functions give the same value; when it is greater than 1, 2^n is faster; and when it is less than 1, n^4 is faster. You can see that as we head to the right of the graph - larger inputs - the curve approaches 0. This is called an asymptote, and it is where the name asymptotic analysis comes from.

We can write this as an equation, and say that f(n) is O(g(n)) for f(n) = n^4 and g(n) = 2^n.

However, a function divided by itself, or by a constant multiple of itself, will not approach 0; the ratio stays at a constant. We can adjust our definition to allow constant factors.

A more intuitive way of looking at this: after the last point where the two functions cross, the larger function stays above the smaller one.

We can also write this as an equation: f(n) <= c * g(n) for all n >= n0, where c is a positive constant and n0 is the last point where the lines cross.
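As a sketch, we can check this definition numerically for f(n) = n^4 and g(n) = 2^n. The choices c = 1 and n0 = 16 are simply read off the crossover point seen earlier; past n0 the inequality holds.

```javascript
// Numeric spot-check of the big-O definition for f(n) = n^4, g(n) = 2^n.
const f = n => n ** 4;
const g = n => 2 ** n;
const c = 1;   // constant factor; 1 suffices for this pair
const n0 = 16; // last point where the curves cross

let holds = true;
for (let n = n0; n <= 64; n++) {
  if (f(n) > c * g(n)) holds = false;
}
console.log(holds); // true: past n0, n^4 stays at or below 2^n
```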

Measurement revisited

Now that we've looked at what we can count, we have made a key observation: we only care about the dominant term, without constant factors. We can revisit how we measure our algorithms and simplify using this observation.

Sequence

A sequence of basic operations always takes a constant amount of time - the number of operations. We can simplify this to just 1 by dropping the constant factor.

Conditional

A conditional whose branches contain only simple sequences (the common case) selects between constants, and is therefore also constant. If it is more complicated than this, the max may remain in the complexity - but it may also be possible to simplify with some knowledge of the invariants (e.g. how often each branch can be hit).

Function call

A function call just added a constant amount to the time the body takes, so we can simplify it to just the time of the body.

Loop

A loop takes the amount of time in the loop body multiplied by the number of iterations. For a loop iterating n times whose body is a simple sequence, that's n multiplied by a constant - n.

var a = 0;
for (let i = 0; i < n; i++) {
  a += 1;
}

If we wrapped this in another loop of m iterations, it would be m multiplied by its body, which we just calculated as n - giving m*n time.

var a = 0;
for (let j = 0; j < m; j++) {
  for (let i = 0; i < n; i++) {
    a += 1;
  }
}
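Counting iterations directly confirms the m*n figure. This is a hypothetical wrapper just for illustration:

```javascript
// The inner body of the nested loop runs exactly m * n times.
function nestedIterations(m, n) {
  let a = 0;
  for (let j = 0; j < m; j++) {
    for (let i = 0; i < n; i++) {
      a += 1;
    }
  }
  return a;
}

console.log(nestedIterations(3, 5)); // 15 = 3 * 5
```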