# CS 240: Lecture 5 - Insertion Sort

Dear students:

Last time we looked at selection sort. We implemented it and we analyzed it. Today we do the same for insertion sort. But we also need to formalize our simplification of cost functions down to complexity classes.

## Insertion Sort

Just like selection sort, insertion sort orders a collection. But its mechanic is different. Selection sort focuses on a single cell and finds the value that belongs in it. As soon as a cell is filled, that cell is never touched again. Insertion sort takes this approach:

```
shove element 1 leftward to order elements 0-1
shove element 2 leftward to order elements 0-2
shove element 3 leftward to order elements 0-3
...
```

After we place an element, it may not be in the right spot. Later elements may displace it back to the right. That seems wasteful, right? Surely insertion sort has no redeeming qualities over selection sort.

Here's one way it might be implemented:

```
public static void insertionSort(int[] items) {
  // Each pass inserts the element at fromIndex into the sorted prefix on its left.
  for (int fromIndex = 1; fromIndex < items.length; ++fromIndex) {
    // Shove the element leftward until its left-neighbor is no longer bigger.
    for (int toIndex = fromIndex; toIndex > 0 && items[toIndex - 1] > items[toIndex]; --toIndex) {
      int tmp = items[toIndex - 1];
      items[toIndex - 1] = items[toIndex];
      items[toIndex] = tmp;
    }
  }
}
```
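
If you want to try it out, here's a small driver around that method. The class name and the sample array are my own, not from lecture:

```java
import java.util.Arrays;

public class InsertionSortDemo {
    public static void insertionSort(int[] items) {
        for (int fromIndex = 1; fromIndex < items.length; ++fromIndex) {
            for (int toIndex = fromIndex; toIndex > 0 && items[toIndex - 1] > items[toIndex]; --toIndex) {
                int tmp = items[toIndex - 1];
                items[toIndex - 1] = items[toIndex];
                items[toIndex] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] items = {5, 3, 8, 1, 9};
        insertionSort(items);
        System.out.println(Arrays.toString(items)); // prints [1, 3, 5, 8, 9]
    }
}
```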

To make an informed statement about the superiority of selection sort over insertion sort, we must examine insertion sort's cost function. Suppose we try sorting this array:

The first iteration will touch all five cells and yield this ordering:

The second iteration will touch only four cells and yield this ordering:

And so on for the remaining iterations:

This stack of cells looks just like selection sort. That means we have this cost function:

\(~T(n) = n + (n - 1) + \cdots + 2 = \frac{n(n + 1)}{2} - 1~\)

This simplifies to \(~n^2~\). By this reasoning, insertion sort is a quadratic algorithm.

It would seem that insertion sort is no better than selection sort. However, what if we sort this array?

On the first iteration, we touch only elements 0 and 1:

On the remaining iterations, we touch only the cell at `fromIndex` and its immediate left-neighbor:

Under these circumstances, the cost function is:

\(~T(n) = 2(n - 1) = 2n - 2~\)

This simplifies to \(~n~\). By this reasoning, insertion sort is a linear algorithm.

So, which is it? Is insertion sort linear or quadratic? It is both. In the worst case, it runs in quadratic time. In the best case, it runs in linear time. To paint a complete picture of the algorithm, you must present its best, worst, and average cases. Computing average case performance is a lot of work, and we're not going to do it this semester.

If you want to mislead people into buying your algorithm, you share your best case but hide the fact that the best case is rare. If you want to mislead people into rejecting your opponent's algorithm, you share their worst case but hide the fact that the worst case is rare.

In this case, the best case of insertion sort is rare, so you are unlikely to experience that linear time performance. Selection sort, on the other hand, always touches the same number of cells. Its best, worst, and therefore average cases are all quadratic.
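
One way to see the gap between these two cases concretely is to count the swaps the inner loop performs. This instrumented copy of the sort is my own sketch, not from lecture:

```java
public class SwapCounter {
    // Same algorithm as insertionSort, but it returns the number of swaps made.
    public static int countSwaps(int[] items) {
        int swaps = 0;
        for (int fromIndex = 1; fromIndex < items.length; ++fromIndex) {
            for (int toIndex = fromIndex; toIndex > 0 && items[toIndex - 1] > items[toIndex]; --toIndex) {
                int tmp = items[toIndex - 1];
                items[toIndex - 1] = items[toIndex];
                items[toIndex] = tmp;
                ++swaps;
            }
        }
        return swaps;
    }

    public static void main(String[] args) {
        // Already sorted: best case, the inner loop never fires.
        System.out.println(countSwaps(new int[]{1, 2, 3, 4, 5})); // prints 0
        // Reverse sorted: worst case, 1 + 2 + 3 + 4 = 10 swaps.
        System.out.println(countSwaps(new int[]{5, 4, 3, 2, 1})); // prints 10
    }
}
```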

## Complexity Classes

Simplifying lands us on just a small set of different functions. Here are some of the most common ones:

- \(~1~\)
- \(~\log n~\)
- \(~\sqrt{n}~\)
- \(~n~\)
- \(~n\log n~\)
- \(~n^2~\)
- \(~n^3~\)
- \(~2^n~\)
- \(~n!~\)
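
To get a feel for how differently these functions grow, we can tabulate a few of them at small inputs. This little table-printer is my own illustration:

```java
public class Growth {
    public static void main(String[] args) {
        System.out.printf("%6s %10s %10s %22s%n", "n", "n log n", "n^2", "2^n");
        for (int n : new int[]{8, 16, 32, 64}) {
            double nLogN = n * (Math.log(n) / Math.log(2)); // log base 2
            System.out.printf("%6d %10.0f %10d %22.0f%n", n, nLogN, (long) n * n, Math.pow(2, n));
        }
    }
}
```

Even by \(~n = 64~\), the \(~2^n~\) column has left the others far behind.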

Each of these functions forms a set of all the cost functions that simplify to it. We express that a cost function belongs to one of these sets with Big Theta notation:

- \(~5n^2 \in \Theta(n^2)~\)
- \(~10n + 50 \in \Theta(n)~\)
- \(~2n^3 + \log n \in \Theta(n^3)~\)

We might say that an algorithm is Big Theta of \(~n^2~\), or that it's on the order of \(~n^2~\).

Each of these functions also forms a set containing all the cost functions that simplify to it or to something faster. This membership is expressed using Big O notation:

- \(~5n^2 \in O(n^2)~\)
- \(~\log n + n \in O(n)~\)
- \(~n \in O(n^2)~\)
- \(~n^2 + 3^n \in O(n!)~\)

These different sets are called *complexity classes*.

## Big O

Theoretical computer scientists have defined more formal criteria for deciding how to lump these cost functions together in complexity classes. Here's the textbook definition for saying that a function is Big O of another:

For cost function \(~T(n)~\), \(~T(n) \in O(f(n))~\) if and only if there exist positive constants \(~c~\) and \(~n_0~\) such that \(~T(n) \leq cf(n)~\) for all \(~n > n_0~\).

What this means is that function \(~T~\) can be made to stay on or below function \(~f~\), though to do so might require scaling \(~f~\) by a constant, and we might have to look at large input sizes. If you invent a new algorithm and want to say it's better than what's already out there, you have to prove its membership in a complexity class. You could do this by finding \(~c~\) and \(~n_0~\) that make the inequality true. Sometimes this is straightforward. For example, \(~50n \in O(n)~\) because \(~50n \leq cn~\), where \(~c = 50~\) and \(~n_0 = 1~\). But other times it's not. We're going to look at a different way to prove membership.

You can think of Big O as imposing a less-than-or-equal-to relationship on functions. Function \(~f~\) is an upper bound to all the functions in its set.

This reveals a weakness in Big O. We can say that \(~n \in O(n^2)~\) because \(~n \leq cn^2~\), for \(~c = 1~\) and \(~n > 1~\). But this isn't saying much. A very fast algorithm will be Big O of many functions.

## Exercise

Suppose you have developed an algorithm for routing \(~n~\) pizza deliveries in the most fuel efficient way. The algorithm has cost function \(~2^{n+1}~\). You are considering selling your algorithm and want to advertise it in the most favorable but truthful way. In which of the following sets do you advertise its membership?

- \(~O(2^n)~\)
- \(~O(2^{n+1})~\)
- \(~O(3^n)~\)

## Big Theta

To have meaningful measures of an algorithm's time efficiency, we want to sandwich its cost function both above and below. Sandwiching is provided by the Big Theta classification, which has this definition:

For cost function \(~T(n)~\), \(~T(n) \in \Theta(f(n))~\) if and only if there exist positive constants \(~c~\), \(~d~\), and \(~n_0~\) such that \(~cf(n) \leq T(n) \leq df(n)~\) for all \(~n > n_0~\).

This means that one scaling of \(~f~\) can be made to stay on or above function \(~T~\), and another scaling can be made to stay on or below. You can think of Big Theta as imposing an equality relationship on functions. Given this intuition, which of the following statements is true?

- \(~n^2 \in \Theta(n^2)~\)
- \(~2n^2 \in \Theta(n^2)~\)
- \(~n^2 \in \Theta(n^4)~\)
- \(~5n^2 + 2n \in \Theta(4n^3)~\)

## Proving Membership

Finding constants to prove that a cost function is \(~O~\) or \(~\Theta~\) is sometimes tedious work. On such occasions, you might prefer to calculate this ratio:

\(~c = \lim_{n \to \infty} \frac{T(n)}{f(n)}~\)

If the limit converges, then the value \(~c~\) tells us what set the cost function belongs to:

- If \(~c = 0~\), then \(~T(n) \in O(f(n))~\), but not \(~\Theta(f(n))~\).
- If \(~c~\) is a positive constant, then \(~T(n) \in \Theta(f(n))~\), and therefore also \(~O(f(n))~\).

We're not going to prove this. Sorry.

Let's apply this strategy in a few examples. Consider \(~T(n) = n^3 + 2n~\) and \(~f(n) = n^3~\). We solve the limit:

\(~c = \lim_{n \to \infty} \frac{n^3 + 2n}{n^3} = \lim_{n \to \infty} \left(1 + \frac{2}{n^2}\right) = 1~\)

Since \(~c = 1~\), we have \(~\Theta~\) membership.

Consider \(~T(n) = \frac{n}{2}~\) and \(~f(n) = n^2~\). We solve the limit:

\(~c = \lim_{n \to \infty} \frac{n/2}{n^2} = \lim_{n \to \infty} \frac{1}{2n} = 0~\)

Since \(~c = 0~\), we have \(~O~\) membership, but not the others.

Consider \(~T(n) = 2^n~\) and \(~f(n) = 3^n~\). We solve the limit:

\(~c = \lim_{n \to \infty} \frac{2^n}{3^n} = \lim_{n \to \infty} \left(\frac{2}{3}\right)^n = 0~\)

Since \(~c = 0~\), we have \(~O~\) membership, but not the others.
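
If you ever want a quick sanity check on a limit like these, you can evaluate the ratio at a large \(~n~\) and see where it's heading. This is my own sketch; a big finite \(~n~\) only approximates the limit, it doesn't prove it:

```java
public class LimitCheck {
    // Evaluate T(n) / f(n) at a large n; the value should be near the limit c.
    public static double ratioCubic(double n) { return (n * n * n + 2 * n) / (n * n * n); }
    public static double ratioHalf(double n) { return (n / 2) / (n * n); }
    public static double ratioExp(double n) { return Math.pow(2.0 / 3.0, n); } // 2^n / 3^n

    public static void main(String[] args) {
        System.out.println(ratioCubic(1e6)); // very close to 1
        System.out.println(ratioHalf(1e6));  // very close to 0
        System.out.println(ratioExp(100));   // very close to 0
    }
}
```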

We'll work through \(~T(n) = 6n + \sqrt{n}~\) and \(~f(n) = 2n~\) as a peer instruction exercise.

## TODO

You've got a few things to do before we meet again:

See you next time!

Sincerely,

P.S. It's time for a haiku!

`while`

Maybe even two