# CS 240: Lecture 4 - Selection Sort

Dear students:

The binary search is a great algorithm. It brings the worst case performance of finding something down to a logarithmic number of operations. However, it expects several preconditions: the collection must support fast access to the middle element of the current search window, and the collection must be sorted. A sorted array meets both conditions. But sorting doesn't come free. Today we examine one possible algorithm for putting an array in order.

## Selection Sort

We're going to visit several sorting algorithms this semester, and there are dozens of others that we won't visit. Speaking from experience, I can say your knowledge of these will degenerate to mush if you don't develop their identities in your brain. Here's the identity of selection sort. It sorts the array like so:

```
find and place the first smallest item
find and place the second smallest item
find and place the third smallest item
...
```

On each iteration, we *select* out the next smallest item. Thus its name.

This code finds the first smallest item:

```
int smallestIndex = 0; // guess smallest is at index 0
for (int i = 1; i < items.length; ++i) {
  if (items[i] < items[smallestIndex]) {
    smallestIndex = i;
  }
}
```

And this code swaps that smallest item into place:

```
int tmp = items[0];
items[0] = items[smallestIndex];
items[smallestIndex] = tmp;
```

After this first item is placed, the search space shrinks. We never again visit index 0. The next step is to fill index 1 with the second smallest item. That code is very similar to the code above. And so are the steps after it. We therefore use a loop to make this code execute the appropriate number of times:

```
public static void selectionSort(int[] items) {
  for (int fillIndex = 0; fillIndex < items.length - 1; ++fillIndex) {
    int smallestIndex = fillIndex; // guess smallest is at fillIndex
    for (int i = fillIndex + 1; i < items.length; ++i) {
      if (items[i] < items[smallestIndex]) {
        smallestIndex = i;
      }
    }
    int tmp = items[fillIndex];
    items[fillIndex] = items[smallestIndex];
    items[smallestIndex] = tmp;
  }
}
```

The hardcoded 0 has been replaced by the outer loop iterator `fillIndex`.
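Assembled into a small self-contained class (the class name `Sorts` and the demo values are my own), the method can be sanity-checked like this:

```java
import java.util.Arrays;

public class Sorts {
    public static void selectionSort(int[] items) {
        for (int fillIndex = 0; fillIndex < items.length - 1; ++fillIndex) {
            // Find the smallest item in the unsorted suffix.
            int smallestIndex = fillIndex;
            for (int i = fillIndex + 1; i < items.length; ++i) {
                if (items[i] < items[smallestIndex]) {
                    smallestIndex = i;
                }
            }
            // Swap it into place at fillIndex.
            int tmp = items[fillIndex];
            items[fillIndex] = items[smallestIndex];
            items[smallestIndex] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] items = {7, 3, 9, 1, 3};
        selectionSort(items);
        System.out.println(Arrays.toString(items)); // [1, 3, 3, 7, 9]
    }
}
```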

## Cost Function

If I asked you how fast you can run, you probably wouldn't give me a number. Rather, you'd ask me how far. To get a complete picture of your speed, I might ask how fast you run a mile, a 5K, a 10K, and a marathon. Similarly, when I ask an algorithm how many steps it takes to complete, I don't want a single number. Instead, I want a function. I will tell that function how big the input is, and the function will report how many steps the algorithm takes. We need the function in order to make a sound judgement about the algorithm.

We want to determine the cost function for selection sort. We see that it has two loops. Last time we saw an algorithm with two loops, we said the cost function was \(T(n) = cn^2 + d\). That's because we were considering all pairs of numbers in a list. The pairs can be arranged in a square, so their number is a square. In the case of selection sort, we aren't doing quite as much work.

On the first iteration, we visit all \(n\) cells of the array. On the second, \(n - 1\) cells. On the third, \(n - 2\) cells. We visualize the pattern as a stack of shortening rows of cells. Here's a stack for an array of size 4:

We want to know how many cells there are in a stack for an array of size \(n\). The calculation is a little easier if we double the stack and form a rectangle:

One side of the rectangle has length \(n\) and the other has length \(n + 1\). The total number of cells in the triangle is half the number in the rectangle. This is also the cost function: \(T(n) = \frac{n(n + 1)}{2}\).

Summing up consecutive integers like this happens a lot when we are analyzing algorithms. You should commit this pattern and cost function to memory. Sometimes you will see the sum written as \(\sum^n_{i = 0} i\). These sums are aptly called the triangle numbers.
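We can convince ourselves numerically that the rectangle argument is right with a quick sketch (the class and method names here are my own) that compares the loop sum to \(\frac{n(n + 1)}{2}\):

```java
public class TriangleNumbers {
    // Sum 1 + 2 + ... + n directly, mirroring the shrinking rows of cells.
    static long sumByLoop(int n) {
        long total = 0;
        for (int i = 1; i <= n; ++i) {
            total += i;
        }
        return total;
    }

    // Closed form from the doubled-stack rectangle: n(n + 1) / 2.
    static long closedForm(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 1000; ++n) {
            if (sumByLoop(n) != closedForm(n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("closed form matches the loop for n up to 1000");
    }
}
```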

So, selection sort is a better algorithm than the pairwise difference algorithm we looked at last week, right? Not by the standards of algorithm analysis.

## Simplifying

When we examine an algorithm's cost function, we ignore all terms but the one that grows fastest. For example:

- \(n^2 + n\) simplifies to \(n^2\)
- \(n! + 2^n\) simplifies to \(n!\)
- \(n\log n + n\) simplifies to \(n\log n\)
- \(\sqrt{n} + \sqrt[3]{n}\) simplifies to \(\sqrt{n}\)

Additionally, we ignore all constant terms and coefficients. For example:

- \(\frac{n}{2}\) simplifies to \(n\)
- \(3n^2 - 2n\) simplifies to \(n^2\)
- \(5\log 3n\) simplifies to \(\log n\)

We do not automatically throw out constants involved in exponentiation.

This discarding of information needs some justification. How do we know that throwing out these lower-order terms and coefficients won't throw off our ranking of algorithms?

Let's start with a visceral argument. Suppose we have the simplified functions \(n^2\) and \(n\). When we plot them, we clearly see that an algorithm with cost function \(n\) is faster:

However, what if we hadn't simplified? What if the quadratic function was actually \(\frac{n^2}{100}\)? With that scale factor, the quadratic function dips below the linear function and is therefore a better algorithm. But if you look at larger values of \(n\), you see that the win is only temporary. In the long run, the linear algorithm wins. Whatever constant coefficient you pick, at some point the linear algorithm will drop below the quadratic. Constant coefficients do not change which function ultimately wins in this case.
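We can find that crossover point concretely. With the coefficient \(\frac{1}{100}\), the quadratic stays at or below the linear function only up to \(n = 100\). A small sketch (names are mine) that searches for the first \(n\) where the quadratic overtakes:

```java
public class Crossover {
    // First n at which n^2/100 exceeds n.
    static int firstOvertake() {
        int n = 1;
        while (n * n / 100.0 <= n) {
            ++n;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("quadratic overtakes linear at n = " + firstOvertake());
    }
}
```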

If the functions are of the same order, surely the coefficients are important. For example, when choosing between \(50n\) and \(5n\), I will always pick the latter. But if I simplify them both down to \(n\), then I won't know which is better. This is true. The problem is that getting accurate coefficients is really hard. They will be impacted by the programming language, compiler, and hardware.

We must also justify discarding the lower-order terms. Suppose you are comparing the cost functions \(n^2 + 3n\) and \(n^2\). Both of these simplify to \(n^2\), which means that we're saying these two algorithms are equivalent. But that's not what the plot shows:

Clearly the \(n^2\) algorithm is a better one. But when you zoom out, the gap between them becomes insignificant. The gap of \(3n\) will never entirely disappear, but it will be dwarfed by the \(n^2\) term.
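We can watch the \(3n\) term being dwarfed by computing the ratio \(\frac{n^2 + 3n}{n^2} = 1 + \frac{3}{n}\) for growing \(n\). A small sketch (names are mine):

```java
public class LowerOrderTerms {
    // Ratio of (n^2 + 3n) to n^2; approaches 1 as n grows.
    static double ratio(long n) {
        return ((double) n * n + 3.0 * n) / ((double) n * n);
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 100, 1000, 1000000}) {
            System.out.printf("n = %d: ratio = %.6f%n", n, ratio(n));
        }
    }
}
```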

## Complexity Classes

Simplifying lands us on just a small set of different functions. Here are some of the most common ones:

- \(1\)
- \(\log n\)
- \(\sqrt{n}\)
- \(n\)
- \(n\log n\)
- \(n^2\)
- \(n^3\)
- \(2^n\)
- \(n!\)

Each of these functions forms a set of all the cost functions that simplify to it. We express that a cost function belongs to one of these sets with Big Theta notation:

- \(5n^2 \in \Theta(n^2)\)
- \(10n + 50 \in \Theta(n)\)
- \(2n^3 + \log n \in \Theta(n^3)\)

We might say that an algorithm is Big Theta of \(n^2\), or that it's on the order of \(n^2\).

Each of these functions also forms a set containing all the cost functions that simplify to it or to something faster. This membership is expressed using Big O notation:

- \(5n^2 \in O(n^2)\)
- \(\log n + n \in O(n)\)
- \(n \in O(n^2)\)
- \(n^2 + 3^n \in O(n!)\)

These different sets are called *complexity classes*.

## Big O

Theoretical computer scientists have defined more formal criteria for deciding how to lump these cost functions together in complexity classes. Here's the textbook definition for saying that a function is Big O of another:

For cost function \(T(n)\), \(T(n) \in O(f(n))\) if and only if there exist positive constants \(c\) and \(n_0\) such that \(T(n) \leq cf(n)\) for all \(n > n_0\).

What this means is that function `T` can be made to be below function `f`, though to do so might require scaling `f` by a constant, and we might have to look at large input sizes.

Based on this we can formally show how cost functions fit into certain complexity classes. For example, \(50n \in O(n)\) because \(50n \leq cn\), where \(c = 50\) and \(n_0 = 1\).

This example shows that throwing away the coefficients is an acceptable practice. We can also show that throwing away lower-order terms fits the definition of Big O, but the reasoning requires a little creativity. We may feel, for example, that \(n^2 + 40n \in O(n^2)\). We need to show this:

\(n^2 + 40n \leq cn^2\)

That seems hard. However, we do know this:

\(40n \leq 40n^2\) for all \(n \geq 1\)

If we combine terms, then we end up finding a value for \(c\) that makes the first inequality true:

\(n^2 + 40n \leq n^2 + 40n^2 = 41n^2\)

This works for \(c = 41\) and \(n_0 = 1\). That means we have \(n^2 + 40n \in O(n^2)\).
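As a numeric spot check of the bound used above (that \(n^2 + 40n\) never exceeds \(41n^2\) for \(n \geq 1\)), here's a small sketch with names of my own choosing:

```java
public class BigOCheck {
    // Does n^2 + 40n <= 41n^2 hold at this n?
    static boolean holds(long n) {
        return n * n + 40 * n <= 41L * n * n;
    }

    public static void main(String[] args) {
        // Spot-check the inequality over a wide range of inputs.
        for (long n = 1; n <= 1000000; ++n) {
            if (!holds(n)) {
                throw new AssertionError("inequality fails at n = " + n);
            }
        }
        System.out.println("n^2 + 40n <= 41n^2 holds for all n checked");
    }
}
```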

You can think of Big O as imposing a less-than-or-equal-to relationship on functions. Function \(f\) is an upper bound to all the functions in its set.

This reveals a weakness in Big O. We can say that \(n \in O(n^2)\) because \(n \leq cn^2\), for \(c = 1\) and \(n_0 = 1\). But this isn't saying much. A very fast algorithm will be Big O of many functions.

## Exercise

Suppose you have developed an algorithm for routing \(n\) pizza deliveries in the most fuel efficient way. The algorithm has cost function \(2^{n+1}\). You are considering selling your algorithm and want to advertise it in the most favorable but truthful way. In which of the following sets do you advertise its membership?

- \(O(2^n)\)
- \(O(2^{n+1})\)
- \(O(3^n)\)

## Big Theta

To have meaningful measures of an algorithm's time efficiency, we want to sandwich its cost function both above and below. Sandwiching is provided by the Big Theta classification, which has this definition:

For cost function \(T(n)\), \(T(n) \in \Theta(f(n))\) if and only if there exist positive constants \(c\), \(d\), and \(n_0\) such that \(cf(n) \leq T(n) \leq df(n)\) for all \(n > n_0\).

This means that one scaling of \(f\) can be made to stay on or above function \(T\), and another scaling can be made to stay on or below. You can think of Big Theta as imposing an equality relationship on functions. Given this intuition, which of the following statements is true?

- \(n^2 \in \Theta(n^2)\)
- \(2n^2 \in \Theta(n^2)\)
- \(n^2 \in \Theta(n^4)\)
- \(5n^2 + 2n \in \Theta(4n^3)\)

## TODO

You've got a few things to do before we meet again:

See you next time!

Sincerely,

P.S. It's time for a haiku!