CS 240: Lecture 4 - Selection Sort
Dear students:
The binary search is a great algorithm. It brings the worst-case performance of finding something down to a logarithmic number of operations. However, it expects a couple of preconditions: the collection must support fast access to the middle element of the current search window, and the collection must be sorted. A sorted array meets both conditions. But sorting doesn't come free. Today we examine one possible algorithm for putting an array in order.
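To make those preconditions concrete, here's one possible binary search in Java. It's just a sketch, not code we'll build on today, but it shows where both preconditions bite:

// A minimal binary search sketch. It assumes items is sorted in
// ascending order; without that, halving the window is unsound.
public static int binarySearch(int[] items, int target) {
    int lo = 0;
    int hi = items.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // fast access to the middle element
        if (items[mid] == target) {
            return mid;
        } else if (items[mid] < target) {
            lo = mid + 1;              // discard the left half
        } else {
            hi = mid - 1;              // discard the right half
        }
    }
    return -1;                         // target is not present
}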
Selection Sort
We're going to visit several sorting algorithms this semester, and there are dozens of others that we won't. Speaking from experience, I can say your knowledge of these will degenerate to mush if you don't develop their identities in your brain. Here's the identity of selection sort. It sorts the array like so:
find and place the first smallest item
find and place the second smallest item
find and place the third smallest item
...
On each iteration, we select out the next smallest item. Thus its name.
This code finds the first smallest item:
int smallestIndex = 0; // guess smallest is at index 0
for (int i = 1; i < items.length; ++i) {
    if (items[i] < items[smallestIndex]) {
        smallestIndex = i;
    }
}
And this code swaps that smallest item into place:
int tmp = items[0];
items[0] = items[smallestIndex];
items[smallestIndex] = tmp;
After this first item is placed, the search space shrinks. We never again visit index 0. The next step is to fill index 1 with the second smallest item. That code is very similar to the code above. And so are the steps after it. We therefore use a loop to make this code execute the appropriate number of times:
public static void selectionSort(int[] items) {
    for (int fillIndex = 0; fillIndex < items.length - 1; ++fillIndex) {
        // Guess that the smallest remaining item is at fillIndex, then
        // scan the rest of the unsorted suffix for anything smaller.
        int smallestIndex = fillIndex;
        for (int i = fillIndex + 1; i < items.length; ++i) {
            if (items[i] < items[smallestIndex]) {
                smallestIndex = i;
            }
        }
        // Swap the smallest remaining item into place.
        int tmp = items[fillIndex];
        items[fillIndex] = items[smallestIndex];
        items[smallestIndex] = tmp;
    }
}
The hardcoded 0 has been replaced by the outer loop iterator fillIndex, and the inner loop now begins its search at fillIndex + 1, since everything before fillIndex has already been placed.
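To see the sort run, you might wrap the method in a class with a small test harness like this. The class name and sample values are arbitrary:

public class Sorter {
    public static void main(String[] args) {
        int[] items = {9, 2, 7, 4, 1};
        selectionSort(items);
        // Expected output: [1, 2, 4, 7, 9]
        System.out.println(java.util.Arrays.toString(items));
    }

    // Paste the selectionSort method from above into this class.
}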
Cost Function
If I asked you how fast you can run, you probably wouldn't give me a number. Rather, you'd ask me how far. To get a complete picture of your speed, I might ask how fast you run a mile, a 5K, a 10K, and a marathon. Similarly, when I ask an algorithm how many steps it takes to complete, I don't want a single number. Instead, I want a function. I will tell that function how big the input is, and the function will report how many steps the algorithm takes. We need the function in order to make a sound judgement about the algorithm.
We want to determine the cost function for selection sort. We see that it has two loops. Last time we saw an algorithm with two loops, we said the cost function was $T(n) = n^2$. Selection sort is different, though: its inner loop shrinks on every pass of the outer loop.

On the first iteration, we visit all $n$ cells of the array. On the second, we visit $n - 1$ of them. On the third, $n - 2$. The total number of visits is the sum $n + (n - 1) + (n - 2) + \cdots + 2 + 1$.

Picture these visits as rows of cells stacked into a staircase: a row of $n$, then a row of $n - 1$, and so on down to a row of 1. We want to know how many cells there are in a stack for an array of size $n$. Pair the stack with an upside-down copy of itself, and the two staircases interlock to form a rectangle. One side of the rectangle has length $n$, and the other has length $n + 1$, so the rectangle holds $n(n + 1)$ cells. Our stack is half of the rectangle, which gives the cost function $T(n) = \frac{n(n+1)}{2}$.

Summing up consecutive integers like this happens a lot when we are analyzing algorithms. You should commit this pattern and cost function to memory. Sometimes you will see the sum written as $\sum_{i=1}^{n} i$, which equals $\frac{n(n+1)}{2}$.
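If you doubt the rectangle argument, a few lines of Java can check the formula against a brute-force sum. This quick checker isn't something we'll use again; it just confirms the algebra:

public class GaussCheck {
    public static void main(String[] args) {
        for (int n = 1; n <= 10000; ++n) {
            long bruteForce = 0;
            for (int i = 1; i <= n; ++i) {
                bruteForce += i;  // sum 1 + 2 + ... + n the slow way
            }
            long formula = (long) n * (n + 1) / 2;  // the rectangle shortcut
            if (bruteForce != formula) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("formula matches for all n up to 10000");
    }
}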
So, selection sort is a better algorithm than the pairwise difference algorithm we looked at last week, right? Not by the standards of algorithm analysis.
Simplifying
When we examine an algorithm's cost function, we ignore all terms but the one that grows fastest. For example:
- $n^2 + n$ simplifies to $n^2$
- $n^3 + n^2 + n$ simplifies to $n^3$
- $2^n + n^2$ simplifies to $2^n$
- $n \log_2 n + n$ simplifies to $n \log_2 n$
Additionally, we ignore all constant terms and coefficients. For example:
- $3n^2$ simplifies to $n^2$
- $\frac{n(n+1)}{2} = \frac{1}{2}n^2 + \frac{1}{2}n$ simplifies to $n^2$
- $5n + 7$ simplifies to $n$
We do not automatically throw out constants involved in exponentiation. The functions $2^n$ and $3^n$, for example, grow at genuinely different rates, and neither simplifies to the other.
This discarding of information needs some justification. How do we know that throwing out these lower-order terms and coefficients won't throw off our ranking of algorithms?
Let's start with a visceral argument. Suppose we have the simplified functions $n$ and $n^2$. For any sizable input, the algorithm with linear cost beats the algorithm with quadratic cost.

However, what if we hadn't simplified? What if the quadratic function was actually $\frac{n^2}{100}$? For small inputs, $\frac{n^2}{100}$ sits below $n$. But as soon as $n$ passes 100, the linear function is cheaper, and it stays cheaper forever after. The coefficient moved the crossover point; it did not change the eventual winner.
If the functions are of the same order, surely the coefficients are important. For example, when choosing between $2n$ and $10n$, the smaller coefficient is obviously preferable. That's true, but both functions scale the same way as the input grows, and the exact coefficients depend on hardware and implementation details that our step-counting can't reliably capture. Algorithm analysis concerns itself with the scaling.
We must also justify discarding the lower-order terms. Suppose you are comparing the cost functions $n^2 + n$ and $n^2$.

Clearly the $n^2$ term dominates. At $n = 10$, the lower-order $n$ term contributes about 9% of the total. At $n = 1000$, it contributes less than 0.1%. As $n$ grows, the lower-order term's share of the cost dwindles toward nothing, so it has no bearing on how the function scales.
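To watch the lower-order term fade, here's a quick throwaway program that tabulates the share of $n^2 + n$ contributed by the $n$ term:

public class TermShare {
    public static void main(String[] args) {
        for (long n = 10; n <= 1_000_000; n *= 10) {
            double total = (double) n * n + n;  // n^2 + n
            double share = n / total * 100;     // the n term's share
            System.out.printf("n=%-8d share of n term: %.4f%%%n", n, share);
        }
    }
}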
Complexity Classes
Simplifying lands us on just a small set of different functions. Here are some of the most common ones: $1$, $\log_2 n$, $n$, $n \log_2 n$, $n^2$, $n^3$, and $2^n$.

Each of these functions forms a set of all the cost functions that simplify to it. We express that a cost function belongs to one of these sets with Big Theta notation. We might say that an algorithm is Big Theta of $n^2$, which we write as $\Theta(n^2)$.

Each of these functions also forms a set containing all the cost functions that simplify to it or to something faster. This membership is expressed using Big O notation. A cost function that simplifies to $n$, for example, belongs to $O(n)$, and also to $O(n^2)$ and beyond.

These different sets are called complexity classes.
Big O
Theoretical computer scientists have defined more formal criteria for deciding how to lump these cost functions together in complexity classes. Here's the textbook definition for saying that a function is Big O of another:
For cost function $T(n)$, $T(n) \in O(f(n))$ if and only if there exist positive constants $c$ and $n_0$ such that $T(n) \le c \cdot f(n)$ for all $n > n_0$.
What this means is that function $T$ can be made to be below function $f$, though to do so might require scaling $f$ by a constant, and we might have to look at large input sizes.
Based on this we can formally show how cost functions fit into certain complexity classes. For example, $100n \in O(n)$: choose $c = 100$ and $n_0 = 1$, and we have $100n \le 100 \cdot n$ for all $n > n_0$.
This example shows that throwing away the coefficients is an acceptable practice. We can also show that throwing away lower-order terms fits the definition of Big O, but the reasoning requires a little creativity. We may feel, for example, that $n^2 + n \in O(n^2)$, but the definition demands a constant $c$ such that $n^2 + n \le c \cdot n^2$.

That seems hard. However, we do know this: $n \le n^2$ for all $n \ge 1$.

If we combine terms, then we end up finding a value for $c$:

\[
n^2 + n \le n^2 + n^2 = 2n^2
\]

So $c = 2$ and $n_0 = 1$ satisfy the definition, and $n^2 + n \in O(n^2)$. This works for lower-order terms in general: each is eventually bounded by the highest-order term, so the whole function is bounded by a constant multiple of that term.
You can think of Big O as imposing a less-than-or-equal-to relationship on functions. Function $T$ grows no faster than function $f$.

This reveals a weakness in Big O. We can say that a linear-time algorithm is $O(n)$, but we can just as truthfully say it is $O(n^2)$ or $O(2^n)$, since those functions also sit above it. Big O promises only an upper bound, and the bound need not be tight.
Exercise
Suppose you have developed an algorithm for routing
Big Theta
To have meaningful measures of an algorithm's time efficiency, we want to sandwich its cost function both above and below. Sandwiching is provided by the Big Theta classification, which has this definition:
For cost function $T(n)$, $T(n) \in \Theta(f(n))$ if and only if there exist positive constants $c_1$, $c_2$, and $n_0$ such that $c_1 \cdot f(n) \le T(n) \le c_2 \cdot f(n)$ for all $n > n_0$.
This means that one scaling of $f$ stays at or below $T$, while another scaling of $f$ stays at or above it. The two scalings sandwich $T$, so $T$ grows at the same rate as $f$.
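As a worked example, we can sandwich selection sort's cost function from earlier:

\[
\frac{1}{2} n^2 \le \frac{n(n+1)}{2} \le n^2 \quad \text{for all } n \ge 1,
\]

so choosing $c_1 = \frac{1}{2}$, $c_2 = 1$, and $n_0 = 1$ shows that $\frac{n(n+1)}{2} \in \Theta(n^2)$.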
TODO
You've got a few things to do before we meet again:
See you next time!
Sincerely,
P.S. It's time for a haiku!