CS 240: Lecture 7 - Quick Sort

Dear students:

We've been on a tour of sorting algorithms, of which there are many. But our purpose here is not to enumerate all possible algorithms. Our bigger goal is for you to gain the skills needed to judge algorithms and choose one that suits the conditions of your task. Our tour ends today with quick sort, an algorithm that beats out the others. Maybe.

Quick Sort

Merge sort has some nice worst case running time, but have you checked out quick sort? It too takes a recursive, divide-and-conquer approach to sorting. Here's the overall algorithm:

public static void quickSort(int[] items) {
  quickSort(items, 0, items.length - 1);
}

private static void quickSort(int[] items,
                              int left,
                              int right) {
  if (left < right) {
    int lastSmallIndex = partition(items, left, right);
    quickSort(items, left, lastSmallIndex);
    quickSort(items, lastSmallIndex + 1, right);
  }
}

What's different about quick sort compared to merge sort? Merge sort required scratch space that was at least as big as the original array. We've been focusing on time complexity, but space complexity can also be a concern. Quick sort doesn't require that extra space. Quick sort is an in-place sorting algorithm, which means it doesn't need an extra chunk of memory to do its work. Merge sort is not in-place unless you want to sacrifice its running time or readability. Is selection sort? Is insertion sort?

The recursive pattern is also a bit different. With merge sort, the structure is sort-left, sort-right, and merge. With quick sort, the structure is partition, sort-left, and sort-right. As with merge sort, the helper method is where the magic of sorting really takes place. We will walk through this partition method:

private static int partition(int[] items,
                             int left, 
                             int right) {
  int pivotIndex = (left + right) / 2;
  int pivot = items[pivotIndex];

  // Advance from both ends until window collapses.
  while (left < right) {
    // Skip rightward past < pivots.
    while (items[left] < pivot) {
      left++;
    }

    // Skip leftward past > pivots.
    while (items[right] > pivot) {
      right--;
    }

    // Swap out-of-place pair.
    if (left < right) {
      int tmp = items[left];
      items[left] = items[right];
      items[right] = tmp;
      left++;
      right--;
    }
  }

  return right;
}

Analysis

When we looked at merge sort, we expressed its cost function as a recurrence. We never got that recurrence to a form that we could compare with other cost functions. Let's do this for quick sort. If we count comparing items to the pivot as the basic operation, its cost function is similar to that of merge sort:

$$\begin{align} T(1) &= 0 \\ T(n) &= n + 2 \times T(\frac{n}{2}) \\ \end{align}$$

In order to figure out what complexity class this cost function belongs to, we need to massage out the recursive term. This is our recipe for turning a recurrence into a closed-form solution that has no recursive term:

Expand the general case of the cost function until you see a pattern emerge. It should be clear after three or so rounds.
Express the pattern of the expanded cost function in terms of $i$ expansions.
Find for what value of $i$ the expansion reaches the base case by solving the recursive term parameter for the base case size.
Plug that value of $i$ back into the pattern and eliminate the recursive term.

In order to derive a closed-form solution for quick sort's recurrence, we perform these three expansions of the cost function:

$$\begin{align} T(n) &= n + 2 \times T(\tfrac{n}{2}) \\ &= n + 2 \times (\tfrac{n}{2} + 2 \times T(\tfrac{n}{4})) \\ &= 2n + 4 \times T(\tfrac{n}{4})) \\ &= 2n + 4 \times (\tfrac{n}{4} + 2 \times T(\tfrac{n}{8}))) \\ &= 3n + 8 \times T(\tfrac{n}{8}) \\ \end{align}$$

After $i$ levels of this expansion, we have this general form:

$$T(n) = in + 2^i \times T(\tfrac{n}{2^i})$$

We want to get rid of the self-reference, which we can do if we get the parameter of the self-referential term down to the base case, where it's no longer self-referential. The base case happens when the parameter is 1, so we set up this equivalence and solve for $i$:

$$\begin{align} 1 &= \frac{n}{2^i} \\ 2^i &= n \\ i &= \log_2 n \\ \end{align}$$

Then we plug this value of $i$ back in to the general form of the expansion to get the closed form of the cost function:

$$\begin{align} T(n) &= n \log_2 n + 2^{\log_2 n} \times T(\tfrac{n}{2^{log_2 n}}) \\ &= n \log_2 n + n \times T(\tfrac{n}{n}) \\ &= n \log_2 n + n \times T(1) \\ &= n \log_2 n + n \times 0 \\ &= n \log_2 n \\ \end{align}$$

This is pretty good compared to the $n^2$ complexity of selection sort and insertion sort. There's just one problem: the partition algorithm might pick a really bad pivot. A really bad pivot would be smaller than all the over values or bigger. That pivot wouldn't split the array in two. It would only shrink it by one element. In the worst case, when bad pivots are selected, quick sort in $\Theta(n^2)$. But it's rare to encounter this worse case, and we generally report its average case of $\Theta(n \log_2 n)$.

TODO

You've got a few things to do before we meet again:

Keep working on the first programming assignment. Work steadily or regret it. Commit and push after every work session.

See you next time!

Sincerely,

P.S. It's time for a haiku!

Quicksort is sicksort It's slicksort at its shticksort It's what I'll picksort