teaching machines

CS 240: Lecture 19 – Mergesort

October 6, 2021 by . Filed under cs3, fall-2021, lectures.

Dear students:

Insertion sort and selection sort can be written in just a few lines of code, but the energy consumption of each of those lines is quite high. Both are $\Theta(n^2)$. Suppose you have a processor that runs at 3.5 gigahertz. That means its clock ticks many times per second. Let’s generously assume that one operation of the sorting algorithm can execute on each of those clock ticks. Then we have this sorting speed:

$$\frac{3.5 \times 2^{30} \text{ operations}}{1 \text{ second}}$$

Since there are $n^2$ operations for these sorts, we have this total running time:

$$\begin{align}\text{time} &= \frac{n^2 \text{ operations}}{1 \text{ sort}} \times \frac{1 \text{ second}}{3.5 \times 2^{30} \text{ operations}} \\ &= \frac{n^2}{3.5 \times 2^{30}} \text{ seconds per sort}\end{align}$$

When $n = 1000000$, we can expect a selection or insertion sort to take ~266 seconds. That’s not very fast. We need something better. The better algorithm we look at today is mergesort.

Mergesort

Mergesort sorts an array by splitting it in two and sorting each half independently. The recursive calls keep splitting the array. Eventually we reach a subarray that either contains one element or none. The elements in that very small subarray are sorted. Once the left and right subarrays are both sorted, they are merged together into one big sorted array.

The splitting operation might look something like this:

private static void mergesort(int[] items,
                              int[] tmp,
                              int left,
                              int right) {
  if (left > right) {
    return;
  }

  int mid = (left + right) / 2;
  mergesort(items, tmp, left, mid);
  mergesort(items, tmp, mid + 1, right);

  merge(items, tmp, left, mid, right);
}

The tmp variable provides scratch space that eases the merging process. The int parameters delimit the window of the array that is being sorted. We shouldn’t expect the caller to provide anything but the array to be sorted, so we make this helper method private. The public method primes the parameters to their initial values:

public static void mergesort(int[] items) {
  mergesort(items, new int[items.length], 0, items.length - 1);
}

When I look at this algorithm, I find it hard to believe that it actually does anything. Most of its lines are devoted to the splitting. The real sorting magic is done by the merge method.

Suppose merge costs $f(n)$ comparisons to combine the subarrays. What recurrence relationship describes the overall cost of mergesort? We’ll work this as a peer instruction exercise.

Merge

The merge method receives an array that has two sorted subarrays in it. Its job is to reorganize those two subarrays into one sorted subarray. But it has to do this in an efficient way. It does this by considering only the head elements of the two subarrays at a time. It picks off the smaller of the two as the next element in the merged array.

When a head element is pulled from its subarray, the head index is nudged rightward. Eventually one of the subarrays will be consumed. At that point, we stop comparing the head elements and just draw from the rest of the unconsumed list.

We might implement merge in this way:

private static void merge(int[] items, int[] tmp, int left, int mid, int right) {

  // Copy items to temporary array since we're about to
  // chaotically reorganize the original.
  for (int i = left; i <= right; ++i) {
    tmp[i] = items[i];
  }

  int i = left;
  int leftIndex = left;
  int rightIndex = mid + 1;

  // Compare heads until one subarray is empty.
  for (; leftIndex <= mid && rightIndex <= right; ++i) {
    if (tmp[leftIndex] <= tmp[rightIndex]) {
      items[i] = tmp[leftIndex];
      ++leftIndex;
    } else {
      items[i] = tmp[rightIndex];
      ++rightIndex;
    }
  }

  // Copy over remainder of left subarray.
  for (; leftIndex <= mid; ++leftIndex, ++i) {
    items[i] = tmp[leftIndex];
  }

  // Copy over remainder of right subarray.
  for (; rightIndex <= right; ++rightIndex, ++i) {
    items[i] = tmp[rightIndex];
  }
}

Only one of the bottom two for loops will have any iterations. There are other ways to write this method.

In the worst case, how many head comparisons will we need to perform to merge our two subarrays? We’ll work through this as a peer instruction exercise.

Fun Fact

Before we analyze mergesort, we want to encounter a handy identity for summing up powers of two. What’s the sum of 1? What’s the sum of 1 and 2? 1, 2, and 4? This table shows the sums up to the seventh power:

$i$ $2^i$ $\sum_{j = 0}^i 2^j$
$0$ $1$ $1$
$1$ $2$ $3$
$2$ $4$ $7$
$3$ $8$ $15$
$4$ $16$ $31$
$5$ $32$ $63$
$6$ $64$ $127$
$7$ $128$ $255$
$\ldots$ $\ldots$ $\ldots$
$n$ $2^n$ $2^{n+1}-1$

The sum of the sequential powers of two is just one less than the next power of two. This is an identity that will help simplify our recurrences.

Analysis

To prove that mergesort is a faster algorithm, we construct a recurrence relation. Let’s count the number of comparisons between head elements as a representation of the algorithm’s cost. When there is only a single element, there are no comparisons. In the general case, we’ll have to perform comparisons on all but the last head element. Additionally, there are two recursive calls on subarrays of half the size. We therefore have these cases:

$$\begin{align}T(1) &= 0 \\T(n) &= n – 1 + 2 \cdot T(\tfrac{n}{2})\end{align}$$

In order to derive a closed-form solution, we do some expansion:

$$\begin{align}T(n) &= n – 1 + 2 \cdot T(\tfrac{n}{2}) \\ &= n – 1 + 2 \cdot (\tfrac{n}{2} – 1 + 2 \cdot T(\tfrac{n}{4})) \\ &= n – 1 + n – 2 + 4 \cdot T(\tfrac{n}{4})) \\ &= n – 1 + n – 2 + 4 \cdot (\tfrac{n}{4} – 1 + 2 \cdot T(\tfrac{n}{4}))) \\ &= n – 1 + n – 2 + n – 4 + 8 \cdot T(\tfrac{n}{8}) \\ &= (n + n + n) – (1 + 2 + 4) + 8 \cdot T(\tfrac{n}{8}) \\\end{align}$$

After $i$ expansions, we have this expression:

$$T(n) = in – \sum_{j = 0}^{i – 1} 2^j + 2^i \cdot T(\tfrac{n}{2^i})$$

To get rid of the self-reference, we solve for the base case:

$$\begin{align}1 &= \frac{n}{2^i} \\2^i &= n \\\log_2 2^i &= \log_2 n \\i \cdot \log_2 2 &= \log_2 n \\i &= \log_2 n \\\end{align}$$

We’ve seen this result before. Maybe we should accept it as an axiom from here on out. We substitute this value wherever we see $i$:

$$\begin{align}T(n) &= n \log_2 n – \sum_{j = 0}^{\log_2 n – 1} 2^j + 2^n \cdot T(\tfrac{n}{2^{log_2 n}}) \\ &= n \log_2 n – (2^{\log_2 n} – 1) + 2^n \cdot T(\tfrac{n}{n}) \\ &= n \log_2 n – (n – 1) + 2^n \cdot T(1) \\ &= n \log_2 n – n + 1 + 2^n \cdot 0 \\ &= n \log_2 n – n + 1 \\\end{align}$$

TODO

You have some work to do before the next class:

See you next time.

Sincerely,

P.S. It’s time for a haiku!

Got a big problem?
Try splitting it into two
It might not survive