CS 240: Lecture 2 - The Cost of Computing

Dear students:

Last week we looked at two features that make a data collection reusable and easy to consume: generics and iteration. Both of these features are a visible part of a collection's interface. This week we peek into the implementation details of our collections. Normally we think of implementation as invisible, behind-the-scenes details that only the author of a piece of code needs to know about. But maybe we'll find that not all implementation details can be ignored.

LinkedList and ArrayList

Consider this code, which adds half a million numbers to a list and then removes them all:

LinkedList<Integer> numbers = new LinkedList<>();

for (int i = 0; i < 500000; ++i) {
  numbers.add(0, i);
}

while (!numbers.isEmpty()) {
  numbers.remove(0);
}

The code uses LinkedList, with which you may not be familiar. In fact, suppose you found this code on the first day of a new job. You decide to replace LinkedList with ArrayList, because you are more familiar with it. How do you think the ArrayList version will compare to LinkedList?
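
Since both classes implement Java's List interface, the swap is a one-line change to the declaration:

ArrayList<Integer> numbers = new ArrayList<>();

for (int i = 0; i < 500000; ++i) {
  numbers.add(0, i);
}

while (!numbers.isEmpty()) {
  numbers.remove(0);
}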

This semester I'm using a tool called Prompt to ask you questions during lecture. You'll need a computer or mobile device to answer. Visit Prompt and answer the question. For most of these questions, we'll do two rounds. The first round will be you thinking and answering on your own. Then you'll group up with your nearest neighbors and discuss your answers. You will try to persuade your neighbors why your answer is the right one. Then you'll answer the question a second time. This second time your participation will be graded.

To time our two versions, we'll use this Stopwatch utility, which measures the current thread's CPU usage:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class Stopwatch {
  private ThreadMXBean bean;
  private long startNanos;

  public Stopwatch() {
    bean = ManagementFactory.getThreadMXBean();
  }

  // Record the CPU time this thread has consumed so far.
  public void start() {
    startNanos = bean.getCurrentThreadCpuTime();
  }

  // Return the CPU seconds elapsed since start was called.
  public double stop() {
    long endNanos = bean.getCurrentThreadCpuTime();
    return (endNanos - startNanos) / 1e9;
  }
}

For informal testing, one could use System.currentTimeMillis instead of ThreadMXBean. I avoid that method because it measures wall-clock time, which includes time spent in other threads and processes.
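
Here's a sketch of how we might wire Stopwatch around the list code from above:

Stopwatch timer = new Stopwatch();
timer.start();

LinkedList<Integer> numbers = new LinkedList<>();
for (int i = 0; i < 500000; ++i) {
  numbers.add(0, i);
}
while (!numbers.isEmpty()) {
  numbers.remove(0);
}

// Report the CPU seconds the adds and removes consumed.
System.out.printf("%.3f seconds%n", timer.stop());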

On my MacBook Pro, the LinkedList version runs in ~0.11 seconds. The ArrayList version runs in ~23 seconds. This is a drastic difference. Users don't have much patience for a task that takes 23 seconds to complete. Why is there such a big difference?

Contiguous Memory

There are two basic approaches to organizing data in a computer's memory (RAM). One is to allocate a contiguous chunk of bytes. Your program reaches into this chunk using offsets from its start. You probably know this contiguous chunk as an array. Arrays are known for being fast because of a mechanism called caching. When you read one element of an array, it moves from RAM into the much faster cache memory located directly on the processor. As you read and write that element, the CPU accesses the cache instead of RAM.

What makes an array fast is that not only does the requested element get pulled into cache, but so do its neighboring bytes. If you are looping through the array, the CPU will often find the element it needs already sitting in cache. In this case, however, the array backing the ArrayList is not fast. That's because we're shifting elements around. Each number is inserted at the beginning of the list, which forces all the subsequent elements to shift to new memory cells. That's a basic problem with arrays: structural changes are expensive.

A related problem is that arrays are fixed in size. You generally can't just extend the end of the array past its current memory cell, because those bytes might be reserved for other data. Instead, you must know how much memory you need for the array at the time you make the array. Or you use an abstraction like ArrayList, which guesses a size for you, and if you fill it up, it automatically creates a bigger array and copies everything over.
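
To see why front insertions hurt, here's a minimal sketch of what an array-backed insertion must do. The helper insertAtFront is hypothetical, not ArrayList's actual code, and it assumes the array still has spare capacity:

// Hypothetical helper, not ArrayList's real implementation.
// Assumes items has spare capacity beyond size elements.
static void insertAtFront(int[] items, int size, int value) {
  // Slide every existing element one cell to the right:
  // O(n) work for a single insertion at the front.
  for (int i = size; i > 0; --i) {
    items[i] = items[i - 1];
  }
  items[0] = value;
}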

Linked Memory

The alternative to contiguous memory is non-contiguous memory. A non-contiguous list stores in RAM two things per element: the datum and the address of the next element. The next element need not be adjacent to its predecessor; it can live anywhere in memory. A collection that is pieced together like this is linked.

There are some advantages to a linked data structure. Imagine inserting a new element at the beginning of the list. You allocate room for the new element, and then point its next link to the node that used to be the head of the list. And that's it. No shuffling of elements is necessary. To remove an element, you just wire up its predecessor to its successor.
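
A minimal sketch of those two operations, using a hypothetical node class rather than LinkedList's actual internals:

// A hypothetical singly-linked node.
class Node {
  int value;
  Node next;
}

// Insert at the head: constant work, no shuffling.
static Node insertAtHead(Node head, int value) {
  Node node = new Node();
  node.value = value;
  node.next = head;
  return node;  // the new node is the new head
}

// Remove the head: the list now starts at the successor.
static Node removeHead(Node head) {
  return head.next;
}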

In this situation, a linked list is much faster than an array. This is the first example of a common theme in this course: the straightforward solution fails under the stress of large data.

That leads us to another question. Suppose you have a linked list named list. Which of the following calls will be the slowest, if any?

list.get(0)
list.get(size() / 2)
list.get(size() - 1)
list.get(size())

In a straightforward implementation of a linked list, the last valid element will be the slowest to access because it requires a traversal of the links from the very beginning. The call list.get(size()) is out of bounds, so it fails immediately without traversing anything. However, a more sophisticated implementation will also keep a pointer to the end of the list, and each node will have a link to its predecessor. Such a list is called a doubly-linked list. Java's LinkedList is doubly-linked: it traverses from whichever end is closer, so the middle element becomes the slowest to reach.
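
A sketch of that closer-end strategy, extending the hypothetical Node above with a previous link and assuming the list tracks head, tail, and size fields:

// Assumes: Node has next and previous links; the list tracks
// head, tail, and size. A sketch, not LinkedList's actual code.
Node nodeAt(int index) {
  if (index < size / 2) {
    Node node = head;  // walk forward from the front
    for (int i = 0; i < index; ++i) {
      node = node.next;
    }
    return node;
  } else {
    Node node = tail;  // walk backward from the back
    for (int i = size - 1; i > index; --i) {
      node = node.previous;
    }
    return node;
  }
}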

Data Structures

LinkedList and ArrayList are two kinds of lists. One could dream up other kinds, like a list that stores elements in contiguous chunks of 5 before linking to the next chunk. All of these different lists are implementations of an abstract data type (ADT). An ADT is a description of data that describes the possible values of the type and the operations that the type supports. A list is an indexed sequence of data that supports these operations and maybe others:

add an element at an index
get the element at an index
set the element at an index
remove the element at an index
report the number of elements
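
As a sketch, we might capture the list ADT in a Java interface like this; the names are illustrative, not java.util.List's exact signatures:

interface MyList<T> {
  void add(int index, T value);  // insert value at index
  T get(int index);              // read the element at index
  void set(int index, T value);  // overwrite the element at index
  void remove(int index);        // delete the element at index
  int size();                    // report the number of elements
}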

One of the most important things you will take away from this course is a catalog of ADTs that you will consider when coming up with an algorithm. There are several other ADTs that you should have in your catalog and that we will discuss in this course:

stacks
queues
priority queues
sets
maps
trees
graphs

There are many others.

Each of these abstractions allows many possible concrete implementations, or data structures. For example, an ADT doesn't dictate whether arrays or linked structures are used. The data structures themselves make those decisions. This leads to the second important takeaway from this course: how we implement an ADT has a profound impact on its suitability for performing certain tasks. We can't just float around at the abstract level and still write performant software.

Measuring Algorithms

We have an arsenal of tools at our disposal, and each tool has its strengths and weaknesses. To be able to choose which tool to use, we need a way of judging them. Here are a few possible ways in which we can measure algorithms:

correctness: does it compute the right answer?
simplicity: how easy is it to write and maintain?
space efficiency: how much memory does it consume?
time efficiency: how fast does it run?

A faster algorithm is a better algorithm. Our choice of data structure will influence an algorithm's speed. Much of our discussion together will be devoted to measuring the time efficiency of algorithms and data structures. Let's examine how we measure time efficiency with a case study.

Biggest Difference

Suppose you are told by your project manager to write an algorithm that scans a set of numbers and finds the biggest pairwise difference among them. Because your economic model is let-somebody-else-do-it, you turn to StackOverflow and find this solution in Ruby:

#!/usr/bin/env ruby

def biggest_difference(numbers)
  max_diff = 0

  for x in numbers
    for y in numbers
      diff = x - y
      if diff > max_diff
        max_diff = diff
      end
    end
  end

  max_diff
end

How does this look to you? Is it correct? Is it simple? Yes on both counts. It is also very space efficient. Apart from the array, it only allocates a couple of extra variables. Is it time efficient? Let's measure its execution time. We create an array of random numbers and find the biggest difference:

#!/usr/bin/env ruby

require 'benchmark'
require_relative './biggest_difference1.rb'

n = 400

g = Random.new
numbers = Array.new(n) { g.rand(1000000) }
time = Benchmark.measure {
  biggest_difference(numbers)
}
puts time.utime

The execution time seems fine. What if we bump n up? It's hard to get a sense of how the execution time is changing when doing isolated runs. Instead, let's make many runs with varying values of n and plot the execution time as a function of n:

#!/usr/bin/env ruby

require 'benchmark'
require_relative './biggest_difference1.rb'

g = Random.new
ns = 30.times.map { |i| i * 100 }

for n in ns
  numbers = Array.new(n) { g.rand(1000000) }
  time = Benchmark.measure {
    diff = biggest_difference(numbers)
  }
  print "(#{n},%.7f)," % time.utime
end

When we paste the output into Desmos and fit the window to the data, we see an interesting shape. As n increases, the execution time increases much faster. This solution is simply not going to scale to large values of n. Perhaps Ruby is the problem. In languages like Java and C++, the high-level code gets compiled down into the “language” of the processor: very fast machine instructions that trigger the hardware to take certain actions. In Ruby, the high-level code is interpreted, which means it never gets turned into fast machine code. Instead, it is evaluated by other software, which is slower because every operation involves decision-making at runtime.

Let's compare it to the same solution written in Java:

import java.util.Random;

public class BiggestDifference {
  public static void main(String[] args) {
    int[] ns = {
       100,  200,  300,  400,  500,  600,  700,  800,  900, 1000,
      1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000,
      2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000,
    };
    Stopwatch timer = new Stopwatch();

    for (int n : ns) {
      int[] numbers = randomArray(n);
      timer.start();
      biggestDifference(numbers);
      double seconds = timer.stop();
      System.out.printf("(%d,%.6f),", n, seconds);
    }
  }

  private static int[] randomArray(int n) {
    Random g = new Random();
    int[] numbers = new int[n];
    for (int i = 0; i < n; ++i) {
      numbers[i] = g.nextInt(100000);
    }
    return numbers;
  }

  public static int biggestDifference(int[] numbers) {
    int maxDiff = 0;

    for (int x : numbers) {
      for (int y : numbers) {
        int diff = x - y;
        if (diff > maxDiff) {
          maxDiff = diff;
        }
      }
    }

    return maxDiff;
  }
}

The numbers reveal that Java is indeed faster. However, we see the same sort of growth when we plot the execution times. As n increases, the execution time grows faster. If we are dealing with large values of n, even the Java solution will get bogged down. We have reached the point where perhaps we need to rethink the algorithm.

Could we implement it in a way that is more time efficient? Yes. Instead of looking at each pair of numbers, we could just find the minimum and the maximum and compute their difference. That would require only one pass through the array instead of a pass per number. Our revised algorithm looks like this in Ruby:

#!/usr/bin/env ruby

def biggest_difference(numbers)
  if numbers.empty? 
    return 0
  end

  min = numbers[0]
  max = numbers[0]

  for number in numbers
    if number < min
      min = number
    elsif number > max
      max = number
    end
  end

  max - min
end

When we plot the times now, we see that the execution time grows at a constant rate. Though the Java solution would be faster, the Ruby solution is plenty fast.
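
For reference, the one-pass idea translates directly to Java; this sketch mirrors the Ruby version above:

public static int biggestDifference(int[] numbers) {
  if (numbers.length == 0) {
    return 0;
  }

  int min = numbers[0];
  int max = numbers[0];

  // One pass: track the extremes instead of comparing every pair.
  for (int number : numbers) {
    if (number < min) {
      min = number;
    } else if (number > max) {
      max = number;
    }
  }

  return max - min;
}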

What to Measure

With both the list experiment and the biggest difference experiment, we used execution time to measure the time efficiency. Execution time turns out not to be the best measurement for several reasons:

It depends on the machine's hardware.
It depends on what other programs are competing for the machine.
It requires implementing and running the algorithm before we can judge it.

A better measure is the number of steps. A step is loosely defined as a single operation. An algorithm's number of operations is expressed as a function of n:

$$T(n) = \mathrm{?}$$

Consider this algorithm:

int total = 0;
for (int i = 0; i < numbers.length; ++i) {
  total += numbers[i];
}

How many steps are taken? The += happens n times. So does the array indexing. So does ++i. So does the comparison. We could define \(T\) for this algorithm in this way:

$$T(n) = 4n$$

However, this isn't the code that ultimately gets executed. Rather, the code gets compiled down to a bunch of machine instructions. The array index might expand into five machine instructions. The comparison might expand into three. So, \(T\) might more realistically be defined this way:

$$T(n) = 26n$$

While this may be more accurate, the definition of \(T\) now depends on the machine again, which will make comparing algorithms on different machines difficult. For this reason, we define \(T\) more loosely in terms of some constant coefficient:

$$T(n) = cn$$

All algorithms whose number of steps can be expressed this way are said to have linear time.

How should we define \(T\) for this algorithm?

int total = 0;
for (int i = 0; i < numbers.length; ++i) {
  for (int j = 0; j < numbers.length; ++j) {
    total += numbers[i];
  }
}

The outer loop executes n times. On each iteration, the inner loop executes n times. That gives us this loose definition of \(T\):

$$T(n) = cn^2$$

All algorithms whose number of steps can be expressed this way are said to have quadratic time. Would you rather have a linear time algorithm or a quadratic time algorithm?
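
To make that choice concrete, suppose n is one million and the constants are comparable. Then the step counts differ by a factor of a million:

$$cn = c \cdot 10^6 \qquad \text{versus} \qquad cn^2 = c \cdot 10^{12}$$

If each step took a nanosecond, the linear algorithm would finish in about a millisecond, while the quadratic one would need roughly seventeen minutes.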

TODO

You've got a few things to do before we meet again:

Finish the lab before 9 AM on Tuesday.
Read about searching.

See you next time!

Sincerely,

P.S. It's time for a haiku!

Never order food
The menu is way too big
Just say what you want