teaching machines

CS 430: Lecture 10 – Concurrency and Exceptions

April 20, 2021. Filed under lectures, programming languages, spring-2021.

Dear students,

When you’re converting Celsius to Fahrenheit, a single machine with a single thread of execution can handle the job. However, many of the computational problems that will earn you a living will require a more sophisticated approach to problem solving. Perhaps you will have to enlist many computers to get the job done. Or perhaps you will need to run a task in the background while the user keeps interacting with your software. Or maybe you will have to serve many users at once. The central theme of large-scale computation is that more than one thing is happening at a time, or at least it must feel that way to the user. Today we look at how our languages can help us deal with concurrent flows of execution.

Concurrency, Parallelism, and Other Terms

You’ve touched upon the ideas of concurrency in previous courses, and I sure wish I knew what you know about it. Some of this might be review, and you might feel patronized. It’s more likely that you’ve forgotten what you learned or you don’t know that you learned it.

Computer scientists talk about two different but related ideas: concurrency and parallelism. Not everyone agrees on their distinction. In general, we can say that we make a program concurrent because there are many things happening at once, and we want our software to be able to handle it all. We make a program run in parallel in order to improve performance. So, concurrency is more about the problem domain, and parallelism is an optimization. The ideas are orthogonal; they can appear together or in isolation.

Each task in a program is handled by a sequential, logical flow of execution. When multiple flows appear to be running at the same time, we have concurrency. When we hit the play button on our media player but then continue to navigate through our music library, we have concurrency. The program is not waiting for the music to stop before showing us our Daft Punk collection. When we enter a URL in the web browser, a slow download starts, and we hit the stop button, we have concurrency. The program is not waiting for the download to finish before letting us stop it.

There may be only a single processor with a single core running these tasks. However, the user feels that the program is interactive and responsive because multiple flows take turns getting the hardware. Interleaved execution on non-parallel hardware is called multitasking. Our generation doesn’t really know the world that existed before multitasking, where one clicked a button and waited for the task to finish.

When our computer has multiple cores or processors, we can physically execute flows at the exact same time. If concurrency is not just logical but physical, we have parallelism.

We make our programs execute with concurrency and parallelism for several reasons:

- To speed up a computation by dividing its work among processors.
- To keep software responsive while slow tasks run in the background.
- To serve many users or clients at the same time.

The benefits are not without costs, however. Our software becomes more complex when it executes multiple flows at a time, and we encounter a whole new class of bugs that are hard to test for.

Levels of Concurrency

Concurrent or parallel execution may be applied at different levels of execution:

- Instruction level: several machine instructions execute at once.
- Statement level: several statements, often the iterations of a loop, execute at once.
- Unit level: several subprograms execute at once.
- Program level: several whole programs execute at once.

Operating systems generally provide two vehicles for concurrency: processes and threads. A process has its own memory space with its own code section, globals, stack, and heap. Processes are said to be heavyweight tasks. Threads run within a process. All threads share the memory of their process. Threads are lightweight compared to processes since they don’t need their own copies of these resources.
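
To make the two vehicles concrete, here is a sketch in Java that spawns one of each. ProcessBuilder launches a separate heavyweight process, while Thread starts a lightweight flow inside this one. The ls command is an assumption about a Unix-like system:

import java.io.IOException;

public class Vehicles {
  public static void main(String[] args) throws IOException, InterruptedException {
    // A process: its own memory space, launched through the OS.
    Process process = new ProcessBuilder("ls").inheritIO().start();
    process.waitFor();

    // A thread: a flow that shares this process's memory.
    Thread thread = new Thread(() -> System.out.println("hello from a thread"));
    thread.start();
    thread.join();
  }
}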

Where programming languages can help is in statement-level parallelism and unit-level concurrency. Because threading has low overhead, it’s the preferred vehicle of languages for achieving speedups.

As an example of statement-level parallelism through threading, consider this C function to sum an arbitrary number of vectors:

void add_vectors(int nvectors, int length, double *vectors[], double sum[]) {
  for (int i = 0; i < length; ++i) {
    sum[i] = 0.0;
    for (int v = 0; v < nvectors; ++v) {
      sum[i] += vectors[v][i];
    }
  }
}

The OpenMP library provides a set of preprocessor directives that can be used to mark the outer loop to run in parallel:

void add_vectors(int nvectors, int length, double *vectors[], double sum[]) {
  #pragma omp parallel for
  for (int i = 0; i < length; ++i) {
    sum[i] = 0.0;
    for (int v = 0; v < nvectors; ++v) {
      sum[i] += vectors[v][i];
    }
  }
}

When I compile both the serial and parallel versions and time them on my MacBook Pro, I see a speedup of around 4:

$ clang -oserial -O0 sum.c
$ clang -oparallel -Xpreprocessor -fopenmp -lomp -O0 sum.c
$ time ./serial; time ./parallel
./serial    1.48s user 0.65s system   91% cpu 2.333 total
./parallel  4.92s user 1.57s system 1051% cpu 0.617 total

OpenMP isn’t strictly a language construct, but it exploits the preprocessor’s pragma directive, which is designed to steer the compiler’s behavior. In this case, the compiler will automatically insert code to spawn a number of threads and distribute the work of the loop evenly among them.

Ruby provides unit-level parallelism through its Thread class, which expects a block of code to execute:

threads = (0...4).map do |thread_index|
  Thread.new do
    puts thread_index
  end
end

threads.each { |thread| thread.join }

Java provides similar unit-level parallelism, but its Thread class expects an object instead of a block of code:

Thread[] threads = new Thread[4];

for (int i = 0; i < 4; i++) {
  int j = i;
  threads[i] = new Thread(new Runnable() {
    public void run() {
      System.out.println(j); 
    }
  });
  threads[i].start();
}

for (int i = 0; i < 4; ++i) {
  while (true) {
    try {
      threads[i].join();
      break;
    } catch (InterruptedException e) {
    }
  }
}

Lambdas in Java effectively hide the fact that the unit is an object rather than a subprogram:

for (int i = 0; i < 4; i++) {
  int j = i;
  threads[i] = new Thread(() -> System.out.println(j));
  threads[i].start();
}

for (int i = 0; i < 4; ++i) {
  threads[i].join();  // assumes the enclosing method declares throws InterruptedException
}

Under the hood, however, the lambda is still an instance of a class that implements the Runnable interface.

Single-threading

Not all concurrent flows use multiple threads. Co-routines, for example, are two or more flows that cooperate on some computational task. Since they are not intended to run in parallel, they execute in the same thread.

For instance, suppose I want to iterate through a 3D grid in JavaScript. Writing a triple-nested loop over and over again would clutter up my code, so I turn to a generator, which allows me to write my iteration code once. The generator looks like a function definition, but it’s got a * symbol in front of the function name and a yield statement in the innermost loop:

function *griderator(width, height, depth) {
  for (let z = 0; z < depth; z += 1) {
    for (let y = 0; y < height; y += 1) {
      for (let x = 0; x < width; x += 1) {
        yield [x, y, z];
      }
    }
  }
}

That yield statement effectively pauses the generator and hands control back to this code that’s consuming its values:

for (let p of griderator(4, 3, 5)) {
  console.log(p);
}

Many of our event-driven systems, like browser-based JavaScript, Android, and the Unity game engine, use a single-threaded execution model to avoid the dangers of threads clobbering shared resources. Such systems typically use an infinite event-handling loop that looks something like this:

while (event = q.dequeue) {
  dispatch event to handler  
}

The runtime adds new events to the queue as the user interacts with the software, and developers can trigger their own events.
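
Here is a minimal sketch of such a dispatch loop in Java. The Event record and the handler registry are hypothetical stand-ins for a real runtime's machinery:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class EventLoop {
  // Event is a made-up type; real runtimes have richer event hierarchies.
  record Event(String type, Object payload) {}

  private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
  private final Map<String, Consumer<Event>> handlers = new HashMap<>();

  public void on(String type, Consumer<Event> handler) {
    handlers.put(type, handler);
  }

  public void post(Event event) {
    queue.add(event);
  }

  public void run() throws InterruptedException {
    while (true) {
      Event event = queue.take();             // block until an event arrives
      Consumer<Event> handler = handlers.get(event.type());
      if (handler != null) {
        handler.accept(event);                // every handler runs on this one thread
      }
    }
  }
}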

Multi-Threading on a Single Thread

Some of our interpreted languages, including Python and some implementations of Ruby, give the appearance of having multiple threads, but the runtime ensures that only a single thread has the CPU at a given time. A global interpreter lock (GIL) is used as a conch. Whichever thread has the GIL has the CPU.

Let’s see what this form of multi-threading means. Consider this Ruby code which sums up vectors in serial fashion:

nvectors = 300
length = 100000

vectors = Array.new(nvectors) {Array.new(length) {0}}
sum = Array.new(length)

for i in 0.step(sum.size - 1, 1)
  sum[i] = 0
  for v in 0...vectors.size
    sum[i] += vectors[v][i]
  end
end

Now we add threads to distribute the task just as we did with OpenMP:

nthreads = ARGV[0].to_i
nvectors = 300
length = 100000

vectors = Array.new(nvectors) {Array.new(length) {0}}
sum = Array.new(length)

threads = (0...nthreads).map do |ithread|
  Thread.new do
    for i in ithread.step(sum.size - 1, nthreads)
      sum[i] = 0
      for v in 0...vectors.size
        sum[i] += vectors[v][i]
      end
    end
  end
end

threads.each {|thread| thread.join}

When we time these, we find that the threaded version is significantly slower than the serial version:

$ time ./sum_serial.rb; time ./sum_parallel.rb 4
./sum_serial.rb      3.55s user 0.07s system 99% cpu 3.628 total
./sum_parallel.rb 4  4.31s user 0.07s system 99% cpu 4.378 total

The threads do not execute in parallel, and they add overhead. Many systems programmers look down upon scripting languages because they lack “real threads.”

Scheduling

The job of the operating system is to arbitrate between the many concurrent tasks that are all vying for the finite resources of the machine. Memory, disk, and network cards are all shared resources. So is the CPU itself. A scheduler is used to ensure that each task gets its share of the CPU.

The scheduler puts each task in one of these states:

- Ready: the task is able to run and is waiting its turn.
- Running: the task currently has the CPU.
- Blocked: the task is waiting on something, often I/O, and cannot run.

A task goes from ready to running when the scheduler schedules it. After a certain amount of time elapses, the scheduler pauses it and gives another ready task a chance. A running task might pause itself by requesting input or emitting output. The I/O will run in the background, and the task will be blocked while another task is scheduled. A blocked task will return to the ready queue when the I/O finishes.

How does a scheduler decide what task to pull out of the ready queue? In round-robin scheduling, each task gets a fixed amount of time. Tasks may instead be assigned priorities, with higher-priority tasks being given more turns or more time. In real-time systems, tasks announce their deadlines, and the scheduler does its best to keep every entity satisfied. Life may depend on it.

Dangers

A concurrent program that is progressing on schedule has liveness. But a program’s liveness can be threatened. If two threads are codependent, each waiting on a resource that the other currently holds, the program is in a state of deadlock. Someone will have to give up their resource to restore liveness.
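
As a minimal sketch in Java, here are two threads that acquire two locks in opposite orders. Each grabs its first lock, pauses long enough for the other to grab its own, and then blocks forever on the second:

import java.util.concurrent.locks.ReentrantLock;

public class Deadlock {
  public static void main(String[] args) throws InterruptedException {
    ReentrantLock fork = new ReentrantLock();
    ReentrantLock knife = new ReentrantLock();

    Thread a = new Thread(() -> {
      fork.lock();
      try { Thread.sleep(100); } catch (InterruptedException e) {}
      knife.lock();  // blocks forever; b holds knife
    });

    Thread b = new Thread(() -> {
      knife.lock();
      try { Thread.sleep(100); } catch (InterruptedException e) {}
      fork.lock();   // blocks forever; a holds fork
    });

    a.start();
    b.start();
    a.join();  // never returns; the program hangs
  }
}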

If a concurrent program expects a particular scheduling in order to produce a correct result, we have a race condition. This can happen when access to shared resources is not controlled. Consider this Ruby program which creates two threads, both of which update the shared variables x and y:

#!/usr/bin/env ruby

x = 1
y = 1

a = Thread.new do
  x += y
  # sleep 1
  y += x
end

b = Thread.new do
  sleep 1
  x += y
  y += x
end

a.join
b.join

puts "#{x} #{y}"

If all of a executes before b, then we expect to see this sequence of state changes:

initial   -> x = 1, y = 1
a[x += y] -> x = 2, y = 1
a[y += x] -> x = 2, y = 3
b[x += y] -> x = 5, y = 3
b[y += x] -> x = 5, y = 8

But if a gets paused between its instructions, then we get this sequence of state changes:

initial   -> x = 1, y = 1
a[x += y] -> x = 2, y = 1
b[x += y] -> x = 3, y = 1
b[y += x] -> x = 3, y = 4
a[y += x] -> x = 3, y = 7

We use sleep calls to simulate a thread getting interrupted by the scheduler; uncommenting the sleep in a lets b run between a’s two statements, producing the second sequence. If the whimsies of the scheduler might lead to different results, then we’ve got a race condition.

Synchronization

To avoid race conditions, we control access to shared resources through synchronization. In cooperative synchronization, one task depends on resources generated by another task. Together they form a pipeline. The archetypal producer/consumer problem is an example where cooperative synchronization is needed. The producer wants to add new elements to a shared queue. The consumer wants to remove elements from the shared queue. Their access to the shared queue must be controlled so they don’t leave the queue in a broken state. In competitive synchronization, independent tasks contend with each other for resources. Perhaps your search engine is crawling the web. You have a queue of links that must be crawled. Again, access to the queue must be synchronized.
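
Here is a sketch of the producer/consumer pipeline in Java, leaning on the standard library's ArrayBlockingQueue, whose operations are internally synchronized:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class Pipeline {
  public static void main(String[] args) throws InterruptedException {
    // put blocks when the queue is full; take blocks when it is empty.
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

    Thread producer = new Thread(() -> {
      for (int i = 0; i < 100; ++i) {
        try {
          queue.put(i);                      // add an element, waiting for room
        } catch (InterruptedException e) {
          return;
        }
      }
    });

    Thread consumer = new Thread(() -> {
      for (int i = 0; i < 100; ++i) {
        try {
          System.out.println(queue.take());  // remove an element, waiting for one
        } catch (InterruptedException e) {
          return;
        }
      }
    });

    producer.start();
    consumer.start();
    producer.join();
    consumer.join();
  }
}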

In 1965, Dijkstra proposed the semaphore as a mechanism for controlling access. A semaphore bundles together an integer counter, a task queue, and two operations that your textbook calls wait and release. The semaphore’s counter is seeded with the number of tasks that may access the shared resource concurrently. Before a task accesses a shared resource, it first calls wait on the semaphore. If the counter is above 0, the counter is decremented and the task proceeds. If the counter is 0, then the task gets thrown into a queue and some other task is run. When a task finishes accessing the shared resource, it calls release. If there’s a task in the queue, it is run. If not, the counter is incremented. When the counter is seeded with 1, we have a mutex.
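
Java ships a Semaphore class in java.util.concurrent; it names the textbook's wait operation acquire. In this sketch, at most three of the eight threads hold the shared resource at any moment:

import java.util.concurrent.Semaphore;

public class Pool {
  public static void main(String[] args) {
    // Seed the counter with 3: up to three tasks may proceed at once.
    Semaphore semaphore = new Semaphore(3);

    for (int i = 0; i < 8; ++i) {
      int id = i;
      new Thread(() -> {
        try {
          semaphore.acquire();               // wait: decrement the counter or block
          try {
            System.out.println(id + " has the resource");
            Thread.sleep(100);               // pretend to use the resource
          } finally {
            semaphore.release();             // increment the counter or wake a waiter
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }).start();
    }
  }
}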

Semaphores are a low-level synchronization primitive. Dijkstra recommended that such synchronization be encapsulated inside a data structure. A data structure with built-in synchronization is called a monitor.
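
Java's synchronized methods give a rough approximation of a monitor. In this sketch, the object's lock admits only one thread into its methods at a time:

public class Tally {
  private int count = 0;

  // Only one thread may run a synchronized method of this object
  // at a time, so the counter can't be clobbered.
  public synchronized void increment() {
    count += 1;
  }

  public synchronized int get() {
    return count;
  }
}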

Exceptions

Now we make a sharp pivot to a discussion of dealing with errors in programs. Many different kinds of errors can happen. Perhaps an expected file is missing. Perhaps the caller gave us a null object. Perhaps there’s no more disk space. Perhaps the string that we’re trying to convert to an integer only has letters in it. What should we do when we detect an error? A range of choices lies before us:

- Ignore the error and hope for the best.
- Halt the program immediately.
- Return a special value or error code that the caller must check.
- Set a global error flag that the caller must inspect.
- Throw an exception that propagates up the call stack until some handler catches it.

Many of our modern languages support the final of these strategies. C doesn’t have exceptions. It expects callers to inspect a special global error flag after each function call. Bjarne Stroustrup would have none of this. C++ got exceptions. In C++, exceptions can be of any type, an instance of a class or even an int.

Java has exceptions too, though only instances of Throwable and its subclasses may be thrown. Language critics complain that Java forces you to acknowledge certain kinds of exceptions, like FileNotFoundException. You must either catch these checked exceptions or declare that your method throws them. A lot of folks like to write code like this:

try {
  return list[0];
} catch (IndexOutOfBoundsException e) {
}

Don’t do this. Exceptions are for exceptional events that you can’t prevent. A bad index is something you can prevent. Use an if statement instead:

if (list.length > 0) {
  return list[0];
}
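
Returning to checked exceptions, here is a small sketch of the catch-or-declare rule. FileReader's constructor throws the checked FileNotFoundException, so every caller must deal with it one way or the other. The file name is made up:

import java.io.FileNotFoundException;
import java.io.FileReader;

public class Checked {
  // Option 1: declare that this method throws the checked exception.
  static FileReader open(String path) throws FileNotFoundException {
    return new FileReader(path);
  }

  public static void main(String[] args) {
    // Option 2: catch it at the call site.
    try {
      FileReader reader = open("grades.txt");
      System.out.println("opened " + reader);
    } catch (FileNotFoundException e) {
      System.err.println("couldn't open file: " + e.getMessage());
    }
  }
}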

Up-and-coming Rust has no exceptions. Instead it uses a special Result type, which may be either Ok with a value or Err with an error object. Go is another language without exceptions. This blurb appears in its FAQ:

We believe that coupling exceptions to a control structure, as in the try-catch-finally idiom, results in convoluted code. It also tends to encourage programmers to label too many ordinary errors, such as failing to open a file, as exceptional.

Go takes a different approach. For plain error handling, Go’s multi-value returns make it easy to report an error without overloading the return value. A canonical error type, coupled with Go’s other features, makes error handling pleasant but quite different from that in other languages.
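
To get a feel for the Result style that Rust uses, here is a rough sketch in the Java we've been using elsewhere. It assumes Java 21 for sealed interfaces and pattern matching, and it only approximates Rust's built-in enum:

// Result is either an Ok wrapping a value or an Err wrapping a message.
sealed interface Result<T> permits Ok, Err {}
record Ok<T>(T value) implements Result<T> {}
record Err<T>(String message) implements Result<T> {}

class Parse {
  static Result<Integer> toInt(String s) {
    try {
      return new Ok<>(Integer.parseInt(s));
    } catch (NumberFormatException e) {
      return new Err<>("not an integer: " + s);
    }
  }

  public static void main(String[] args) {
    // The caller must inspect the result; there's no exception to forget.
    switch (toInt("42")) {
      case Ok<Integer> ok -> System.out.println(ok.value());
      case Err<Integer> err -> System.err.println(err.message());
    }
  }
}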

If we find ourselves in a language without exceptions, we might be able to hack something like them together using goto statements. This really only works within the context of a single scope, which is quite a departure from full-blown exceptions. Consider this Java-ish code, which I don’t like:

boolean addToFive(ArrayList<String> choices, String item) {
  if (choices.contains(item)) {
    return false;
  }

  try {
    if (choices.size() == 5) {
      throw new OverflowException();
    }
  } catch (OverflowException e) {
    System.err.println("Too many");
    System.exit(1);
  }

  choices.add(item);
  return true;
}

We can recast it in C with goto like this:

int add_to_five(list *choices, char *item) {
  // list, list_contains, and list_add are assumed helpers.
  if (list_contains(choices, item)) {
    return 0;
  }

  if (choices->length == 5) {
    goto overflow_handler;
  }

  list_add(choices, item);
  return 1;

overflow_handler:
  fprintf(stderr, "Too many\n");
  exit(1);
}

In general, exceptions are favorable because they are highly structured and some aspects (but not all) of their proper handling can be checked by our tooling.

Conclusion

See you next time!

Sincerely,

P.S. It’s time for a haiku!

Objects, exceptions
What happens when they marry?
Objections are raised