CS 430: Lecture 10 – Concurrency and Exceptions
Dear students,
When you’re converting Celsius to Fahrenheit, a single machine with a single thread of execution can get the job done. However, many of the computational problems that will earn you a living will demand more of you and your computer. Perhaps you will have to enlist many computers to get the job done. Or perhaps you will need to run a task in the background while the user keeps interacting with your software. Or maybe you will have to serve many users at once. Whatever the cause, more than one thing is happening at a time in the software you will write. Today we look at how our languages can help us deal with concurrent flows of execution.
Concurrency, Parallelism, and Other Terms
You’ve touched upon the ideas of concurrency in previous courses, and I sure wish I knew what you learned. Some of this might be review, and you might feel patronized. I’m more inclined to think that you’ve forgotten what you learned or you never learned it in the first place.
Computer scientists talk about two different but related ideas: concurrency and parallelism. Not everyone agrees on their distinction. In general, we can say that we make a program concurrent because there are many things happening at once, and we want our software to be able to handle it all. We make portions of a program run in parallel in order to improve performance. So, concurrency is more about what’s going on outside of our program, and parallelism is about optimizing what’s going on inside our program. The ideas are orthogonal; they can appear together or in isolation.
Each task in a program is handled by a sequential, logical flow of execution. When multiple flows appear to be running at the same time, we have concurrency. When we hit the play button on our media player but then continue to navigate through our music library, we have concurrency. The program is not waiting for the music to stop before showing us our Daft Punk collection. When we enter a URL in the web browser that triggers a slow download and we impatiently hit the stop button, we have concurrency. The program is not waiting for the download to finish before letting us stop it.
There may be only a single processor with a single core running these tasks. However, the user feels that the program is interactive and responsive because multiple flows take turns getting the hardware. Interleaved execution on non-parallel hardware is called multitasking. Our generation doesn’t really know the world that existed before multitasking, where one clicked a button and waited for the task to finish.
When our computer has multiple cores or processors, we can physically execute flows at the exact same time. If execution is not just logical but physical, we have parallelism.
We make our programs execute with concurrency and parallelism for several reasons:
- For increased interactivity.
- For more efficient use of hardware.
- Because Moore’s law may not hold forever, as transistors reach their physical limits. The smaller the chips, the harder it is to dissipate heat and the harder it is to keep electrons following their intended paths. Improved software performance cannot rely solely on hardware improvements.
The benefits are not without costs, however. Our software becomes more complex when it executes multiple flows at a time, and we encounter a whole new class of bugs that are hard to reproduce and test for.
Levels of Concurrency
Concurrent or parallel execution may be applied at different levels of execution:
- Instruction-level: machine code instructions are executed simultaneously, as in a pipelined architecture.
- Statement-level: a high-level language might include syntax for scheduling statements to execute in parallel. For example, one might write a parallel-for loop to have each iteration handled by a separate flow of execution.
- Unit-level: a subprogram is called and is run in a separate flow of execution.
- Program-level: each program is run separately in its own process, with scheduling and communication handled by the operating system.
Operating systems generally provide two vehicles for concurrency: processes and threads. A process has its own memory space with its own code section, globals, stack, and heap. Processes are said to be heavyweight tasks. Threads run within a process. All threads share the memory of their process. Threads are lightweight compared to processes, since they don’t need their own memory space and other per-process resources.
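To get a feel for the difference, here’s a minimal Java sketch of the two vehicles; the ls command is only an illustration and assumes a Unix-like system:

import java.io.IOException;

public class Vehicles {
  public static void main(String[] args) throws IOException, InterruptedException {
    // A thread: a new flow of execution that shares this process's memory.
    Thread thread = new Thread(() -> System.out.println("hello from a thread"));
    thread.start();
    thread.join();

    // A process: a new flow with its own memory space. We communicate with
    // it only through exit codes, pipes, files, or sockets.
    Process process = new ProcessBuilder("ls").inheritIO().start();
    process.waitFor();
  }
}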
Where programming languages can help is in statement-level parallelism and unit-level concurrency. Because threading has low overhead, it’s the preferred vehicle that languages use to achieve speedups.
As an example of statement-level parallelism through threading, consider this C function to sum an arbitrary number of vectors:
void add_vectors(int nvectors, int length, double *vectors[], double sum[]) {
  for (int i = 0; i < length; ++i) {
    sum[i] = 0.0;
    for (int v = 0; v < nvectors; ++v) {
      sum[i] += vectors[v][i];
    }
  }
}
The OpenMP library provides a set of pragma directives that we can use to make the outer loop run in parallel:
void add_vectors(int nvectors, int length, double *vectors[], double sum[]) {
  #pragma omp parallel for
  for (int i = 0; i < length; ++i) {
    sum[i] = 0.0;
    for (int v = 0; v < nvectors; ++v) {
      sum[i] += vectors[v][i];
    }
  }
}
When I compile both the serial and parallel versions and time them on my MacBook Pro, I see a speedup of around 4:
$ clang -oserial -O0 sum.c
$ clang -oparallel -Xpreprocessor -fopenmp -lomp -O0 sum.c
$ time ./serial; time ./parallel
./serial    1.48s user 0.65s system 91% cpu 2.333 total
./parallel  4.92s user 1.57s system 1051% cpu 0.617 total
OpenMP isn’t strictly a language construct, but it exploits the preprocessor’s pragma directive, which is designed to steer the compiler’s behavior. In this case, the compiler will automatically insert code to spawn a number of threads and distribute the work of the loop evenly among them.
Ruby provides unit-level concurrency through its Thread class, which expects a block of code to execute:
threads = (0...4).map do |threadIndex|
  Thread.new do
    puts threadIndex
  end
end
threads.each { |thread| thread.join }
Java provides similar unit-level parallelism, but its Thread class expects an object instead of a block of code:
Thread[] threads = new Thread[4];
for (int i = 0; i < 4; i++) {
  int j = i;
  threads[i] = new Thread(new Runnable() {
    public void run() {
      System.out.println(j);
    }
  });
  threads[i].start();
}

for (int i = 0; i < 4; ++i) {
  while (true) {
    try {
      threads[i].join();
      break;
    } catch (InterruptedException e) {
      // interrupted before the join finished; loop around and try again
    }
  }
}
Lambdas in Java effectively hide the fact that the unit is an object rather than a subprogram:
for (int i = 0; i < 4; i++) {
  int j = i;
  threads[i] = new Thread(() -> System.out.println(j));
  threads[i].start();
}

for (int i = 0; i < 4; ++i) {
  threads[i].join();  // assumes the enclosing method declares throws InterruptedException
}
Under the hood, however, the lambda is still an instance of a class that implements the Runnable interface.
Single-threading
Not all concurrent flows use multiple threads. Coroutines, for example, are two or more flows that cooperate on some computational task. Since they are not intended to run in parallel, they execute in the same thread.
For instance, suppose I want to iterate through a 3D grid in JavaScript. Writing a triple-nested loop over and over again would clutter up my code, so I turn to a generator, which allows me to write my iteration code once. The generator looks like a function definition, but it’s got a * symbol in front of the function name and a yield statement in the innermost loop:
function *griderator(width, height, depth) {
  for (let z = 0; z < depth; z += 1) {
    for (let y = 0; y < height; y += 1) {
      for (let x = 0; x < width; x += 1) {
        yield [x, y, z];
      }
    }
  }
}
That yield statement effectively pauses the generator and hands control back to the code that’s consuming its values:
for (let p of griderator(4, 3, 5)) {
  console.log(p);
}
Many of our event-driven systems, like browser-based JavaScript, Android, and the Unity game engine, use a single-threaded execution model to avoid the dangers of threads clobbering shared resources. Such systems typically use an infinite event-handling loop that looks something like this:
while (event = q.dequeue) { dispatch event to handler }
The runtime adds new events to the queue as the user interacts with the software, and developers can trigger their own events.
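Here’s a toy sketch of such a loop in Java, assuming events are nothing more than Runnable handlers; a real runtime attaches richer event objects and dispatch tables:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy single-threaded event loop. Every handler runs on the one thread
// that owns the loop, so handlers never race each other for shared data.
public class EventLoop {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

  // Called by the runtime or by handlers themselves to trigger an event.
  public void post(Runnable handler) {
    queue.add(handler);
  }

  // The infinite dispatch loop.
  public void run() throws InterruptedException {
    while (true) {
      Runnable handler = queue.take();  // block until an event arrives
      handler.run();                    // dispatch the event to its handler
    }
  }
}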
Multi-Threading on a Single Thread
Some of our interpreted languages, including Python and some implementations of Ruby, give the appearance of having multiple threads, but the runtime ensures that only a single thread has the CPU at a given time. A global interpreter lock (GIL) is used as a conch. Whichever thread has the GIL has the CPU.
Let’s see what this form of multi-threading means. Consider this Ruby code which sums up vectors in serial fashion:
nvectors = 300
length = 100000
vectors = Array.new(nvectors) {Array.new(length) {0}}
sum = Array.new(length)
for i in 0.step(sum.size - 1, 1)
  sum[i] = 0
  for v in 0...vectors.size
    sum[i] += vectors[v][i]
  end
end
Now we add threads to distribute the task just as we did with OpenMP:
nthreads = ARGV[0].to_i
nvectors = 300
length = 100000
vectors = Array.new(nvectors) {Array.new(length) {0}}
sum = Array.new(length)
threads = (0...nthreads).map do |ithread|
  Thread.new do
    for i in ithread.step(sum.size - 1, nthreads)
      sum[i] = 0
      for v in 0...vectors.size
        sum[i] += vectors[v][i]
      end
    end
  end
end
threads.each {|thread| thread.join}
When we time these, we find that the threaded version is significantly slower than the serial version:
$ time ./sum_serial.rb; time ./sum_parallel.rb 4
./sum_serial.rb      3.55s user 0.07s system 99% cpu 3.628 total
./sum_parallel.rb 4  4.31s user 0.07s system 99% cpu 4.378 total
The threads do not execute in parallel, and they add overhead. Many systems programmers look down upon scripting languages because they lack “real threads.”
Scheduling
The job of the operating system is to arbitrate between the many concurrent tasks that are all vying for the finite resources of the machine. Memory, disk, and network cards are all shared resources. So is the CPU itself. A scheduler is used to ensure that each task gets its share of the CPU.
The scheduler puts each task in one of these states:
- new: a task that has been created but not started
- ready: a task that is waiting to be executed
- running: a task that is currently executing
- blocked: a task that is waiting on I/O
- dead: a task that has finished executing
A task goes from ready to running when the scheduler schedules it. After a certain amount of time elapses, the scheduler pauses it and gives another ready task a chance. A running task might pause itself by requesting input or emitting output. The I/O will run in the background, and the task will be blocked while another task is scheduled. A blocked task will return to the ready queue when the I/O finishes.
How does a scheduler decide what task to pull out of the ready queue? In round-robin scheduling, each task gets a fixed amount of time. Tasks may instead be assigned priorities, with higher-priority tasks being given more turns or more time. In real-time systems, tasks announce their deadlines, and the scheduler does its best to keep every entity satisfied. Life may depend on it.
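Here’s a toy, cooperative simulation of round-robin scheduling in Java. It’s not a preemptive OS scheduler; each task is just an iterator of work steps, and the quantum is counted in steps rather than milliseconds:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// A toy round-robin scheduler. Each task is an iterator whose next() performs
// one step of work. A running task gets a fixed quantum of steps, then goes to
// the back of the ready queue; a finished task is simply not re-enqueued.
public class RoundRobin {
  static final int QUANTUM = 3;

  static void run(Deque<Iterator<String>> ready) {
    while (!ready.isEmpty()) {
      Iterator<String> task = ready.removeFirst();              // ready -> running
      for (int step = 0; step < QUANTUM && task.hasNext(); step++) {
        System.out.println(task.next());
      }
      if (task.hasNext()) {
        ready.addLast(task);                                    // running -> ready
      }                                                         // otherwise: dead
    }
  }

  public static void main(String[] args) {
    Deque<Iterator<String>> ready = new ArrayDeque<>();
    ready.add(List.of("a1", "a2", "a3", "a4", "a5").iterator());
    ready.add(List.of("b1", "b2").iterator());
    run(ready);
  }
}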
Dangers
A concurrent program that is progressing on schedule has liveness. But a program’s liveness can be threatened. If two threads are codependent, each waiting on a resource that the other currently holds, the program is in a state of deadlock. Someone will have to give up their resource to restore liveness.
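Here’s a minimal Java sketch of a deadlock. Each thread grabs one lock and then reaches for the other; the sleeps make the fatal interleaving likely, though not guaranteed:

// Two threads acquire the same two locks in opposite orders. With unlucky
// scheduling (encouraged here by the sleeps), each thread ends up holding
// one lock while waiting forever for the other.
public class Deadlock {
  static final Object fork = new Object();
  static final Object knife = new Object();

  static void sleep(long milliseconds) {
    try {
      Thread.sleep(milliseconds);
    } catch (InterruptedException e) {
    }
  }

  public static void main(String[] args) {
    new Thread(() -> {
      synchronized (fork) {
        sleep(100);                        // let the other thread grab the knife
        synchronized (knife) {
          System.out.println("thread 1 has both");
        }
      }
    }).start();

    new Thread(() -> {
      synchronized (knife) {
        sleep(100);                        // let the other thread grab the fork
        synchronized (fork) {
          System.out.println("thread 2 has both");
        }
      }
    }).start();
  }
}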
If a concurrent program expects a particular scheduling in order to produce a correct result, we have a race condition. This can happen when access to shared resources is not controlled. Consider this Ruby program, which creates two threads, both of which update the shared variables x and y:
#!/usr/bin/env ruby
x = 1
y = 1
a = Thread.new do
  x += y
  # sleep 1
  y += x
end
b = Thread.new do
  sleep 1
  x += y
  y += x
end
a.join
b.join
puts "#{x} #{y}"
If all of a executes before b, then we expect to see this sequence of state changes:
initial   -> x = 1, y = 1
a[x += y] -> x = 2, y = 1
a[y += x] -> x = 2, y = 3
b[x += y] -> x = 5, y = 3
b[y += x] -> x = 5, y = 8
But if a gets paused between its instructions, then we get this sequence of state changes:
initial   -> x = 1, y = 1
a[x += y] -> x = 2, y = 1
b[x += y] -> x = 3, y = 1
b[y += x] -> x = 3, y = 4
a[y += x] -> x = 3, y = 7
We use sleep calls to simulate a thread getting interrupted by the scheduler. If we didn’t expect x to change in the middle of the execution of a, then this code has a race condition.
Synchronization
To avoid race conditions, we control access to shared resources through synchronization. In cooperative synchronization, one task depends on resources generated by another task. Together they form a pipeline. The archetypal producer/consumer problem is an example where cooperative synchronization is needed. The producer wants to add new elements to a shared queue. The consumer wants to remove elements from the shared queue. Their access to the shared queue must be controlled so they don’t leave the queue in a broken state. In competitive synchronization, independent tasks contend with each other for resources. Perhaps your search engine is crawling the web. You have a queue of links that must be crawled, and threads will draw from the queue. Access to the queue must be synchronized.
In 1965, Dijkstra proposed the semaphore as a mechanism for controlling access. A semaphore bundles together an integer counter, a task queue, and two operations that your textbook calls wait and release. The semaphore’s counter is seeded with the number of tasks that may access the shared resource concurrently. Before a task may access a shared resource, it first calls wait on the semaphore. If the counter is above 0, the counter is decremented and the task proceeds. If the counter is 0, then the task gets thrown into the semaphore’s queue and some other task is run. When a task finishes accessing the shared resource, it calls release. If there’s a task in the queue, it is run. If not, the counter is incremented. When the counter is seeded with 1, we have a mutex. Only one entity can access the shared resource. The mutex makes access mutually exclusive.
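Here’s a minimal Java sketch of a semaphore seeded with 1, using java.util.concurrent.Semaphore, whose acquire and release operations play the roles of the textbook’s wait and release:

import java.util.concurrent.Semaphore;

// A semaphore seeded with 1 acts as a mutex: only one thread at a time may
// be inside the critical section that updates the shared counter.
public class SemaphoreCounter {
  static final Semaphore mutex = new Semaphore(1);
  static int count = 0;

  public static void main(String[] args) throws InterruptedException {
    Thread[] threads = new Thread[4];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> {
        for (int n = 0; n < 100000; n++) {
          mutex.acquireUninterruptibly();  // the textbook's wait
          count++;                         // the shared resource
          mutex.release();                 // the textbook's release
        }
      });
      threads[i].start();
    }
    for (Thread thread : threads) {
      thread.join();
    }
    System.out.println(count);             // reliably 400000; without the semaphore, usually less
  }
}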
Semaphores are a low-level synchronization primitive. Dijkstra recommended that such synchronization be encapsulated inside a data structure. A data structure with built-in synchronization is called a monitor. Some of the collections in Java are monitors. You can tag a method as synchronized if you want only one thread to access the collection at a time.
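Here’s a minimal sketch of a hand-rolled monitor in Java: a bounded queue whose synchronization lives inside the data structure, which also addresses the producer/consumer problem from earlier. In practice you’d reach for one of java.util.concurrent’s ready-made monitors, like ArrayBlockingQueue:

import java.util.ArrayDeque;
import java.util.Deque;

// A monitor: the synchronized methods admit only one thread at a time, and
// wait/notifyAll coordinate the producer and consumer.
public class BoundedQueue<T> {
  private final Deque<T> items = new ArrayDeque<>();
  private final int capacity;

  public BoundedQueue(int capacity) {
    this.capacity = capacity;
  }

  public synchronized void put(T item) throws InterruptedException {
    while (items.size() == capacity) {
      wait();                 // producer blocks until there's room
    }
    items.addLast(item);
    notifyAll();              // wake any waiting consumer
  }

  public synchronized T take() throws InterruptedException {
    while (items.isEmpty()) {
      wait();                 // consumer blocks until there's something to take
    }
    T item = items.removeFirst();
    notifyAll();              // wake any waiting producer
    return item;
  }
}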
Exceptions
Now we make a sharp pivot to a discussion of dealing with errors in programs. Many different kinds of errors can happen. Perhaps an expected file is missing. Perhaps the caller gave us a null object. Perhaps there’s no more disk space. Perhaps the string that we’re trying to convert to an integer only has letters in it. What should we do when we detect an error? A range of choices lies before us:
- We could do nothing and silently continue. This is a terrible, inhumane strategy.
- We could print an error message. This is slightly better, but output only helps the user and only if the user has a way of seeing the output. The rest of the code knows nothing about the error.
- We could terminate the program. This is pretty severe. Some errors are to be expected and can be recovered from.
- We could expect the detector of the error to handle the error, but this requires considerable duplication of effort.
- We could return a special error flag. This isn’t bad, but it assumes there’s a special value that can be returned. indexOf often returns -1 when the target object cannot be found in a list, but what does max return when you give it the empty list? If you say 0, then how do you distinguish that from max([0, 0, 0])?
- We could pass an error handler to each subprogram, but the caller cannot be expected to know what errors might be encountered by subprogram code.
- We could pass an extra out parameter to hold the error flag, as in max(list, &status). This is getting better, but the burden is on the caller to check the status and propagate it to its caller.
- We could have each function return a value of a special type that can be in one of two states: Ok and Error. Haskell has the optional types Maybe and Either that support this kind of error propagation.
- We could stop the flow of execution and send a special object called an exception to the caller describing the exceptional event. The error-detecting code throws or raises an exception. After an exception is raised, it goes through a binding phase. The surrounding code may have opted in to handling the exception by wrapping the dangerous code in a try/catch block. If so, the exception is bound in the catch statement. If the immediate scope does not attempt to catch it, the active subprogram is terminated, its stack frame is wiped, and the exception propagates up to the next caller. If no handler is found, the program terminates. If a handler is found, then we proceed to continuation. Where do we pick up execution? If the handler itself threw an exception, then that exception goes through the same propagation process. Otherwise we pick up execution on the first line of code after the handler. In all cases, we can wrap dangerous code up in a try statement with a finally block that will be executed before continuation or propagation proceeds. We use finally to deallocate heap blocks, close files, and perform other cleanup tasks. A sketch of this flow appears after this list.
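Here’s a small Java sketch of that last strategy; the file name and the handler’s behavior are just placeholders. The exception is bound in the catch block, the finally block runs no matter what, and continuation picks up after the handler:

import java.io.FileNotFoundException;
import java.io.FileReader;

public class Propagation {
  // parse doesn't handle the exception, so a missing file terminates parse
  // and the exception propagates up to its caller.
  static void parse(String path) throws FileNotFoundException {
    FileReader reader = new FileReader(path);   // may throw
    // ... read and parse; a real program would also close the reader ...
  }

  public static void main(String[] args) {
    try {
      parse("settings.cfg");
    } catch (FileNotFoundException e) {         // the exception is bound to e
      System.err.println("using defaults: " + e.getMessage());
    } finally {
      System.err.println("runs whether or not an exception was thrown");
    }
    System.err.println("continuation picks up here");
  }
}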
Many of our modern languages support the last of these strategies. C doesn’t have exceptions. It expects callers to inspect a special global error flag after each function call. Bjarne Stroustrup would have none of this. C++ got exceptions. In C++, exceptions can be of any type: an instance of a class or even an int.
Java has exceptions too. Only instances of Throwable and its subclasses, which include Exception, may be thrown. Language critics complain that Java forces you to acknowledge certain kinds of exceptions, like FileNotFoundException. You must either catch these checked exceptions or declare that your method throws them. A lot of folks like to write code like this:
try {
  return list[0];
} catch (IndexOutOfBoundsException e) {
}
Don’t do this. Exceptions are for exceptional events that you can’t prevent. A bad index is something you can prevent. Use an if statement instead:
if (list.length > 0) {
  return list[0];
}
Up-and-coming Rust forgoes exceptions in favor of a special Result type, which may be either Ok with a value or Err with an error object. Go is another language without exceptions. This blurb appears in its FAQ:
We believe that coupling exceptions to a control structure, as in the try-catch-finally idiom, results in convoluted code. It also tends to encourage programmers to label too many ordinary errors, such as failing to open a file, as exceptional. Go takes a different approach. For plain error handling, Go’s multi-value returns make it easy to report an error without overloading the return value. A canonical error type, coupled with Go’s other features, makes error handling pleasant but quite different from that in other languages.
If we find ourselves in a language without exceptions, we might be able to hack something like them together using goto statements. This really only works within the context of a single scope, which is quite a departure from full-blown exceptions. Consider this Java-ish code, which I don’t like:
boolean addButCapAtFive(ArrayList<String> choices, String item) {
  if (choices.contains(item)) {
    return false;
  }
  try {
    if (choices.size() == 5) {
      throw new OverflowException();
    }
  } catch (OverflowException e) {
    System.err.println("Too many");
    System.exit(1);
  }
  choices.add(item);
  return true;
}
We can recast it in C with goto like this:
int add_but_cap_at_five(list *choices, char *item) {
  if (choices->contains(item)) {
    return 0;
  }
  if (choices->length == 5) {
    goto overflow_handler;
  }
  choices->add(item);
  return 1;

overflow_handler:
  fprintf(stderr, "Too many\n");
  exit(1);
}
In general, exceptions are favorable because they are highly structured and some aspects (but not all) of their proper handling can be checked by our tooling.
Conclusion
See you next time!
P.S. It’s time for a haiku!
John beat that steam drill
But his heart stopped soon after
A race condition