CS 430: Lecture 13 - Concurrency and Parallelism

Dear students:

We close out our semester with some exercises that demonstrate how Rust empowers programmers to write code that executes in parallel.

Fast Mean

Suppose you want to quickly calculate the sum of a data set. First you need a data set. A random number generator can help. This code preallocates a vector to have a capacity of N elements and then fills with uniformly distributed random numbers:

Rust

let mut generator = thread_rng();
let mut xs = Vec::<f64>::with_capacity(N);
for _ in 0..N {
    xs.push(generator.gen());
}

The expression xs.iter().sum() adds them all up in serial fashion. You want to time this iteration. The Instant struct measures elapsed time. You put all this together in main:

Rust

use rand::{thread_rng, Rng};
use std::time::Instant;

const N: usize = 10_000_000;

fn main() {
    let mut generator = thread_rng();
    let mut xs = Vec::<f64>::with_capacity(N);
    for _ in 0..N {
        xs.push(generator.gen());
    }

    let start = Instant::now();
    let sum: f64 = xs.iter().sum();
    let elapsed = start.elapsed();
    println!("sum: {:?}", sum);
    println!("elapsed: {:?}", elapsed);
}

Is it fast enough? Maybe. Can it be made faster? Almost certainly. This is a situation where a data parallel solution with its implicit threads would be nice. Rayon provides a parallel iterator that requires only minor changes to your original serial code:

Rust

use rand::{thread_rng,Rng};
use std::time::Instant;
use rayon::prelude::*;

const N: usize = 10_000_000;

fn main() {
    let mut generator = thread_rng();
    let mut xs = Vec::<f64>::with_capacity(N);
    for _ in 0..N {
        xs.push(generator.gen());
    }

    let start = Instant::now();
    let sum: f64 = xs.par_iter().sum();
    let elapsed = start.elapsed();
    println!("sum: {:?}", sum);
    println!("elapsed: {:?}", elapsed);
}

The speedup is pretty good. Even better, it required almost no effort.

Fast XOR Texture

Now you wish to generate an XOR texture. In an XOR texture, each pixel's color is computed by xor'ing its integer coordinates.

Rust

use image::{ImageBuffer, RgbImage, Rgb};
use std::time::Instant;

fn main() {
    let (width, height) = (4096, 4096);
    let mut image: RgbImage = ImageBuffer::new(width, height);
    let max = (width.max(height) - 1) as f64;

    let start = Instant::now();
    for r in 0..height {
        for c in 0..width {
            let xor = ((r ^ c) as f64 / max * 255.0) as u8;
            let rgb = Rgb([xor, xor, xor]);
            image.put_pixel(c, r, rgb);
        }
    }
    let elapsed = start.elapsed();
    println!("elapsed: {:?}", elapsed);

    image.save("xor.png").unwrap();
}

Could this generation be sped up? Maybe. Let's create a pool of threads and submit each row to the pool's job queue. When a pixel has been computed, we could lock the image with the mutex and write into it. But it would be less work to let main retain exclusive ownership of the image and instead use a channel so that the threads can post the pixel colors back to main, as we do here:

Rust

use image::{ImageBuffer, RgbImage, Rgb};
use std::time::Instant;
use threadpool::ThreadPool;
use std::sync::mpsc::channel;

fn main() {
    let (width, height) = (4096, 4096);
    let mut image: RgbImage = ImageBuffer::new(width, height);
    let max = (width.max(height) - 1) as f64;

    let start = Instant::now();
    let (sender, receiver) = channel();

    let pool = ThreadPool::new(4);
    for r in 0..height {
        let sender = sender.clone();
        pool.execute(move || {
            for c in 0..width {
                let xor = ((r ^ c) as f64 / max * 255.0) as u8;
                let rgb = Rgb([xor, xor, xor]);
                sender.send((c, r, rgb)).unwrap();
            }
        });
    }

    for _ in 0..(width * height) {
        let (c, r, rgb) = receiver.recv().unwrap();
        image.put_pixel(c, r, rgb);
    }
    let elapsed = start.elapsed();
    println!("elapsed: {:?}", elapsed);

    image.save("xor.png").unwrap();
}

Adding message passing and a thread pool is a significant slowdown in this case. Why? The overhead of communication apparently exceeds the cost of each computing a color. Would a shared and mutable image be faster? Maybe. It would still require a mutex. There's no particular reason that the whole image should be locked since pixels are independent, but I don't know how to have unlocked shared mutable data in Rust. Maybe next semester.

Fast Blur

When I was preparing for our discussion today, I was really disappointed that the parallel XOR texture generation was really slow. Probably it's good to see that not every trick is advantageous in all situations. But still. So I looked for an image-processing operation that does benefit from parallelism. Let's try blurring an image.

The per-pixel operation is more expensive that computing an xor of its row and column indices. Instead, we sum up the pixel intensities of all the neighbors. Their average becomes the center pixel's new color. Here's a function that computes the average of a neighborhood:

Rust

fn blur_hood(image: &RgbImage, column: u32, row: u32) -> Rgb<u8> {
    let (width, height) = image.dimensions();

    let r_start = row as i32 - RADIUS as i32;
    let r_end = row as i32 + RADIUS as i32;
    let c_start = column as i32 - RADIUS as i32;
    let c_end = column as i32 + RADIUS as i32;

    let mut sum: [u32; 3] = [0, 0, 0];
    for r in r_start..=r_end {
        for c in c_start..=c_end {
            let r = r.clamp(0, height as i32 - 1) as u32;
            let c = c.clamp(0, width as i32 - 1) as u32;
            let rgb = image.get_pixel(c, r);
            sum[0] += rgb[0] as u32;
            sum[1] += rgb[1] as u32;
            sum[2] += rgb[2] as u32;
        }
    }

    let pixel_count = ((2 * RADIUS + 1) * (2 * RADIUS + 1)) as f64;

    Rgb([
        (sum[0] as f64 / pixel_count) as u8,
        (sum[1] as f64 / pixel_count) as u8,
        (sum[2] as f64 / pixel_count) as u8,
    ])
}

The code is made noisy by lots of type casts and clamping the coordinates so they don't go off the edge of the image. This main function opens up an image and generates a new image in which each pixel is the blur of its counterpart in the original image:

Rust

use image::{RgbImage, Rgb, ImageBuffer};
use std::time::Instant;

const RADIUS: u32 = 10;

fn main() {
    let image = image::open("/Users/twodee/Desktop/plasma2.png").unwrap().to_rgb8();
    let (width, height) = image.dimensions();
    let mut blurred: RgbImage = ImageBuffer::new(width, height);

    let start = Instant::now();
    for r in 0..height {
        for c in 0..width {
            let rgb = blur_hood(&image, c, r);
            blurred.put_pixel(c, r, rgb);
        }
    }
    let elapsed = start.elapsed();
    println!("elapsed: {:?}", elapsed);

    blurred.save("blurred.png").unwrap();
}

That's the serial version. Could it be sped up? Maybe. Let's try the same tactic we used for xor: a thread pool and a message channel:

Rust

use image::{RgbImage, Rgb, ImageBuffer};
use threadpool::ThreadPool;
use std::sync::mpsc::channel;
use std::time::Instant;
use std::sync::Arc;

const RADIUS: u32 = 10;

fn main() {
    let image = image::open("/Users/twodee/Desktop/plasma2.png").unwrap().to_rgb8();
    let (width, height) = image.dimensions();
    let mut blurred: RgbImage = ImageBuffer::new(width, height);

    let start = Instant::now();
    let image = Arc::new(image);
    let (sender, receiver) = channel();

    let pool = ThreadPool::new(4);
    for r in 0..height {
        let sender = sender.clone();
        let image = Arc::clone(&image);
        pool.execute(move || {
            for c in 0..width {
                let rgb = blur_hood(&image, c, r);
                sender.send((c, r, rgb)).unwrap();
            }
        });
    }

    for _ in 0..(width * height) {
        let (c, r, rgb) = receiver.recv().unwrap();
        blurred.put_pixel(c, r, rgb);
    }
    let elapsed = start.elapsed();
    println!("elapsed: {:?}", elapsed);

    blurred.save("blurred.png").unwrap();
}

To make the original image available to all the threads, it needs to be shared immutably. We therefore employ an Arc.

The good news is that we do indeed see a speedup. A significant one!

Conclusion

Let's revisit what we've done this semester. We first examined how languages are recognized by a computer with lexers and parsers. Then through the lens of Ruby we examined the ideas of naming things, how the values behind the names might change, and how those names are visible within a scope. Next we opened our discussion on how values have types that determine their operations. Knowing the types early means our tools can detect some errors and optimize code. But knowing them only at runtime means our code is really flexible. We looked at functions pull repeated code out into reusable parts. And we learned how many programs can be written as a series of function calls to map, filter, and fold. We paired those functions with state to produce objects.

Through Haskell, we introduced several new ways of making functions, through partial application, lambdas, and composition. We learned how to navigate a world in which data is immutable. And we explored a new take on types.

Through Rust, we saw a language that is trying to make software safe and fast by bringing together static types, immutability, and a garbage collection system that the compiler can figure out.

We talked about a lot of stuff. Our goal has never been for you to memorize anything or learn the minutiae of ten different programming languages. Rather, my feelings about why this class is important can be summarized in two big ideas.

First, we often refer to the stuff we developers use as tools. Part of me really likes this word. My father was a mechanic, and he had an addiction to tools. They gave him leverage and torque, they made it possible to get work done in very narrow spaces, and they wiped clean at the end of job, almost always outliving one job to serve another tomorrow. But I also don't like this word to describing programming languages.

Is a spider's web a tool? When I think of tools, I think of something external to the tool user. But a spider's web comes from its abdomen. It is something internal made external. In this sense, it is much more than a tool. I feel the same way about language in general. Our brains have this uncanny facility for recognizing and generating language. Grammars and encodings seem to naturally flow out of our brains to help us communicate. A programming language is not some external force acting upon us. It comes from us, it is us, and it shapes what we can do. It is my hope that the tools that you encounter in your lives embed themselves inside you and give the ideas that are inside of you a voice. If the right programming language doesn't exist, you have what it takes to make your own. I hope you can teach those ideas to a machine, and share them with others.

Second, I want to discuss a little bit of psychology. I think what happens to our brains as we grow is described well by psychologist William Perry. He studied college students like you in the 1950s and 1960s and found distinct patterns to their cognitive development. Let's summarize the main stages:

Dualism: I believe actions and beliefs are either right or wrong, and these labels are designated by an Authority that I can can trust. I shut my eyes and ears to whatever and whomever is wrong. Wrong things should be burned, and wrong people deserve punishment. Example: My iPhone is a masterpiece of design and engineering, delivered to Earth by the almighty Apple. Android only exists because Google found enough monkeys that something had to emerge.
Multiplicity: I am an Authority, and so are you. Right and wrong are a matter of opinion, and if you listen to others you're a sheep. I'm going to do what I think is right, and you can do your thing. Anyone that says otherwise is an unenlightened dualist. Example: A phone is a phone. I had to steal this iPhone, but they were overpriced and I had just spent the last of my paycheck on Steam's Winter Sale.
Relativism: I am not the Authority I thought I was, and you aren't either. It turns out that some Authorities are charlatans and hypocrites. Some are right about area X but have it all wrong in area Y. Example: iPhones are great for people who want stuff that works out of the box. But man, Android phones are great for somebody who wants to tinker.
Commitment: Based on what I've seen of the world, I trust much of what Authorities A and G have to say, but not B and D. But I will keep evaluating new Authorities that come along, and carefully weigh what they have to say. Example: I like my iPhone. Those new devices from Samsung have some awesome features, though. I hope Apple copies them soon and there's no lawsuit.

In the world of technology, there is absolutely no reason to monogamously marry a programming language or a platform. There's no need to choose. Your employer can get you an iPhone and an Android phone. You can use Windows, Mac, and Linux. You can and should know many languages, and continue learning new ones, because the world of technology is a world where isRight is neither true nor false. It is null, waiting to be defined. Your world will shift underneath you, but you can save yourself some trouble by shifting it yourself. You must be prepared to update yourself, and that can't happen if Java or C or Ruby or JavaScript are the one true way. You don't have to love all your choices, but by knowing them, you can achieve what many critics do not: informed dissent.

In which of Perry's stages of cognitive development are you? The early stages have us believing that the world is full of accurately labeled things, each marked good or bad. The middle stages have us believing that those labels were placed at random, and we reject the very idea of good and bad. The final stages have us accepting that our world is full of mixed bags, that some things are better than others, and that both good and bad things can be made better. This situation is extremely relevant to our discipline, in which people are eager to sell their soul to a particular technology.

Let us close with this Japanese legend of two swordsmiths that I think captures our situation with programming languages:

A legend tells of a test where Muramasa challenged his master, Masamune, to see who could make a finer sword. They both worked tirelessly and eventually, when both swords were finished, they decided to test the results. The contest was for each to suspend the blades in a small creek with the cutting edge facing the current. Muramasa's sword, the Juuchi Yosamu ("10,000 Cold Nights") cut everything that passed its way; fish, leaves floating down the river, the very air which blew on it. Highly impressed with his pupil's work, Masamune lowered his sword, the Yawarakai-Te ("Tender Hands"), into the current and waited patiently. Only leaves were cut. However, the fish swam right up to it, and the air hissed as it gently blew by the blade. After a while, Muramasa began to scoff at his master for his apparent lack of skill in the making of his sword. Smiling to himself, Masamune pulled up his sword, dried it, and sheathed it. All the while, Muramasa was heckling him for his sword's inability to cut anything. A monk, who had been watching the whole ordeal, walked over and bowed low to the two sword masters. He then began to explain what he had seen.

"The first of the swords was by all accounts a fine sword, however it is a blood thirsty, evil blade, as it does not discriminate as to who or what it will cut. It may just as well be cutting down butterflies as severing heads. The second was by far the finer of the two, as it does not needlessly cut that which is innocent and undeserving."

Some our languages are dangerous, while others are safe. Some are concerned with performance; others with developer time. Some achieve compactness of expression; others readability. Some unify features of other languages; others are decidedly different. Which sword do we pick up? Thus we come to the final haiku of the semester:

Which sword is better? The unchecked one or the checked? Well, I've got two hands

TODO

Here's your list of things to do in the near future:

Continue working on project 3 in Rust. Post a progress report by Friday noon.

We'll have one last short lab on Thursday. My aim is to make it “finishable” during the class period.

See you next time.

Sincerely,