teaching machines

CS 430: Lecture 1 – Programming Languages

January 20, 2022 by . Filed under lectures, programming languages, spring-2022.

Dear students,

Welcome to CS 430: Programming Languages! What’s this class about? Well, imagine you are a biology student taking a course on mammals and every lecture, lab, and homework is on cows. You wouldn’t really be a biologist at the end of the course. You’d be a cowist.

A cow gives you one picture of how an organism can be. But to be a biologist, you need lots of pictures. Similarly, Java gives you one picture of how a programmer talks to a machine, but to be a computer scientist, you need lots of pictures. Each picture is a window into some deeper concept that lies at the center of all them but is not fully manifested in any single one of them.

Accordingly, this class is about all the tools—the languages—we use to teach machines. We won’t spend much time discussing the applications of these tools. We focus on the tools themselves. That sounds kind of shallow. But when you consider that these tools form the bridge between you and the machine, this class feels much more meaningful and human. We want to design and use tools that take away the pain and time-wasting of software development. In many computer science classes, we focus on making an end-product for an end-user. In this class, we focus on making a mid-product for you, a developer of software. You are just as important as your users, and it’s okay to have a class about you.

Course Structure

The course is broken down into 20 modules, each covering a different programming language topic. These modules come in three different flavors:

Some weeks have two modules, some one. Your final grade is based on the average of all your module grades. Lectures will generally be on Tuesdays, with labs on Thursdays.

There are four sections of this course. I’m teaching two of them, and Dr. Lam is teaching the other two. Dr. Lam and I will follow this tentative schedule.

Programming Languages

Everytime you take a required class, you should wonder why it’s required. The superficial answer for this course is that programming languages is a core component in the ACM/IEEE Computer Science Curriculum. Computer science departments who seek accreditation from ABET must prove that they are meeting the recommendations of this curriculum.

But why is this course in the recommendations? The authors of the curriculum state this about programming languages as a subject:

Programming languages are the medium through which programmers precisely describe concepts, formulate algorithms, and reason about solutions. In the course of a career, a computer scientist will work with many different languages, separately or together. Software developers must understand the programming models underlying different languages and make informed design choices in languages supporting multiple complementary approaches. Computer scientists will often need to learn new languages and programming constructs, and must understand the principles underlying how programming language features are defined, composed, and implemented. The effective use of programming languages, and appreciation of their limitations, also requires a basic knowledge of programming language translation and static program analysis, as well as run-time components such as memory management.

Many computer scientists fear the flux of tools. They resist offering a course about a particular tool, especially if it came out in the last several years. Their belief is that your time at the university is for learning timeless foundational and theoretical ideas, not fleeting industrial practices. They don’t want be seen as training you for a job. So, you don’t get a course on Java and another on JavaScript. Instead, you get a meta course.

This course bundles together the foundational and theoretical ideas of programming languages. We will be examining these ideas primarily through the lens of Ruby, Haskell, and Prolog.

Another reason for this course is to expand the breadth of programming languages that you have seen. There’s a sickness inside of us that worships what we know and demonizes what we don’t. We do this with languages, operating systems, car manufacturers, game consoles, and so on. Our hope is that this course will encourage a more thoughtful and informed critique of languages.

Language Wars

You are probably overwhelmed by the number of tools that you are expected to learn. Wouldn’t it be nice if there was just one language? Let’s talk around that idea for a bit.

Why So Many

You might hear some folks say that a programming language is Turing complete. Usually the language in question is on the fringe of what we think of as a programming language, but in saying that it’s Turing complete, we mean that it’s just as powerful as any other programming language. The concept is named after Alan Turing, who developed a “manual computer”, a model of computation based on a state machine and a scrolling memory tape in the 1930s—before electronic and digital computers were invented. His manual model has been shown to be just as powerful as a modern computer.

A language is Turing complete if one can use it to code up any Turing machine. If you haven’t taken a theory course, this is a vague statement. In practical terms, Turing complete means that a language has a looping or branching mechanism and a means of storing data. Most of the programming languages we use have both of these, which means that they are all of equal power. In other words, if a program could be written in one Turing complete language, it could be written in any other too.

Why then, if all these languages have the same theoretical power, do we have so many of them? You can probably think of some reasons. Here are mine:

There are some interesting parallels between programming languages and our spoken languages. Why don’t we have just one spoken language? Will we some day?

Also, do you know any programming language that isn’t Turing complete? Pure HTML is not, but it can made to be with CSS. Simple SQL is not, but it can be made to be with advanced features. Sometimes we use Turing completeness for gatekeeping. Since HTML is not Turing complete, many say its not a programming language.

A brief aside: don’t gatekeep. When you say HTML is not a programming language, you are implicitly and perhaps intentionally excluding front-end web designers from having technical status. You do this because you are insecure and feel the beastly need to strut.

Who Wins

Since we have so many languages, we need criteria for deciding which one to use for a particular problem. The author of your textbook identifies four broad dimensions for evaluating a programming language:

Let’s talk through what each of these means.

Readability

Early computer scientists were consumed by short-term concerns: will this program fit in memory? will it run on operating system X? will it finish executing on time? Executability was of primary importance. Less attention was placed on long-term concerns like readability.

Why is readability a long-term concern? Because software needs maintenance. Humans, both those that wrote the the software originally and those that didn’t, need to get into the code to fix bugs and add features. For this to be a pleasant and successful process, the software must be read and understood.

What makes a language readable? One way is through simplicity, which means having fewer features. For example, many new programmers look at ++i and i++ and don’t understand the difference. It’s not necessary to have both of these features in a language. Ruby and Python don’t, so they are simpler.

Another way is through orthogonality, which means that power and variety is achieved through a small set of primitives and a small number of ways to combine them. Java, for example, has only eight primitive types, but an infinite variety of classes can be formed out of them. In C, one can form a pointer for any type, effectively doubling the number of types. Then you can take pointers to pointers, and so on.

Another way is through data types. In a language with explicit types, the types serve as a form of documentation. If you see a function void grow(int factor), you have more information about what sort of number factor is than you have with def grow(factor). The first factor may be a percentage since it’s an int, whereas the second could be a percentage or a proportion. In some styles of C, when you see a 0 or 1, you can’t tell if those are numbers are booleans.

Another way is through clear and consistent syntax. In JavaScript, we can declare a variable with the keyword const, which seems like it should make a constant value, right? Not exactly. The value is quite mutable:

const acronyms = {"SMH": "so much hate"};
acronyms['SMH'] = "shaking my head";

The keyword const applies only to the name acronyms, not to the value it refers to.

Writability

We also measure a programming language based on how easily we can write programs in it. Simplicity and orthogonality are important for writability too.

Another way to maximize writability is expressivity, which means that a language has simple ways of expressing complex tasks. In C, including code from another file is not very expressive. One has to create a header file, place a copy of the function signature in the header, and #include the header. In Java, no separate header is needed. In a language like Scratch, programmers can use a repeat loop, which imposes less cognitive load than for and while.

Reliability

Readability and writability is mostly about the humans touching the code. We also care about a language’s reliability, which influences the quality of the software written it.

One way for a programming language to improve reliability is through type checking. In languages with static types, which means the types are known at compile time, many type errors are detected early. In languages without static types, errors are found by running the program, which is not as easy.

Languages with high reliability have explicit support for detecting and recovering from exceptions. In C, there is no language-level support.

Cost

A language’s benefits can be undermined by costs, which can be triggered in a variety of ways. Programmers must be trained in the language. Software must be written in the language, which depends on the writability of the language and the toolchain that accompanies it. The code must be compiled, executed, maintained, and ported. Any process related to a language that costs time also generally costs money.

A company’s technology stack has a cost in that it determines who they can hire. If you get an offer from a company using a language that you think is dying or proprietary, you are unlikely to accept. You’d be limiting your mobility. A language has an opportunity cost for you. If you spend all your time with language X, you will not be able to apply for jobs that expect you to have five years of experience with language Y.

Ruby

We start our discussion of programming languages with Ruby. It falls under these categories:

Let’s work through a few problems to see Ruby in action.

Advent of Code

Advent of Code is a programming challenge that runs every December. Each day, participants are given a two-stage problem to solve. The first stage of the problem from day 1 of 2021 asks you to figure out how many depth readings are increasing. In Java, you’d probably read the file with a Scanner and then write a loop that increments a counter whenever a value is bigger than its predecessor. There’s nothing wrong with such a solution if you like writing code more than solving problems.

The Ruby solution is a lot simpler:

#!/usr/bin/env ruby

lines = File.read('aoc-input.txt').lines
depths = lines.map { |line| line.to_i }
n = depths.each_cons(2).count { |a, b| a < b }
puts n

To be fair, you can write a similar algorithm in Java using streams.

Email List

Arrays in Ruby can often be processed with very minimal code—because of the language’s many builtin mechanisms for processing them. Let’s see these mechanisms at work by processing a comma-separated value file holding a database of people. We want to pull out the email addresses of all the people and form a comma-separated list that we can paste in our mail client.

#!/usr/bin/env ruby

lines = File.read('people.csv').lines.drop(1)

# Split each "a,b,c" into [a, b, c].
students = lines.map do |line|
  line.chomp.split(',')
end

# Winnow down to students in section 1.
section = students.filter { |student| student[3] == '1' }

# Compose address and smash them together.
emails = section.map { |student| student[2] + '@dukes.jmu.edu' }.join(',')

puts emails

TODO

Here’s your TODO list for next time:

Sincerely,

P.S. It’s time for a haiku!

Could there be just one
One master language for code
They’d call it !C